Asymptotic Analysis for Data-Driven Inventory Policies
Xun Zhang, Zhisheng Ye
Department of Industrial Systems Engineering and Management, National University of Singapore, 21 Lower Kent Ridge Rd, Singapore 119077, [email protected], [email protected]
William B. Haskell
Krannert School of Management, Purdue University, West Lafayette, IN 47907, [email protected]
We study periodic review stochastic inventory control in the data-driven setting, in which the retailer makes ordering decisions based only on historical demand observations, without any knowledge of the probability distribution of the demand. Since an (s, S)-policy is optimal when the demand distribution is known, we investigate the statistical properties of the data-driven (s, S)-policy obtained by recursively computing the empirical cost-to-go functions. This policy is inherently challenging to analyze because the recursion induces propagation of the estimation error backwards in time. In this work, we establish the asymptotic properties of this data-driven policy by fully accounting for the error propagation. First, we rigorously show the consistency of the estimated parameters by filling in some gaps (due to unaccounted error propagation) in the existing studies. On the other hand, empirical process theory cannot be directly applied to show asymptotic normality. To explain, the empirical cost-to-go functions for the estimated parameters are not i.i.d. sums, again due to the error propagation. Our main methodological innovation comes from an asymptotic representation for multi-sample U-processes in terms of i.i.d. sums. This representation enables us to apply empirical process theory to derive the influence functions of the estimated parameters and establish joint asymptotic normality. Based on these results, we also propose an entirely data-driven estimator of the optimal expected cost and we derive its asymptotic distribution. We demonstrate some useful applications of our asymptotic results, including sample size determination, as well as interval estimation and hypothesis testing on vital parameters of the inventory problem. The results from our numerical simulations conform to our theoretical analysis.

Key words: inventory management; nonparametric estimation; empirical process; U-process
1. Introduction
Periodic review stochastic inventory control is fundamental in operations management. In this problem, the retailer must review the inventory levels and then make ordering decisions. A good ordering policy can significantly reduce operating costs and can also improve overall business performance. In our present paper, we are concerned with solving this problem in the data-driven setting where only historical demand observations are available.
Periodic review stochastic inventory control has received major attention over the last few decades (Scarf 1960, Braden and Freimer 1991, Ding et al. 2002, Huh et al. 2009b, 2011, Levi et al. 2015, Ban 2020). Scarf (1960) demonstrated the optimality of an (s, S)-type policy. This is a classical policy where the retailer orders up to level S whenever the current inventory level falls below a reorder point s (both s and S must be determined to implement this policy). Since then, many studies have derived the form of the optimal inventory policy under different problem settings (e.g., Markovian demand (Sethi and Cheng 1997); lost sales and nonzero lead time (Huh et al. 2009b, Goldberg et al. 2016), etc.). For a detailed analysis of the optimality of an (s, S)-type policy in different settings, we refer to the comprehensive review in Snyder and Shen (2019).

All of the previously mentioned works assume that the retailer knows the underlying demand distribution. Yet, in practice, we usually only have historical demand observations. We need to be able to make inventory ordering decisions in a data-driven way.

The Bayesian approach offers one such way to make data-driven ordering decisions (Braden and Freimer 1991, Lariviere and Porteus 1999, Ding et al. 2002, Chen 2010, Bisi et al. 2011). In the Bayesian approach, we start with a prior distribution over the unknown parameters of the demand distribution. Then, we can dynamically update this distribution with each new demand realization. However, this approach suffers from the "curse of dimensionality" (Chen and Mersereau 2015). Furthermore, one needs the demand distribution to have a parametric form in order for this method to be practical (Braden and Freimer 1991). In particular, Bisi et al. (2011) showed that this procedure is only scalable for the Weibull distribution.
We refer to Chen and Mersereau (2015) for a recent survey of Bayesian methods in stochastic inventory control.

Non-parametric methods (which make no assumptions on the underlying demand distribution) have been developed for both the offline and online settings. In the offline setting, all the demand data are collected a priori. Huh et al. (2011) used the Kaplan-Meier estimator to create a data-driven policy with censored demand, and they prove almost sure convergence for discrete demand distributions (but without a convergence rate). Levi et al. (2007, 2015) used sample average approximation (SAA) to solve a "shadow" dynamic programming problem based on the demand data, and then used it to produce an implementable policy. Levi et al. (2007, 2015) used the Hoeffding and Bernstein inequalities to bound the convergence rate on the optimality gap of the estimated policy. Ban (2020) also uses SAA to analyze a data-driven ordering policy for periodic review stochastic inventory control with right-censored demand. She used M-estimator and Z-estimator
theory (which is based on empirical process theory) to study the consistency and asymptotic distribution of their data-driven policy, and then constructed confidence intervals for the optimal policy.

In the online setting, demand data are observed on the fly. Online non-parametric methods are evaluated in terms of their "regret", which is the difference between the expected cost of an adaptive policy and the true optimal expected cost; see, e.g., Besbes and Muharremoglu (2013). Burnetas and Smith (2000) developed a stochastic gradient descent type algorithm to compute the optimal policy in a perishable inventory system. Godfrey and Powell (2001) and Powell et al. (2004) developed "concave adaptive value estimation" (CAVE) to successively approximate the newsvendor cost function with a sequence of piecewise linear functions. Huh and Rusmevichientong (2009) developed another stochastic gradient descent type algorithm for lost-sales systems with censored demand. They prove a sublinear (square-root) convergence rate for the long-run regret. Huh et al. (2009a) studied a non-parametric adaptive algorithm for finding the optimal base-stock policy in lost-sales inventory systems with positive replenishment lead times, and here they prove a sublinear (cube-root) convergence rate.

We consider the setting of Scarf (1960), where there is a retailer who makes periodic decisions on inventory levels based only on the demand of the past n selling seasons, all with the same time horizon T. All unmet demand is backlogged. We are interested in the data-driven policy based on SAA as in Levi et al. (2007), Levi et al. (2015), and Ban (2020). We want to quantify our uncertainty about how close the data-driven policy is to the true optimal policy. With this aim, we want to demonstrate consistency and compute the asymptotic distribution of this estimator. These are both open problems in the literature. The key challenge, as discussed in Levi et al.
(2007), is that the estimated solution for time t depends heavily on the solutions for future time periods (which are also estimated), and thus there is backwards error propagation. The Hoeffding and Bernstein inequalities are used in Levi et al. (2007, 2015) to provide high probability bounds on how close the data-driven policy is to the true optimal policy. However, these concentration inequalities do not help to compute the asymptotic distribution.

Ban (2020) is the most closely related work that uses statistical tools to analyze the data-driven policy. Her analysis implicitly imposes the strong assumption that the number of observations in each time period is infinitesimal compared to the number of observations in future time periods. This is equivalent to saying that we compute the data-driven policy for time period t as if the computations for all future time periods after t are done exactly. Under this assumption, the statistical error in the estimated cost-to-go functions and the estimated (s_τ, S_τ) parameters in future periods τ > t is infinitesimal compared to the estimation error for the current period t. In this way, the statistical error from future time periods can simply be ignored when we estimate (s_t, S_t) for time period t, and consistency and normality of the estimated (s_t, S_t) can be established by standard arguments. Specifically, asymptotic normality of (s_t, S_t) follows from a direct application of Theorems 5.21 and 5.23 in van der Vaart (1998), whose conditions are met only if the future statistical error is negligible.

Practically speaking, however, the number of observations in all periods should be roughly the same in most applications. In particular, the statistical error from future time periods has the same order of magnitude as the estimation error for the current period.
The propagation of the future estimation error makes standard arguments for the asymptotic analysis of the estimated parameters inapplicable, and a new theoretical paradigm is needed to give a full treatment of the error propagation. In our upcoming development, this issue is captured by the difference between the function M̂_{t,n} (which accounts for current and future estimation error, see (2.8)) and the function M_{t,n} (which only accounts for current estimation error and assumes all future expectations are computed exactly, see (4.1)). Our upcoming Lemma 2 and Lemma 3 decompose the effect of future estimation error, and they match the analysis in Ban (2020) when the error propagation terms are zero. However, because the number of observations is expected to be roughly the same for all time periods, further effort needs to be made to quantify the error propagation when it is not negligible. Overlooking the error propagation will lead to underestimates of the asymptotic variance of the data-driven policy, as revealed in our simulations in Section 8.

We highlight the major contributions of our present manuscript as follows:

1. We complete the analysis of the data-driven ordering policy and show that it is consistent and asymptotically normal. In this work, we assume that the number of demand observations is constant over time (in contrast to Ban (2020)), and we show how to fully account for the backwards propagation of future estimation error in this setting. In order to analyze the asymptotic distribution of the data-driven policy, we decompose the total estimation error into two parts with the help of empirical process techniques. The first part can be understood as the estimation error if we knew the true future demand distribution. This part can be easily treated with standard arguments. The second part measures the accuracy of the estimates in future time periods, which is more nuanced and is not considered in the existing literature.
We analyze this part by converting it into a multi-sample U-process.

2. The general theory of multi-sample U-processes is underdeveloped. Neumeyer (2004) studied weak convergence of two-sample U-processes by using an exponential inequality developed
in Nolan and Pollard (1988) to control the conditional tail probabilities of a symmetrized empirical process. However, as pointed out in Nolan and Pollard (1988), this technique does not extend to kernels with more than two arguments. To the best of our knowledge, there are no general U-process results available for more than two samples. We develop a multi-sample U-process theory, i.e., Lemma EC.5, for the kernel functions arising from our problem. Thanks to the special structure of the kernel functions, we are able to show that each U-process in our problem can be decomposed into an i.i.d. sum of Hájek projections plus a uniformly negligible remainder. The resulting i.i.d. sum can then be handled by standard empirical process theory.

3. As a consequence of asymptotic normality, we derive confidence bands for the estimates of the parameters of the optimal policy. This analysis quantifies the uncertainty in the estimated optimal policy by providing a confidence band around all parameters simultaneously (called a "simultaneous confidence band"). This band gives retailers a better sense of the volatility (or the trustworthiness) of the current data-driven policy. Conversely, we can also calculate the minimum amount of data needed to construct confidence bands of a desired size. These confidence intervals depend on correct computation of the asymptotic variance of our estimator, which can be decomposed into two parts. The first part comes from the stochasticity of the current period demand, and the second comes from the stochasticity of future demand. The existing literature accounts for the first part of this variance, and we complete the analysis by accounting for the second part.

4. Our analysis gives general data-driven estimators for vital problem parameters, beyond just the order-threshold and order-up-to levels. We develop two examples in detail.
For the first example, we show that our estimator for the optimal expected cost is asymptotically normal, and we give a consistent estimate of its asymptotic variance. Previous works in Levi et al. (2007, 2015) and Ban (2020) plug the estimated optimal policy into the formula for expected cost and then bound the difference with respect to the true optimal expected cost. However, the plug-in expected cost involves an expectation with respect to the unknown demand distribution, and thus is itself unknown. In contrast, our point estimator and confidence intervals for the optimal expected cost can be directly computed from the data, so the retailer can construct confidence intervals for the optimal expected cost with our result. This lets the retailer determine whether the system is even profitable with high enough confidence, and it also guides the choice of sample size. For the second example, we do hypothesis testing on the monotonicity of the order-up-to levels, i.e., S_t ≤ S_{t+1}. When this null hypothesis is not rejected, a greedy policy is optimal, which is much easier to compute.

This paper is organized as follows. Section 2 introduces the problem setting and the data-driven policy. Then, we review the statistical preliminaries in Section 3. Our main results on consistency are in Section 4. In Section 5, we discuss the guiding idea of our analysis and demonstrate asymptotic normality for the one-period problem. In Section 6, we build on this idea to demonstrate asymptotic normality for the two-period problem. Section 7 then demonstrates asymptotic normality for the general case, based on induction arguments matching Section 6. We report numerical experiments in Section 8 that support our theoretical analysis, and we conclude the paper in Section 9.
The details of all proofs are gathered in the Electronic Companion (and all references to the Electronic Companion are given the prefix "EC").
2. The Data-Driven Inventory Problem
In our setting (the same as Scarf (1960)), the retailer chooses inventory levels for an upcoming selling season with a finite planning horizon 𝒯 ≜ {1, 2, · · · , T} (we let 𝒯₋ ≜ {1, 2, · · · , T − 1} denote the set of all time periods except the terminal one). The demand is given by a random vector (D_1, · · · , D_T) distributed on the measurable space (ℝ^T, ℬ) according to a probability measure P.

Consider time period t ∈ 𝒯 and let x_t denote the starting inventory at the beginning of period t. If an order is placed, then a fixed setup cost of K_t ≥ 0 is incurred. Let y_t be the inventory level after delivery (the order lead time is negligible), so y_t = x_t if no order is placed. In period t, if the random demand D_t satisfies D_t > y_t, then there is unmet demand which is backlogged. Otherwise, if D_t ≤ y_t, then excess inventory is held in storage. The unit backlogging and holding costs are b_t and h_t, respectively. The one-step inventory cost function is

C_t(y_t, D_t) ≜ b_t (D_t − y_t)⁺ + h_t (y_t − D_t)⁺,   (2.1)

where x⁺ = max{x, 0}. Subsequently, the expected cost in period t is K_t I(y_t > x_t) + c_t (y_t − x_t) + E C_t(y_t, D_t), where c_t is the unit ordering cost. Actually, the analysis for the case where c_t > 0 reduces to the case c_t = 0 (see, e.g., Huh et al. 2009a, Qin et al. 2019). So, WLOG we take c_t = 0 for all t ∈ 𝒯. At the end of period t, the remaining stock (which may be negative) is carried over to the next period, and the starting inventory level for period t + 1 is x_{t+1} = y_t − D_t. Any remaining inventory at the end of the selling season has zero salvage value.

The corresponding finite horizon inventory control problem is:

min_{π ∈ Π} E^π Σ_{t=1}^T { K_t I(y_t > x_t) + C_t(y_t, D_t) },   (2.2)

where Π is the set of Markov inventory policies and E^π denotes expectation with respect to π ∈ Π. Let V_t(x) denote the period t ∈ 𝒯 cost-to-go from the current inventory level x.
For all t ∈ 𝒯 we define

M_t(y) ≜ E[ C_t(y, D_t) + V_{t+1}(y − D_t) ],
to represent the cost-to-go from order-up-to level y ∈ ℝ. Then the Bellman equations for problem (2.2) are:

V_{T+1}(x) = 0, ∀x ∈ ℝ,
V_t(x) = min_{y ≥ x} { K_t I(y > x) + M_t(y) }, ∀x ∈ ℝ.   (2.3)

In an (s, S)-policy, in period t ∈ 𝒯 the retailer orders up to the order-up-to level S_t whenever the starting inventory level is below the reorder point s_t. We already know that an (s, S)-policy is optimal for problem (2.2) and that this policy is Markov.

Scarf (1960) gives the following computational scheme to compute (s_t, S_t) for all t ∈ 𝒯 to obtain the optimal policy. The optimal order-up-to level S_t is determined by solving

S_t ∈ argmin_{y ∈ ℝ} M_t(y).   (2.4)

The optimal reorder point s_t is then determined by solving

s_t = min{ s : M_t(s) − M_t(S_t) − K_t = 0, s ≤ S_t }.   (2.5)

If the demand distribution P were known, then the optimal policy could be computed by backward induction using (2.4) and (2.5). The dynamic programming equations for an (s, S)-policy are:

V_{T+1}(x) = 0, ∀x ∈ ℝ,
V_{t+1}(x) = M_{t+1}(S_{t+1}) + K_{t+1} if x < s_{t+1}, and V_{t+1}(x) = M_{t+1}(x) if x ≥ s_{t+1}, ∀x ∈ ℝ, t ∈ 𝒯₋.   (2.6)

By combining (2.5) and (2.6), we can write the functions {M_t}_{t ∈ 𝒯} recursively as M_T(y) = E C_T(y, D_T), and

M_t(y) = E C_t(y, D_t) + M_{t+1}(s_{t+1}) E I(y − D_t < s_{t+1}) + E M_{t+1}(y − D_t) I(y − D_t ≥ s_{t+1}),   (2.7)

for all t ∈ 𝒯₋.

We now introduce our main assumptions, which are in force for the remainder of the paper.

Assumption 1.
For all t ∈ 𝒯, the random demand D_t is a bounded continuous random variable with support on [d̲_t, d̄_t], with CDF F_t(x) and PDF f_t(x). The demand in different periods is independent, but not necessarily identically distributed.

Assumption 1 is standard in the stochastic inventory literature (Huh and Rusmevichientong 2009, Huh et al. 2011, Besbes and Muharremoglu 2013).
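To make the classical scheme (2.4)-(2.7) concrete under Assumption 1, the following is a minimal numerical sketch, not an implementation from this paper: the horizon, the time-invariant costs, the Uniform(0, 10) demand, and the grids are all illustrative assumptions, and all expectations are discretized on a grid.

```python
import numpy as np

T = 3
K, b, h = 5.0, 4.0, 1.0                      # setup, backlog, holding costs (illustrative)
d_grid = np.linspace(0.0, 10.0, 201)          # discretized demand support
w = np.full(d_grid.size, 1.0 / d_grid.size)   # Uniform(0, 10) weights
y_grid = np.linspace(-10.0, 20.0, 301)        # candidate order-up-to levels

V_next = np.zeros_like(y_grid)                # V_{T+1} = 0 on the inventory grid
policy = []
for t in range(T, 0, -1):
    # M_t(y) = E[C_t(y, D_t) + V_{t+1}(y - D_t)]; off-grid values of V_{t+1}
    # are linearly interpolated (and clamped at the grid endpoints)
    M = np.array([
        np.sum(w * (b * np.maximum(d_grid - y, 0.0)
                    + h * np.maximum(y - d_grid, 0.0)
                    + np.interp(y - d_grid, y_grid, V_next)))
        for y in y_grid
    ])
    S_idx = int(np.argmin(M))
    S_t = float(y_grid[S_idx])                # order-up-to level, cf. (2.4)
    # reorder point: smallest grid point s <= S_t with M_t(s) <= M_t(S_t) + K, cf. (2.5)
    ok = np.where((y_grid <= S_t) & (M <= M[S_idx] + K))[0]
    s_t = float(y_grid[ok[0]])
    policy.append((t, s_t, S_t))
    V_next = np.where(y_grid < s_t, M[S_idx] + K, M)   # V_t via (2.6)
policy.reverse()
```

For the terminal period this recovers the newsvendor solution: with the illustrative costs, S_T is (up to discretization) the b/(b + h) = 0.8 quantile of the Uniform(0, 10) demand, i.e., approximately 8.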
Assumption 2.
For each t ∈ 𝒯, S_t is the unique solution of (2.4) and s_t is the unique solution of (2.5). In addition, there is a compact set Θ ⊂ ℝ such that s_t, S_t ∈ Θ for all t ∈ 𝒯.
Assumption 2 corresponds to the "identifiability" of the parameters (s_t, S_t). We require that S_t and s_t can be well separated from other points in Θ (informally, they are each "unique" solutions of their corresponding optimization problems). This assumption is typical in the Z-estimator literature (van der Vaart 1998), and the same requirement is imposed in Ban (2020).

We are interested in the case where the demand distribution is unknown and we only have access to demand data from past selling seasons. In this subsection, we describe a data-driven inventory policy based on sample average approximation (SAA) (Levi et al. 2007).

Our demand data {(d_1^i, d_2^i, · · · , d_T^i)}_{i=1}^n come from the past n selling seasons. The vector (d_1^i, d_2^i, · · · , d_T^i) corresponds to the data from season i = 1, · · · , n, and we regard these vectors as n i.i.d. realizations of (D_1, · · · , D_T). We assume no right-censoring on the observed d_t^i (since all unmet demand is backlogged in our setting). In this setting, we compute estimated parameters {(ŝ_t, Ŝ_t)}_{t ∈ 𝒯} from the data.

First we characterize the estimated (a.k.a. "empirical") cost-to-go as a function of the data. Let M̂_{t,n}(y) ≜ (1/n) Σ_{i=1}^n m̂_{t,y}(d_t^i) denote the empirical cost-to-go from order-up-to level y ∈ ℝ, where m̂_{T,y}(d) ≜ b_T (d − y)⁺ + h_T (y − d)⁺ and

m̂_{t,y}(d) ≜ b_t (d − y)⁺ + h_t (y − d)⁺ + M̂_{t+1,n}(ŝ_{t+1}) I(y − d < ŝ_{t+1}) + M̂_{t+1,n}(y − d) I(y − d ≥ ŝ_{t+1}),   (2.8)

for t ∈ 𝒯₋. Then, we may compute the empirical order-up-to levels and reorder points by first solving

Ŝ_t ≜ argmin_{y ∈ Θ} M̂_{t,n}(y),   (2.9)

and then solving

ŝ_t ≜ min{ s : M̂_{t,n}(s) − M̂_{t,n}(Ŝ_t) − K_t = 0, s ≤ Ŝ_t }.   (2.10)

This recursive construction of (ŝ_t, Ŝ_t) is the same as in the classical case, except that it replaces all exact expectations with empirical estimates. The policy produced by (2.9)-(2.10) is purely data-driven.

Eqs. (2.9)-(2.10) give point estimates (ŝ_t, Ŝ_t) for the optimal parameters (s_t, S_t). It is important to quantify the uncertainty in these estimators. A large variance in these estimators implies that the estimated (ŝ_t, Ŝ_t) are not trustworthy and might be far from the true (s_t, S_t). This uncertainty can be quantified using confidence intervals/regions, and linked to the sample size n to allow better design of the data collection scheme. The main objective of our present study is exactly to quantify this uncertainty through asymptotic analysis. We will establish consistency and joint asymptotic normality of the data-driven policy represented by the estimated parameters {(ŝ_t, Ŝ_t)}_{t ∈ 𝒯}.
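The recursion (2.9)-(2.10) can be sketched directly from data. This is a hedged illustration rather than an implementation from this paper: the horizon, costs, the Uniform(0, 10) demand generator, and the grid (with linear interpolation standing in for evaluation of M̂_{t+1,n} off the grid) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 3, 2000
K, b, h = 5.0, 4.0, 1.0                        # time-invariant costs (illustrative)
demand = rng.uniform(0.0, 10.0, size=(n, T))   # d_t^i for seasons i = 1..n
y_grid = np.linspace(-10.0, 20.0, 301)

policy = []
s_next, M_next = None, None   # s-hat_{t+1} and M-hat_{t+1,n} tabulated on y_grid
for t in range(T - 1, -1, -1):
    d = demand[:, t]
    vals = []
    for y in y_grid:
        # empirical one-step cost, cf. (2.1)
        cost = b * np.maximum(d - y, 0.0) + h * np.maximum(y - d, 0.0)
        if M_next is not None:
            # future terms of (2.8), with grid interpolation for M-hat_{t+1,n}
            carry = y - d
            cost = cost + np.where(
                carry < s_next,
                np.interp(s_next, y_grid, M_next),
                np.interp(carry, y_grid, M_next),
            )
        vals.append(cost.mean())
    vals = np.array(vals)
    S_idx = int(np.argmin(vals))
    S_hat = float(y_grid[S_idx])                               # (2.9)
    ok = np.where((y_grid <= S_hat) & (vals <= vals[S_idx] + K))[0]
    s_hat = float(y_grid[ok[0]])                               # (2.10), on the grid
    policy.append((t + 1, s_hat, S_hat))
    s_next, M_next = s_hat, vals
policy.reverse()
```

With these illustrative inputs, the terminal estimate Ŝ_T is (up to the grid) the empirical b/(b + h) = 0.8 quantile of the period-T demand sample, close to 8.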
There are several challenges to analyzing the estimated parameters {(ŝ_t, Ŝ_t)}_{t ∈ 𝒯}, in terms of: (i) showing that they are consistent estimators of the true parameters {(s_t, S_t)}_{t ∈ 𝒯}; and (ii) showing that they are asymptotically normal. To elaborate, we note that the expression M̂_{t,n}(y) for period t determines Ŝ_t, and it is a function of the data from period t. This function is nonsmooth due to the cost function C_t(y, d). Furthermore, the full expression of M̂_{t,n}(y) involves the terms M̂_{t+1,n}(y) and ŝ_{t+1}, which are themselves functions of the data from the future period t + 1. These terms correspond to two sources of error: (i) approximation error between M_{t+1}(y) and M̂_{t+1,n}(y); and (ii) estimation error in ŝ_{t+1}. The effects from these two sources of error propagate one period back in time to the random criterion function y ↦ M̂_{t,n}(y) and contribute to the random error in the estimation of Ŝ_t and ŝ_t. This phenomenon makes the analysis of Ŝ_t technically challenging, as explained below. There are similar challenges in the analysis of ŝ_t.

First, traditional estimation methods rely on a Taylor expansion (which requires smoothness) of the random criterion functions that define the estimators ŝ_t and Ŝ_t, at the true parameter values s_t and S_t. Such expansions fail here because the random criterion functions are nonsmooth (due to C_t(y, d)). We will meet this challenge using modern empirical process theory. This will enable us to perform the Taylor expansion on the expectation of the random criterion function, which is typically smooth.

The second challenge is due to the backwards error propagation. When we account for the approximation error due to using M̂_{t,n}(y) in place of M_t(y), the i.i.d. sum structure is destroyed and the random criterion function is no longer an empirical process. Instead, it is a two-sample U-statistic in the two-period problem, and a T-sample U-statistic in the general T-period problem. In order to control the random error of Ŝ_t and ŝ_t involved in these U-statistics, we have to work with multi-sample U-processes, whose theory is underdeveloped. In our analysis, we will make use of the special structure of the kernel functions in this problem to get a tractable asymptotic representation of our multi-sample U-processes. The main tool is a uniform approximation result presented in Lemma EC.5.
3. Technical Preliminaries
In this section, we describe the key statistical concepts that appear throughout this work. In general, we are estimating a parameter θ ∈ ℝ^p for a random variable on a set 𝒳. The (true) parameter value is typically a zero-solution, or the maximizer, of a fixed criterion function θ ↦ M(θ), where M(θ) = E m_θ(X) and the m_θ are known functions on 𝒳 indexed by θ. An estimator θ̂_n of θ based on the i.i.d. random sample X_1, · · · , X_n in 𝒳 is often a zero-solution, or the maximizer, of a random criterion function θ ↦ M_n(θ), where M_n(θ) = (1/n) Σ_{i=1}^n m_θ(X_i). The influence function of the estimator θ̂_n is a key component of our analysis.
Definition 1.
An estimator θ̂_n of θ is asymptotically linear if there exists a function φ : 𝒳 ↦ ℝ^p such that √n (θ̂_n − θ) = n^{−1/2} Σ_{i=1}^n φ(X_i) + o_P(1), and φ is called the (asymptotic) influence function of θ̂_n.

Using empirical process notation, the empirical measure corresponding to the i.i.d. observed sample X_1, · · · , X_n is P_n = (1/n) Σ_{i=1}^n δ_{X_i}, where δ_x is the Dirac measure at x. For a measurable function h : 𝒳 ↦ ℝ, the true mean is written as Ph and the empirical mean is written as P_n h = (1/n) Σ_{i=1}^n h(X_i). The empirical process G_n h = √n (P_n h − Ph) is the centered and scaled version of the empirical measure. Our analysis often appeals to statistical properties of classes of functions, and the behavior of the corresponding empirical means and empirical processes.
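To make Definition 1 concrete, the following sketch checks asymptotic linearity numerically for the sample median of a standard normal sample, whose influence function φ(x) = (1/2 − I(x ≤ 0))/f(0) is a classical fact; the sample size and random seed are illustrative choices, not from this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.standard_normal(n)

# Influence function of the median of N(0, 1): theta = 0, f(0) = 1/sqrt(2*pi)
f0 = 1.0 / np.sqrt(2.0 * np.pi)
lhs = np.sqrt(n) * (np.median(x) - 0.0)   # sqrt(n) (theta_hat - theta)
phi = (0.5 - (x <= 0.0)) / f0             # phi(X_i) evaluated at the sample
rhs = phi.sum() / np.sqrt(n)              # n^{-1/2} sum_i phi(X_i)

# Definition 1 says lhs = rhs + o_P(1): the remainder is small even though
# lhs and rhs are each of constant order
linearization_error = abs(lhs - rhs)
```

Both `lhs` and `rhs` fluctuate on the √(π/2) scale, while `linearization_error` shrinks with n (for the median the remainder is of order n^{−1/4} up to logarithms).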
Definition 2. (van der Vaart 1998) A class ℋ of measurable functions h : 𝒳 ↦ ℝ is called P-Glivenko-Cantelli if sup_{h ∈ ℋ} |P_n h − Ph| converges to zero in outer probability. It is called P-Donsker if {G_n h : h ∈ ℋ} converges in distribution to a tight Gaussian process in l^∞(ℋ), the space of bounded functionals on ℋ. A measurable function H : 𝒳 ↦ ℝ is called the envelope of ℋ if |h(x)| ≤ H(x) for every h ∈ ℋ and x ∈ 𝒳.

Next we recall the definition of a Euclidean class of functions. As a representative example, a VC class of functions is Euclidean. A Euclidean class is also a P-Donsker class, and we will appeal to this class of functions when showing uniform convergence.

Definition 3. (Nolan and Pollard 1987) Let H be the envelope of ℋ, and let N(ε, Q, ℋ, H) be the ε∫H dQ-covering number for ℋ under the L_1(Q) distance, i.e., the minimal number of L_1(Q)-balls of radius ε∫H dQ needed to cover ℋ. The class ℋ is Euclidean relative to the envelope H if there exist constants A and V such that N(ε, Q, ℋ, H) ≤ A ε^{−V} for all ε ∈ (0, 1] and all probability measures Q satisfying 0 < ∫H dQ < ∞.

As we have mentioned, we must specifically account for the backwards error propagation from estimation in future periods. We will do this using U-processes, defined next.

Definition 4. (Neumeyer 2004) Let X_1, X_2, · · · , X_n be i.i.d. random variables with a distribution P on a set 𝒳, and let Y_1, Y_2, · · · , Y_m be i.i.d. random variables with a distribution Q on a set 𝒴. Let ℋ be a class of measurable functions h : 𝒳 × 𝒴 → ℝ on the product space.

(i) For each kernel h ∈ ℋ, the corresponding two-sample U-statistic is defined by

S_n(h) ≜ (1/(nm)) Σ_{i=1}^n Σ_{j=1}^m h(X_i, Y_j).

(ii) Let h^{(1)}(x) ≜ E h(x, Y_1) − E h(X_1, Y_1) and h^{(2)}(y) ≜ E h(X_1, y) − E h(X_1, Y_1). The Hájek projection of S_n(h) − E h(X_1, Y_1) is

S_n^H(h) ≜ (1/n) Σ_{i=1}^n h^{(1)}(X_i) + (1/m) Σ_{j=1}^m h^{(2)}(Y_j).
(iii) {S_n(h) : h ∈ ℋ} is a two-sample U-process indexed by the class ℋ of kernels.

A general multi-sample U-process is defined similarly. We will see shortly that the random criterion functions for our upcoming estimation problems are exactly U-processes.
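Definition 4 can be checked numerically with the classical Mann-Whitney kernel h(x, y) = I(x ≤ y), for which the projections have closed forms when both samples are Uniform(0, 1): E h = 1/2, h^{(1)}(x) = 1/2 − x, and h^{(2)}(y) = y − 1/2. The distributions and sample sizes below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4000, 3000
x = rng.uniform(size=n)   # X_i ~ P = Uniform(0, 1)
y = rng.uniform(size=m)   # Y_j ~ Q = Uniform(0, 1)

# Two-sample U-statistic S_n(h) = (1/(nm)) sum_i sum_j I(x_i <= y_j)
u_stat = (x[:, None] <= y[None, :]).mean()

# Hajek projection S_n^H(h): an i.i.d. sum within each sample separately
hajek = (0.5 - x).mean() + (y - 0.5).mean()

# The degenerate remainder S_n(h) - E h - S_n^H(h) is O_P(1/sqrt(nm)),
# of smaller order than the projection itself, which is O_P(1/sqrt(min(n, m)))
remainder = u_stat - 0.5 - hajek
```

The projection replaces the double sum (which couples the two samples) by two independent i.i.d. sums, which is exactly the structure that standard empirical process theory can handle.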
4. Consistency of the Data-Driven Policy
We begin by demonstrating consistency of the estimated parameters {(ŝ_t, Ŝ_t)}_{t ∈ 𝒯} using the M- and Z-estimation framework. These tools are also used in Ban (2020), but our analysis is fundamentally different because we account for: (i) propagation of the estimation error of ŝ_{t+1}; and (ii) the approximation error that comes from using M̂_{t+1,n}(y) instead of M_{t+1}(y) in the random criterion function y ↦ M̂_{t,n}(y). We establish consistency of (ŝ_T, Ŝ_T) first, and then show consistency of (ŝ_t, Ŝ_t) for all t ∈ 𝒯 inductively.

Recall that Ŝ_t is the minimizer of M̂_{t,n}(y), and ŝ_t is the (smallest) solution of M̂_{t,n}(s) − M̂_{t,n}(Ŝ_t) − K_t = 0. According to Theorems 5.7 and 5.9 of van der Vaart (1998), the estimators (ŝ_t, Ŝ_t) are consistent if the following two conditions hold:

1. The true S_t is a well-separated minimizer of M_t and s_t is a well-separated root of (2.5).

2. M̂_{t,n}(y) converges to M_t(y) uniformly in y ∈ Θ (i.e., sup_{y ∈ Θ} |M̂_{t,n}(y) − M_t(y)| →_P 0).

To show the uniform convergence, define the intermediate function M_{t,n}(y) ≜ (1/n) Σ_{i=1}^n m_{t,y}(d_t^i), where m_{T,y}(d) ≜ b_T (d − y)⁺ + h_T (y − d)⁺ and

m_{t,y}(d) ≜ b_t (d − y)⁺ + h_t (y − d)⁺ + M_{t+1}(s_{t+1}) I(y − d < s_{t+1}) + M_{t+1}(y − d) I(y − d ≥ s_{t+1})   (4.1)

for t ∈ 𝒯₋. The expressions m_{t,y} and M_{t,n}(y) are similar to m̂_{t,y} and M̂_{t,n}(y). The difference is that any expression that depends on data in future time periods is replaced by the corresponding exact value (i.e., the exact reorder point and the exact expectation). We can then use the triangle inequality to upper bound |M̂_{t,n}(y) − M_t(y)| by

|M̂_{t,n}(y) − M_t(y)| ≤ |M_{t,n}(y) − M_t(y)| + |M̂_{t,n}(y) − M_{t,n}(y)|.

It remains to show that both terms on the RHS are o_P(1), uniformly in y ∈ Θ.
For the first term, note that M_{t,n}(y) and M_t(y) are the empirical average and true mean of m_{t,y}, respectively. For t ∈ 𝒯, define the function class ℳ_t ≜ {d ↦ m_{t,y}(d) : y ∈ Θ} indexed by y. By Lemma EC.1, the classes ℳ_t are P-Donsker and thus also P-Glivenko-Cantelli. The uniform convergence of M_{t,n}(y) to M_t(y) then follows from this class property.

For the second term, the proof of the claim that sup_y |M̂_{t,n}(y) − M_{t,n}(y)| = o_P(1) uses backward induction based on the induction hypothesis that sup_y |M̂_{τ,n}(y) − M_{τ,n}(y)| = o_P(1) and (ŝ_τ, Ŝ_τ) →_P (s_τ, S_τ) for τ = T, T − 1, · · · , t + 1. We combine the statements of the uniform convergence of M̂_{t,n}(y) − M_{t,n}(y) and the consistency of {(ŝ_t, Ŝ_t)}_{t ∈ 𝒯} in the following theorem.
Theorem 1 (Consistency). Suppose Assumptions 1 and 2 hold. Then, for all $t \in \mathcal T$, $\sup_{y\in\Theta} |\hat M_{t,n}(y) - M_{t,n}(y)| \to_P 0$ and $(\hat s_t, \hat S_t) \to_P (s_t, S_t)$ as $n \to \infty$.

Our proof of consistency accounts for the propagation of the estimation error of $(\hat s_{t+1}, \hat S_{t+1})$ into the construction of $(\hat s_t, \hat S_t)$. We use telescoping to properly control the effect of this error propagation, as detailed in (EC.2.1).
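To make the recursive construction of the data-driven policy concrete, the following sketch (our own illustration, not the paper's code) computes $\{(\hat s_t, \hat S_t)\}$ by backward recursion on the empirical cost-to-go functions, tabulated on a finite grid standing in for $\Theta$. The grid discretization, the linear interpolation via `np.interp` (which clamps outside the grid), and the demand model in the usage example are all our simplifying assumptions.

```python
import numpy as np

def fit_sS(demand, b, h, K, grid):
    """Backward recursion for data-driven (s_t, S_t) estimates.

    demand: (T, n) array of i.i.d. demand samples per period;
    b, h, K: per-period backlog, holding, and setup costs;
    grid: candidate order-up-to levels (a discretization of Theta).
    """
    T, n = demand.shape
    s_hat, S_hat = [0.0] * T, [0.0] * T
    M_next = None  # \hat M_{t+1,n} tabulated on the grid
    for t in range(T - 1, -1, -1):
        d = demand[t]
        M_t = np.empty(len(grid))
        for j, y in enumerate(grid):
            cost = b[t] * np.maximum(d - y, 0.0) + h[t] * np.maximum(y - d, 0.0)
            if M_next is not None:
                x_next = y - d  # inventory entering period t+1
                # reorder (pay K_{t+1}, jump to S_{t+1}) iff x_next < s_{t+1};
                # np.interp clamps outside the grid, a simplification
                cont = np.where(x_next < s_hat[t + 1],
                                K[t + 1] + np.interp(S_hat[t + 1], grid, M_next),
                                np.interp(x_next, grid, M_next))
                cost = cost + cont
            M_t[j] = cost.mean()
        jS = int(np.argmin(M_t))
        S_hat[t] = grid[jS]
        # smallest grid point with \hat M_{t,n}(s) <= \hat M_{t,n}(\hat S_t) + K_t
        s_hat[t] = grid[int(np.argmax(M_t - M_t[jS] <= K[t]))]
        M_next = M_t
    return s_hat, S_hat
```

By construction the reorder-point index never exceeds the order-up-to index, so $\hat s_t \le \hat S_t$ holds for every period; the error propagation analyzed above enters through the use of $\hat s_{t+1}$, $\hat S_{t+1}$, and the tabulated $\hat M_{t+1,n}$ inside period $t$'s criterion.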
5. Asymptotic Normality: Guiding Idea and the Terminal Period
The joint asymptotic normality of $\{(\hat s_t, \hat S_t)\}_{t\in\mathcal T}$ is the key to quantifying the uncertainty in our estimation. To establish asymptotic normality, it is customary to convert to near-zero estimators and then work in the $Z$-estimator framework. In this section, we first show that $\hat S_t$ is a near-zero of the right derivative of $y \mapsto \hat M_{t,n}(y)$. We then present the guiding idea of our method for dealing with the error propagation and completing the asymptotic analysis. As a by-product of this guiding idea, we immediately obtain the asymptotic normality of the parameters $(\hat s_T, \hat S_T)$ for the terminal period.

5.1. Converting $\hat S_t$ to a Near-Zero Estimator

The estimated order-up-to level $\hat S_t$ minimizes the random criterion function $y \mapsto \hat M_{t,n}(y)$ in (2.9); this is known as $M$-estimation. Our asymptotic analysis is based on examining the derivative of this random criterion function and the set of zeros of the derivative (this is exactly the first-order optimality condition for the minimizer of $\hat M_{t,n}$). However, $y \mapsto \hat M_{t,n}(y)$ is nonsmooth because of the nonsmooth cost function $C_t(y,d)$, so we cannot use the classical derivative.

To this end, let $\hat M^r_{t,n}(y)$ and $\hat M^l_{t,n}(y)$ denote the right and left derivatives of $y \mapsto \hat M_{t,n}(y)$, respectively. These derivatives are explicitly:
$$
\hat M^r_{t,n}(y) = \frac{1}{n}\sum_{i=1}^n \big[(b_t+h_t)\, I(d_t^i \le y) - b_t\big] + \frac{1}{n}\sum_{i=1}^n \hat M^r_{t+1,n}(y - d_t^i)\, I(y - d_t^i \ge \hat s_{t+1}), \quad (5.1)
$$
$$
\hat M^l_{t,n}(y) = \frac{1}{n}\sum_{i=1}^n \big[(b_t+h_t)\, I(d_t^i < y) - b_t\big] + \frac{1}{n}\sum_{i=1}^n \hat M^l_{t+1,n}(y - d_t^i)\, I(y - d_t^i > \hat s_{t+1}).
$$
Also let $m^r_{t,y}(d)$ be the right derivative of $y \mapsto m_{t,y}(d)$ in (4.1) with respect to $y$. For the terminal period, it is $m^r_{T,y}(d) = (b_T+h_T)\, I(d \le y) - b_T$.
For $t \in \mathcal T$, it is
$$
m^r_{t,y}(d) = (b_t+h_t)\, I(d \le y) - b_t + M'_{t+1}(y-d)\, I(y-d \ge s_{t+1}). \quad (5.2)
$$
Define $M^r_{t,n}(y) \triangleq \frac{1}{n}\sum_{i=1}^n m^r_{t,y}(d_t^i)$ to be the right derivative of $y \mapsto M_{t,n}(y)$. Under Assumption 1, the dominated convergence theorem applies to show that $y \mapsto M_t(y)$ is differentiable (Levi et al. 2007), and that the derivative is
$$
M'_t(y) = \mathbb E\, m^r_{t,y}(D_t), \quad t \in \mathcal T. \quad (5.3)
$$
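For the terminal period, $\hat M^r_{T,n}$ is an explicit step function, so the near-zero property of the minimizer (formalized in Lemma 1 below) can be checked directly. A minimal sketch, our own illustration with an assumed Gamma demand model:

```python
import numpy as np

def terminal_right_derivative(d, b, h, y):
    """\\hat M^r_{T,n}(y) = (1/n) sum_i [(b+h) I(d_i <= y) - b]."""
    return (b + h) * np.mean(np.asarray(d) <= y) - b

def terminal_S_hat(d, b, h):
    """Minimizer of the empirical terminal-period cost: the smallest order
    statistic at which the right derivative becomes nonnegative, i.e. the
    empirical b/(b+h) critical-fractile quantile."""
    d = np.sort(np.asarray(d, dtype=float))
    k = int(np.ceil(len(d) * b / (b + h)))
    return d[k - 1]
```

With $n$ samples and distinct demand values, $0 \le \hat M^r_{T,n}(\hat S_T) < (b_T+h_T)/n$, matching the $O_P(n^{-1})$ rate in Lemma 1(ii).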
Since $S_t$ is the global minimum of the differentiable function $M_t$, we must have $M'_t(S_t) = 0$. The next lemma bounds the difference between the left and right derivatives of $\hat M_{t,n}(y)$, and shows that the estimator $\hat S_t$ is a near-zero of the equation $\hat M^r_{t,n}(y) = 0$. Lemma 1.
Suppose Assumptions 1 and 2 hold. Then for all $t \in \mathcal T$: (i) there exist constants $c_t$ and $B_t$ such that $|\hat M^r_{t,n}(y) - \hat M^l_{t,n}(y)| \le c_t/n$ and $|\hat M^r_{t,n}(y)| \le B_t$ for all $y \in \Theta$, $P$-almost surely; (ii) the estimator $\hat S_t$ (defined as the minimizer of $y \mapsto \hat M_{t,n}(y)$) satisfies $\hat M^r_{t,n}(\hat S_t) = O_P(n^{-1})$.

Based on Lemma 1, we can understand $\hat S_t$ as a "near-zero" estimator in the sense that $\hat S_t$ is also a solution of the equation $\hat M^r_{t,n}(y) = 0$, up to a remainder term of order $o_P(n^{-1/2})$.

5.2. The Guiding Idea

We first consider estimation of $\hat S_t$. As mentioned previously, two sources of error propagate backwards in time from period $t+1$ to the random criterion function $y \mapsto \hat M_{t,n}(y)$ used to calculate $\hat S_t$ via (2.9): the estimation error of $\hat s_{t+1}$ with respect to $s_{t+1}$, and the approximation error of $\hat M_{t+1,n}$ with respect to $M_{t+1}$. At first, we ignore these two sources of error by substituting their true values into the random criterion function, and then we use empirical process theory to establish uniform convergence of this simplified random criterion function. This lets us do a Taylor expansion of the limiting criterion function, i.e., the expectation of the random criterion function, as summarized in Lemma 2 below. Then, we can isolate the contribution of the data $\{d_t^i\}_{i=1}^n$ from period $t$ to $\sqrt n(\hat S_t - S_t)$ (see the first term on the RHS of the display in Lemma 2). For the next result, recall that $M^r_{t,n}(y)$ is the empirical mean of $m^r_{t,y}(D_t)$ in (5.2), and $M'_t(y)$ is given in (5.3). Lemma 2.
Consider problem (2.2) with $T$ periods. Suppose Assumptions 1 and 2 hold, and $M_t$ is twice differentiable at $S_t$ for all $t \in \mathcal T$. Then
$$
\sqrt n\, M''_t(S_t)\big(\hat S_t - S_t\big) = -\sqrt n\big(M^r_{t,n}(S_t) - M'_t(S_t)\big) + \sqrt n\big(M^r_{t,n}(\hat S_t) - \hat M^r_{t,n}(\hat S_t)\big) + o_P\big(\sqrt n(\hat S_t - S_t)\big).
$$

For the terminal period with $t = T$, Lemma 2 immediately yields the asymptotic distribution of $\hat S_T$ because there is no error propagation to consider. In general, the asymptotics of the second term $\sqrt n\big(M^r_{t,n}(\hat S_t) - \hat M^r_{t,n}(\hat S_t)\big)$ on the RHS of the above display depend on the error propagation. With proper telescoping, the contributions from the two sources of error can be separated. The contribution from $\hat s_{t+1}$ can be analyzed through empirical process theory by establishing uniform convergence over a neighborhood of $s_{t+1}$. The contribution from $\hat M^r_{t+1,n}$ corresponds to a $U$-process because the argument of $\hat M^r_{t+1,n}$ in (5.1) depends on $d_t$.
In our analysis, we need uniform convergence of this $U$-process because it involves the estimated parameter $\hat S_t$.

The asymptotic analysis for $\hat s_t$ is similar. Since $\hat s_t$ is estimated after $\hat S_t$, three analogous sources of error propagate backwards, as seen in (2.10): (i) the estimation error of $\hat s_{t+1}$ with respect to $s_{t+1}$; (ii) the approximation error of $\hat M_{t,n}$ with respect to $M_t$; and (iii) the estimation error of $\hat S_t$ with respect to $S_t$. However, the third source of error turns out to be negligible (it is $o_P(1)$) because $M'_t(S_t) = 0$, so there are effectively only two sources of error. Similar to the analysis for $\hat S_t$, we first ignore these two sources of error and use empirical process theory to isolate the contribution of the data from period $t$ to $\sqrt n(\hat s_t - s_t)$, as given in the first two terms on the RHS of the display in the following lemma. Recall that $M_{t,n}(y)$ is the empirical mean of $m_{t,y}(D_t)$ in (4.1). Lemma 3.
Consider problem (2.2) with setup costs $K_t > 0$. Suppose Assumptions 1 and 2 hold, and $\sqrt n(\hat S_t - S_t) = O_P(1)$. Then for any $t \in \mathcal T$,
$$
\sqrt n\, M'_t(s_t)\big(\hat s_t - s_t\big) = \sqrt n\big(M_{t,n}(S_t) - M_t(S_t)\big) - \sqrt n\big(M_{t,n}(s_t) - M_t(s_t)\big) + \sqrt n\big(\hat M_{t,n}(\hat S_t) - M_{t,n}(\hat S_t)\big) - \sqrt n\big(\hat M_{t,n}(\hat s_t) - M_{t,n}(\hat s_t)\big) + o_P\big(\sqrt n(\hat s_t - s_t)\big).
$$

The structure $\hat M_{t,n} - M_{t,n}$ appears in the third and fourth terms on the RHS of the above display; this structure represents the error propagation. For the terminal period, this lemma automatically gives the asymptotic distribution of $\hat s_T$ because $\hat M_{T,n} = M_{T,n}$. For $t \in \mathcal T$, we need to quantify the difference between $\hat M_{t,n}$ and $M_{t,n}$ to obtain the asymptotic distribution. Just as we analyzed $\hat M^r_{t,n} - M^r_{t,n}$ for $\hat S_t$ in Lemma 2, we can decompose this difference into two parts corresponding to the two sources of error. One of them can be analyzed with empirical process theory, and the other with $U$-process theory.

5.3. Asymptotics for the Terminal Period

Our guiding idea readily yields the asymptotics for the terminal period. Equivalently, it yields the asymptotics for the single-period problem with $T = 1$, which has been studied extensively; see, e.g., Scarf (1960), Zipkin (2000), Huh et al. (2009b), and Levi et al. (2007, 2015). We will use the subscript $T$ throughout this subsection since all of the results here apply to any terminal period. Our first step is to derive the influence functions of $\hat S_T$ and $\hat s_T$, respectively. In view of Lemmas 2 and 3, this step is straightforward. Recall that $m_{T,S_T}(d)$ and $m_{T,s_T}(d)$ are defined in (4.1). Theorem 2.
Suppose Assumptions 1 and 2 hold, and $f_T(S_T) > 0$. Then
$$
\sqrt n\big(\hat S_T - S_T\big) = -\frac{\sqrt n\, M^r_{T,n}(S_T)}{M''_T(S_T)} + o_P(1) = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{1}{f_T(S_T)}\left(\frac{b_T}{b_T+h_T} - I(d_T^i \le S_T)\right) + o_P(1),
$$
and
$$
\sqrt n\big(\hat s_T - s_T\big) = \frac{\sqrt n}{M'_T(s_T)}\big(M_{T,n}(S_T) - M_T(S_T)\big) - \frac{\sqrt n}{M'_T(s_T)}\big(M_{T,n}(s_T) - M_T(s_T)\big) + o_P(1) = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{1}{M'_T(s_T)}\Big(\big\{m_{T,S_T}(d_T^i) - M_T(S_T)\big\} - \big\{m_{T,s_T}(d_T^i) - M_T(s_T)\big\}\Big) + o_P(1).
$$
In the above display, recall that $M'_T(s_T) = (b_T+h_T)F_T(s_T) - b_T$, $M_T(S_T) = \mathbb E\, m_{T,S_T}(D_T)$, and $M_T(s_T) = \mathbb E\, m_{T,s_T}(D_T)$. The influence functions for $\hat S_T$ and $\hat s_T$ from the above theorem immediately yield the joint asymptotic normality of $(\hat s_T, \hat S_T)$. Theorem 3 below summarizes this result, where $\rightsquigarrow$ denotes weak convergence.

Theorem 3. Suppose Assumptions 1 and 2 hold, and $f_T(S_T) > 0$. Then $\sqrt n\big(\hat s_T - s_T,\ \hat S_T - S_T\big) \rightsquigarrow N(0, \Sigma)$ as $n \to \infty$, where the covariance matrix $\Sigma \in \mathbb R^{2\times 2}$ is given entry-wise by
$$
\Sigma(1,1) = \big((b_T+h_T)F_T(s_T) - b_T\big)^{-2}\,\mathrm{Var}\big[m_{T,S_T}(D_T) - m_{T,s_T}(D_T)\big],
$$
$$
\Sigma(2,2) = \big(f_T(S_T)\big)^{-2}\,\mathrm{Var}\, I(S_T - D_T \ge 0) = \frac{b_T h_T}{(b_T+h_T)^2\big(f_T(S_T)\big)^2},
$$
$$
\Sigma(1,2) = \big(f_T(S_T)\big)^{-1}\big((b_T+h_T)F_T(s_T) - b_T\big)^{-1}\,\mathrm{Cov}\big(I(S_T - D_T \ge 0),\ m_{T,S_T}(D_T) - m_{T,s_T}(D_T)\big).
$$
Furthermore, if there exists a density estimator $\hat f_T$ for $f_T$ such that $\hat f_T(\hat S_T) \to_P f_T(S_T)$ as $n \to \infty$, then $\Sigma$ can be consistently estimated by
$$
\hat\Sigma(1,1) = \big((b_T+h_T)\hat F_T(\hat s_T) - b_T\big)^{-2} \times \frac{1}{n}\sum_{i=1}^n \big[m_{T,\hat S_T}(d_T^i) - m_{T,\hat s_T}(d_T^i) + K_T\big]^2,
$$
$$
\hat\Sigma(2,2) = b_T h_T (b_T+h_T)^{-2}\big(\hat f_T(\hat S_T)\big)^{-2},
$$
$$
\hat\Sigma(1,2) = \big(\hat f_T(\hat S_T)\big)^{-1}\big((b_T+h_T)\hat F_T(\hat s_T) - b_T\big)^{-1} \times \frac{1}{n}\sum_{i=1}^n \left(I(d_T^i \le \hat S_T) - \frac{b_T}{b_T+h_T}\right)\big[m_{T,\hat S_T}(d_T^i) - m_{T,\hat s_T}(d_T^i) + K_T\big],
$$
where $\hat F_T(y) \triangleq \frac{1}{n}\sum_{i=1}^n I(d_T^i \le y)$ is the empirical CDF of $D_T$.

Consistency of the variance estimator requires a consistent estimator of the density $f_T$ at $S_T$, which is a mild requirement. Most commonly used density estimators, such as the kernel density estimator (van der Vaart 1998, Chapter 24.2), satisfy this condition provided that $f_T$ is continuous in a neighbourhood of $S_T$.
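As one application of Theorem 3, the entry $\hat\Sigma(2,2)$ yields a plug-in confidence interval for the terminal order-up-to level. The sketch below is our own illustration: the Gaussian kernel density estimator with Silverman's rule-of-thumb bandwidth, and the exponential demand in the usage example, are assumptions, and we take $\hat S_T$ as the empirical critical-fractile quantile (asymptotically equivalent to the minimizer of the empirical cost).

```python
import numpy as np

def ci_order_up_to(d, b, h, z=1.96):
    """CI for S_T via Theorem 3: Sigma(2,2) = b*h / ((b+h)^2 f_T(S_T)^2).

    f_hat: Gaussian-kernel density estimate at S_hat, Silverman bandwidth
    (a common consistent choice when f_T is continuous near S_T).
    """
    d = np.asarray(d, dtype=float)
    n = len(d)
    S_hat = np.quantile(d, b / (b + h))        # empirical critical fractile
    bw = 1.06 * d.std(ddof=1) * n ** (-1 / 5)  # Silverman's rule of thumb
    f_hat = np.mean(np.exp(-0.5 * ((S_hat - d) / bw) ** 2)) / (bw * np.sqrt(2.0 * np.pi))
    se = np.sqrt(b * h) / ((b + h) * f_hat * np.sqrt(n))
    return S_hat - z * se, S_hat + z * se
```

The interval half-width shrinks at the usual $n^{-1/2}$ rate, and is wider when $f_T$ is flat near $S_T$, exactly as $\Sigma(2,2)$ predicts.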
6. Asymptotic Normality: The Two-Period Problem
This section follows the guiding idea in Section 5.2 to establish asymptotic normality for the estimated parameters $\{(\hat s_t, \hat S_t)\}_{t=1}^2$ in the two-period problem. This is the first time we encounter error propagation, and the simple structure of the two-period setting is ideal for demonstrating how we handle the two sources of error propagation in the general $T$-period problem. The general problem is more complicated because the estimated policy is computed inductively with no closed form, but the principle for handling the error propagation is the same as for the two-period problem. We carry this principle forward to the general $T$-period problem in the next section.

The optimal policy for the two-period problem is captured by four parameters: $(s_1, S_1)$ and $(s_2, S_2)$. The influence functions of $\hat S_2$ and $\hat s_2$ have already been derived in Theorem 2 (since these are the terminal-period order-up-to level and threshold). It remains to derive the influence functions of $\hat S_1$ and $\hat s_1$.

6.1. Asymptotics of $\hat S_1$

We first study the asymptotics of $\hat S_1$; in particular, we want to derive the influence function of this estimator. Applying Lemma 2 with $t = 1$, we see
$$
\sqrt n\, M''_1(S_1)\big(\hat S_1 - S_1\big) = -\sqrt n\big(M^r_{1,n}(S_1) - M'_1(S_1)\big) + \sqrt n\big(M^r_{1,n}(\hat S_1) - \hat M^r_{1,n}(\hat S_1)\big) + o_P\big(\sqrt n(\hat S_1 - S_1)\big).
$$
Note that the first term on the RHS of the above display can be expressed as an i.i.d. sum $-\frac{1}{\sqrt n}\sum_{i=1}^n m^r_{1,S_1}(d_1^i)$ with mean zero (where $m^r_{1,S_1}$ is defined in (5.2)). By the CLT, $\sqrt n\big(M^r_{1,n}(S_1) - M'_1(S_1)\big)$ is asymptotically normal. The second term $\sqrt n\big(M^r_{1,n}(\hat S_1) - \hat M^r_{1,n}(\hat S_1)\big)$ represents error propagation from estimation in period $t = 2$, so its analysis is more involved.
Recall that $\hat S_1$ is the minimizer of $y \mapsto \hat M_{1,n}(y)$. It is also a near-zero of the estimating equation $y \mapsto \hat M^r_{1,n}(y)$, which is explicitly
$$
\frac{1}{\sqrt n}\sum_{i=1}^n \Big((b_1+h_1)\, I(y - d_1^i \ge 0) - b_1 + \hat M^r_{2,n}(y - d_1^i)\, I(y - d_1^i \ge \hat s_2)\Big) = 0.
$$
The two sources of error propagation from estimation in period $t = 2$ appear on the LHS of the above display. The first comes from using the estimated $\hat s_2$ in place of $s_2$, and the second comes from approximating $M'_2(y)$ with $\hat M^r_{2,n}(y)$. If $d_1^i$ did not appear in the argument inside $\hat M^r_{2,n}$, then $\hat M^r_{2,n}$ would just be an i.i.d. sum of functions of $d_2^j$ (where the dependence on $d_2^j$ is implicit). However, because $d_1^i$ appears inside this argument, $\hat M^r_{2,n}(y - d_1^i)$ is a function of both $d_1^i$ and $d_2^j$. From (2.8) and (2.10), we have
$$
M^r_{1,n}(\hat S_1) = \frac{1}{n}\sum_{i=1}^n \Big((b_1+h_1)\, I(\hat S_1 - d_1^i \ge 0) - b_1 + M'_2(\hat S_1 - d_1^i)\, I(\hat S_1 - d_1^i \ge s_2)\Big),
$$
$$
\hat M^r_{1,n}(\hat S_1) = \frac{1}{n}\sum_{i=1}^n \Big((b_1+h_1)\, I(\hat S_1 - d_1^i \ge 0) - b_1 + \hat M^r_{2,n}(\hat S_1 - d_1^i)\, I(\hat S_1 - d_1^i \ge \hat s_2)\Big).
$$
Taking the difference of these two displays shows that
$$
\sqrt n\Big(\hat M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big) = \frac{1}{\sqrt n}\sum_{i=1}^n \Big(\hat M^r_{2,n}(\hat S_1 - d_1^i)\, I(\hat S_1 - d_1^i \ge \hat s_2) - M'_2(\hat S_1 - d_1^i)\, I(\hat S_1 - d_1^i \ge s_2)\Big).
$$
The above display helps explain why the term $\sqrt n\big(\hat M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\big)$ is non-negligible. The RHS shows that the asymptotic distribution depends on $\hat S_1$, $\hat s_2$, $\hat M^r_{2,n}$, and the observed demand data in period $t = 1$, while $\hat M^r_{2,n}$ is itself implicitly a function of the observed demand data in period $t = 2$. To decouple the contributions of these components to the asymptotic distribution of the RHS, first define the function
$$
\tilde M^r_{1,n}(y) \triangleq \frac{1}{n}\sum_{i=1}^n \Big((b_1+h_1)\, I(y - d_1^i \ge 0) - b_1 + \hat M^r_{2,n}(y - d_1^i)\, I(y - d_1^i \ge s_2)\Big), \quad (6.1)
$$
where we replace the estimated $\hat s_2$ with the exact $s_2$ in $\hat M^r_{1,n}(y)$. Using $\tilde M^r_{1,n}(\hat S_1)$ to telescope, we can then decompose
$$
\sqrt n\Big\{\hat M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big\} = \sqrt n\Big\{\hat M^r_{1,n}(\hat S_1) - \tilde M^r_{1,n}(\hat S_1)\Big\} + \sqrt n\Big\{\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big\}. \quad (6.2)
$$
The first term on the RHS of the above display represents the propagation of estimation error from using the estimated $\hat s_2$. Using standard empirical process theory and proper telescoping, it can be shown that its asymptotic distribution is closely related to that of $\hat s_2$ itself. This observation is summarized in the following lemma. Throughout the manuscript, we use $R_n(y)$ to denote a generic remainder term which may change from line to line. Lemma 4.
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold, and $f_2(S_2) > 0$. Then
$$
\sqrt n\Big(\hat M^r_{1,n}(y) - \tilde M^r_{1,n}(y)\Big) = \frac{1}{\sqrt n}\sum_{i=1}^n M'_2(s_2)\big(I(y - d_1^i \ge \hat s_2) - I(y - d_1^i \ge s_2)\big) + R_n(y),
$$
where $\sup_{y\in\Theta} |R_n(y)| = o_P(1)$. Furthermore,
$$
\sqrt n\Big(\hat M^r_{1,n}(\hat S_1) - \tilde M^r_{1,n}(\hat S_1)\Big) = -\sqrt n\, M'_2(s_2)\, f_1(S_1 - s_2)\big(\hat s_2 - s_2\big) + o_P\Big(\sqrt n(\hat S_1 - S_1)\Big).
$$

It is interesting to observe that in the above display, the contribution from $\hat S_1$ is absorbed by the remainder term, so the only effective contribution comes from $\hat s_2$.

Next we consider the second term on the RHS of (6.2), which captures the propagation of the approximation error incurred by substituting $M'_2$ with $\hat M^r_{2,n}$. Because $\hat S_1$ is random and varies around $S_1$, we need uniform convergence of $\sqrt n\big(\tilde M^r_{1,n}(y) - M^r_{1,n}(y)\big)$ for all $y$ in some neighbourhood of $S_1$ to get the asymptotics of $\sqrt n\big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\big)$. Use (5.1) to write $\hat M^r_{2,n}(y) = \frac{1}{n}\sum_{j=1}^n \big((b_2+h_2)\, I(y - d_2^j \ge 0) - b_2\big)$, and note that $M'_2(y) = (b_2+h_2)F_2(y) - b_2$ (by directly computing the expectation in (5.3)). Then we have
$$
\sqrt n\Big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big) = \frac{1}{n\sqrt n}\sum_{i=1}^n\sum_{j=1}^n (b_2+h_2)\Big\{I(\hat S_1 - d_1^i - d_2^j \ge 0) - F_2(\hat S_1 - d_1^i)\Big\}\, I(\hat S_1 - d_1^i \ge s_2).
$$
This display explicitly shows the contribution from both the demand data $\{d_1^i\}$ for period $t = 1$ and the demand data $\{d_2^j\}$ for period $t = 2$. We see right away that the RHS of the above display is not an i.i.d. sum, i.e., it is not a summation of $g(d_1^i, d_2^i)$ over $i = 1, 2, \dots, n$ for some function $g$; rather, it is a double summation over $\{d_1^i\}$ and $\{d_2^j\}$.

We cannot apply standard empirical process theory to analyze the above display, but we can appeal to the theory of two-sample $U$-processes as follows. Define a family of functions $g_y: \mathbb R^2 \to \mathbb R$ indexed by $y$ via
$$
g_y(d_1, d_2) \triangleq (b_2+h_2)\big\{I(y - d_1 - d_2 \ge 0) - F_2(y - d_1)\big\}\, I(y - d_1 \ge s_2), \quad (6.3)
$$
and define the corresponding function class
$$
\mathcal F \triangleq \{(d_1, d_2) \mapsto g_y(d_1, d_2) : y \in \Theta\}. \quad (6.4)
$$
This class of functions is uniformly bounded by $b_2+h_2$. By conditioning on $D_1$ and taking the expectation with respect to $D_2$, it is readily seen that $\mathbb E\, g_y(D_1, D_2) = 0$ for all $y \in \Theta$. Let $g^{(1)}_y(d_1) \triangleq \mathbb E\, g_y(d_1, D_2)$ and
$$
g^{(2)}_y(d_2) \triangleq \mathbb E\, g_y(D_1, d_2) = (b_2+h_2)\big\{F_1(y - d_2 \vee s_2) - \mathbb E\, F_1(y - D_2 \vee s_2)\big\}, \quad (6.5)
$$
and define the function class $\mathcal F^{(2)} \triangleq \{g^{(2)}_y(d) : y \in \Theta\}$. Because of the special structure of $g_y$, we have $g^{(1)}_y(d) = 0$ for all $d$ and $y$. Then we can define the two-sample $U$-process indexed by kernels $g_y \in \mathcal F$ via
$$
U_n(y) \triangleq \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n g_y(d_1^i, d_2^j), \quad (6.6)
$$
which has mean $\mathbb E\, U_n(y) = 0$. Its H\'ajek projection is
$$
U^H_n(y) \triangleq \frac{1}{n}\sum_{i=1}^n g^{(1)}_y(d_1^i) + \frac{1}{n}\sum_{j=1}^n g^{(2)}_y(d_2^j) = \frac{1}{n}\sum_{j=1}^n g^{(2)}_y(d_2^j). \quad (6.7)
$$
With these definitions, we can now see that $\sqrt n\big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\big)$ is a normalized $U$-statistic with kernel $g_{\hat S_1}(D_1, D_2)$. Specifically, we have $\sqrt n\big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\big) = \sqrt n\, U_n(\hat S_1)$. If $y$ is fixed,
If y is fixed,then √ nU H n ( y ) is asymptotically normal and U n ( y ) − U H n ( y ) = o P ( n − / ) (see Chapter 12 in van derVaart (1998)).To control the contribution of the variation of (cid:98) S to √ nU n ( (cid:98) S ), we need to strengthen theconvergence U n ( y ) − U H n ( y ) = o P ( n − / ) to be uniform over F . To this end, we will show that thefunction classes defined above are Euclidean to get the necessary uniform convergence. un Zhang, Zhisheng, Ye, and William Haskell: Data-driven inventory policies
Lemma 5. (i)
Both $\mathcal F$ and $\mathcal F^{(2)}$ are Euclidean relative to the envelope $b_2+h_2$. (ii) The sequence $n^{1/2}\big(U_n(y) - U^H_n(y)\big)$ converges in probability to zero uniformly in $y \in \Theta$, i.e., $\sup_{y\in\Theta} \big|U_n(y) - U^H_n(y)\big| = o_P(n^{-1/2})$.

Based on Lemma 5, we can now rewrite
$$
\sqrt n\Big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big) = \frac{1}{\sqrt n}\sum_{j=1}^n g^{(2)}_{\hat S_1}(d_2^j) + o_P(1). \quad (6.8)
$$
The RHS above is an i.i.d. sum indexed by the estimated parameter $\hat S_1$. The structure of this i.i.d. sum lets us use empirical process theory to replace $\hat S_1$ with $S_1$ at the cost of an approximation error of $o_P(1)$. Lemma 6.
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold and $M''_1(S_1) \ne 0$. Then $\sqrt n\, U^H_n(\hat S_1) = \sqrt n\, U^H_n(S_1) + o_P(1)$, i.e.,
$$
\sqrt n\Big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\Big) = \frac{1}{\sqrt n}\sum_{j=1}^n g^{(2)}_{S_1}(d_2^j) + o_P(1).
$$

The RHS of the above display, as an i.i.d. sum, is asymptotically normal by the CLT. With Lemmas 2, 4, and 6 now in hand, we can obtain the influence function of $\hat S_1$, as given in the following theorem. Recall that $m_{1,s_1}(d)$ and $m_{1,S_1}(d)$ are defined in (4.1), and $m^r_{1,S_1}(d)$ is defined in (5.2). Theorem 4.
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold, and both $M''_1(S_1)$ and $M''_2(S_2)$ exist and are nonzero. Then $\hat S_1$ is asymptotically linear with
$$
\sqrt n\big(\hat S_1 - S_1\big) = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{1}{M''_1(S_1)}\Big(f_1(S_1 - s_2)\big(\big\{m_{2,S_2}(d_2^i) - M_2(S_2)\big\} - \big\{m_{2,s_2}(d_2^i) - M_2(s_2)\big\}\big) - m^r_{1,S_1}(d_1^i) - g^{(2)}_{S_1}(d_2^i)\Big) + o_P(1).
$$

The condition $M''_1(S_1) \ne 0$ of the above theorem is reasonable. For instance, when $K_2 = 0$ we know that $y \mapsto M_1(y)$ is strongly convex under Assumption 1, which implies $M''_1(S_1) > 0$.

6.2. Asymptotics of $\hat s_1$

Next we turn to the asymptotics of $\hat s_1$. This analysis is based on the decomposition in Lemma 3, which parallels the way the analysis for $\hat S_1$ is based on the decomposition in Lemma 2. We apply Lemma 3 with $t = 1$ to get
$$
\sqrt n\, M'_1(s_1)\big(\hat s_1 - s_1\big) = \sqrt n\big(M_{1,n}(S_1) - M_1(S_1)\big) - \sqrt n\big(M_{1,n}(s_1) - M_1(s_1)\big) + \sqrt n\Big(\hat M_{1,n}(\hat S_1) - M_{1,n}(\hat S_1)\Big) - \sqrt n\Big(\hat M_{1,n}(\hat s_1) - M_{1,n}(\hat s_1)\Big) + o_P\big(\sqrt n(\hat s_1 - s_1)\big).
$$
The first two terms on the RHS of the above display together form the i.i.d. sum
$$
\frac{1}{\sqrt n}\sum_{i=1}^n \Big\{\big(m_{1,S_1}(d_1^i) - m_{1,s_1}(d_1^i)\big) - \big(M_1(S_1) - M_1(s_1)\big)\Big\},
$$
which is asymptotically normal by the CLT (recall $m_{1,y}(d)$ is defined in (4.1)). The expression $\hat M_{1,n} - M_{1,n}$ appears in the third and fourth terms of the above display. We recall
$$
\hat M_{1,n}(y) = \frac{1}{n}\sum_{i=1}^n \Big\{C_1(y, d_1^i) + \hat M_{2,n}(\hat s_2)\, I(y - d_1^i < \hat s_2) + \hat M_{2,n}(y - d_1^i)\, I(y - d_1^i \ge \hat s_2)\Big\},
$$
$$
M_{1,n}(y) = \frac{1}{n}\sum_{i=1}^n \Big\{C_1(y, d_1^i) + M_2(s_2)\, I(y - d_1^i < s_2) + M_2(y - d_1^i)\, I(y - d_1^i \ge s_2)\Big\},
$$
as defined in (2.8) and (4.1). It is readily seen that $\sqrt n\big(\hat M_{1,n}(y) - M_{1,n}(y)\big)$ is affected by the approximation error of $\hat s_2$ for $s_2$ and the approximation error of $\hat M_{2,n}$ for $M_2$. In parallel to (6.1), we decouple the contributions of these two sources of error with the help of the intermediate function
$$
\tilde M_{1,n}(y) = \frac{1}{n}\sum_{i=1}^n \Big\{C_1(y, d_1^i) + \hat M_{2,n}(s_2)\, I(y - d_1^i < s_2) + \hat M_{2,n}(y - d_1^i)\, I(y - d_1^i \ge s_2)\Big\}, \quad (6.9)
$$
which is obtained by replacing $\hat s_2$ with $s_2$ in $\hat M_{1,n}$. We can then decompose $\sqrt n\big(\hat M_{1,n}(y) - M_{1,n}(y)\big)$ into
$$
\sqrt n\Big\{\hat M_{1,n}(y) - M_{1,n}(y)\Big\} = \sqrt n\Big\{\hat M_{1,n}(y) - \tilde M_{1,n}(y)\Big\} + \sqrt n\Big\{\tilde M_{1,n}(y) - M_{1,n}(y)\Big\}. \quad (6.10)
$$
The first term on the RHS above represents the error propagation from using $\hat s_2$.
As in the analysis for $\hat S_1$, we can use empirical process theory to show that it converges uniformly over $y \in \Theta$. This enables us to account for the variation in $\hat S_1$ and $\hat s_1$ in the asymptotics of $\sqrt n\big(\hat M_{1,n}(\hat S_1) - \tilde M_{1,n}(\hat S_1)\big)$ and $\sqrt n\big(\hat M_{1,n}(\hat s_1) - \tilde M_{1,n}(\hat s_1)\big)$.

The second term on the RHS of (6.10) represents the propagation of the approximation error from substituting $M_2$ with $\hat M_{2,n}$, which can itself be further decomposed into two parts. The first part comes from the sum $\sum_{i=1}^n \big\{\hat M_{2,n}(s_2) - M_2(s_2)\big\}\, I(y - d_1^i < s_2)$, and the other comes from the sum $\sum_{i=1}^n \big\{\hat M_{2,n}(y - d_1^i) - M_2(y - d_1^i)\big\}\, I(y - d_1^i \ge s_2)$. By telescoping with $\big\{\hat M_{2,n}(s_2) - M_2(s_2)\big\}\,\mathbb E\, I(y - D_1 < s_2)$ and using the fact that empirical processes are tight, the first part can be written as the product of $\mathbb E\, I(y - D_1 < s_2)$ and an i.i.d. sum over $\{d_2^j\}$.

The second part is more complicated because $d_1^i$ appears inside the argument of $\hat M_{2,n}$. This structure is similar to that of $\sqrt n\big(\tilde M^r_{1,n}(\hat S_1) - M^r_{1,n}(\hat S_1)\big)$. We will again construct an appropriate two-sample $U$-process to complete the analysis. We pause to summarize our observations so far.
Lemma 7.
Consider problem (2.2) for $T = 2$. Suppose the conditions of Theorem 4 hold. Then
$$
\sqrt n\Big\{\hat M_{1,n}(y) - \tilde M_{1,n}(y)\Big\} = \sqrt n\, M'_2(s_2)\big(\hat s_2 - s_2\big)\big\{1 - F_1(y - s_2)\big\} + R_n(y),
$$
$$
\frac{1}{\sqrt n}\sum_{i=1}^n \big\{\hat M_{2,n}(s_2) - M_2(s_2)\big\}\, I(y - d_1^i < s_2) = \sqrt n\big(M_{2,n}(s_2) - M_2(s_2)\big)\big\{1 - F_1(y - s_2)\big\} + R_n(y),
$$
where $\sup_{y\in\Theta} |R_n(y)| = o_P(1)$.

The RHS of the first equation above is a function of $(\hat s_2 - s_2)$; this equation captures the error propagation from the approximation of $s_2$ by $\hat s_2$. Similarly, the RHS of the second equation is related to the difference $M_{2,n}(s_2) - M_2(s_2)$.

We now move on to the second part of the decomposition of the second term on the RHS of (6.10). We can write this quantity as
$$
\frac{1}{\sqrt n}\sum_{i=1}^n \Big(\hat M_{2,n}(y - d_1^i) - M_2(y - d_1^i)\Big)\, I(y - d_1^i \ge s_2) = \frac{1}{n\sqrt n}\sum_{i,j} \Big(m_{2,y-d_1^i}(d_2^j) - M_2(y - d_1^i)\Big)\, I(y - d_1^i \ge s_2),
$$
since $\hat M_{2,n}(y) = \frac{1}{n}\sum_{j=1}^n m_{2,y}(d_2^j)$. Let $\tilde g_y(d_1, d_2) \triangleq \big(m_{2,y-d_1}(d_2) - M_2(y - d_1)\big)\, I(y - d_1 \ge s_2)$ and define the corresponding function class $\tilde{\mathcal F} \triangleq \{\tilde g_y : y \in \Theta\}$. Also let
$$
\tilde g^{(2)}_y(d_2) \triangleq \mathbb E\, \tilde g_y(D_1, d_2) = \mathbb E\big(m_{2,y-D_1}(d_2) - M_2(y - D_1)\big)\, I(y - D_1 \ge s_2) \quad (6.11)
$$
and define the corresponding function class $\tilde{\mathcal F}^{(2)} \triangleq \{\tilde g^{(2)}_y : y \in \Theta\}$. It is easy to check that $\mathbb E\, \tilde g_y(D_1, D_2) = 0$ and $\mathbb E\, \tilde g^{(2)}_y(D_2) = 0$ for all $y \in \Theta$, and that $\mathbb E\, \tilde g_y(d_1, D_2) = 0$ for all $(d_1, y)$. With these definitions, we can see that the above display is a two-sample $U$-process $\{\tilde U_n\}_{n\ge 1}$ with kernel $(d_1, d_2) \mapsto \tilde g_y(d_1, d_2)$, defined by
$$
\tilde U_n(y) \triangleq \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n \tilde g_y(d_1^i, d_2^j). \quad (6.12)
$$
The H\'ajek projection of $\tilde U_n(y)$ is
$$
\tilde U^H_n(y) \triangleq \frac{1}{n}\sum_{j=1}^n \tilde g^{(2)}_y(d_2^j). \quad (6.13)
$$
Our task now boils down to studying the asymptotics of $\tilde U_n(\hat S_1)$ and $\tilde U_n(\hat s_1)$.
To account for the variation of $\hat S_1$ and $\hat s_1$, we need to establish uniform convergence of $\tilde U_n(y)$ to its H\'ajek projection $\tilde U^H_n(y)$ over $y \in \Theta$. We will show that both $\tilde{\mathcal F}$ and $\tilde{\mathcal F}^{(2)}$ are Euclidean, as in Lemma 5, and uniform convergence will follow. This uniform convergence then enables the use of empirical process theory to replace $\hat s_1$ and $\hat S_1$ with their true values $s_1$ and $S_1$ in the i.i.d. structure of $\tilde U^H_n(y)$, as in Lemma 6. These two results, together with Lemma 7, lead to the final representation of $\sqrt n\big(\hat M_{1,n}(y) - M_{1,n}(y)\big)$ summarized below.
Lemma 8.
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold, and both $M''_1(S_1)$ and $M''_2(S_2)$ exist and are nonzero. Then: (i) both $\tilde{\mathcal F}$ and $\tilde{\mathcal F}^{(2)}$ are uniformly bounded Euclidean classes; (ii) $\sup_{y\in\Theta} \big|\tilde U_n(y) - \tilde U^H_n(y)\big| = o_P(n^{-1/2})$; (iii) both $n^{1/2}\big\{\tilde U^H_n(\hat S_1) - \tilde U^H_n(S_1)\big\}$ and $n^{1/2}\big\{\tilde U^H_n(\hat s_1) - \tilde U^H_n(s_1)\big\}$ are $o_P(1)$.

The influence function of $\hat s_1$ can be readily obtained by combining Lemmas 7 and 8. Recall that $d \mapsto \tilde g^{(2)}_{S_1}(d)$ and $d \mapsto \tilde g^{(2)}_{s_1}(d)$ are defined in (6.11). Theorem 5.
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold, both $M''_1(S_1)$ and $M''_2(S_2)$ exist and are nonzero, and $M'_1(s_1) \ne 0$. Then
$$
\sqrt n\big(\hat s_1 - s_1\big) = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{1}{M'_1(s_1)}\Big(\big\{F_1(s_1 - s_2) - F_1(S_1 - s_2)\big\}\big\{m_{2,S_2}(d_2^i) - M_2(S_2)\big\} + \big\{\tilde g^{(2)}_{S_1}(d_2^i) - \tilde g^{(2)}_{s_1}(d_2^i)\big\} + \big\{m_{1,S_1}(d_1^i) - m_{1,s_1}(d_1^i) - \big(M_1(S_1) - M_1(s_1)\big)\big\}\Big) + o_P(1).
$$

We now demonstrate joint asymptotic normality of all four parameters $(\hat s_1, \hat S_1)$ and $(\hat s_2, \hat S_2)$ by combining Theorem 4 (on $\hat S_1$) and Theorem 5 (on the asymptotic normality of $\hat s_1$) with Theorem 2 (on the asymptotic normality of $(\hat s_2, \hat S_2)$). We define the following influence functions:
$$
\varphi_{s_1}(D_1, D_2) \triangleq \frac{1}{M'_1(s_1)}\Big\{\big(F_1(s_1 - s_2) - F_1(S_1 - s_2)\big)\big(C_2(S_2, D_2) - M_2(S_2)\big) + \tilde g^{(2)}_{S_1}(D_2) - \tilde g^{(2)}_{s_1}(D_2)\Big\} + \frac{1}{M'_1(s_1)}\big\{m_{1,S_1}(D_1) - m_{1,s_1}(D_1) + K_1\big\},
$$
$$
\varphi_{S_1}(D_1, D_2) \triangleq \frac{1}{M''_1(S_1)}\Big\{-m^r_{1,S_1}(D_1) + f_1(S_1 - s_2)\big(C_2(S_2, D_2) - C_2(s_2, D_2) + K_2\big) - g^{(2)}_{S_1}(D_2)\Big\},
$$
$$
\varphi_{s_2}(D_1, D_2) \triangleq \frac{1}{M'_2(s_2)}\big\{m_{2,S_2}(D_2) - m_{2,s_2}(D_2) + K_2\big\},
$$
$$
\varphi_{S_2}(D_1, D_2) \triangleq \big(f_2(S_2)\big)^{-1}\left(I(S_2 - D_2 \ge 0) - \frac{b_2}{b_2+h_2}\right).
$$
Consider problem (2.2) for $T = 2$. Suppose Assumptions 1 and 2 hold, and that $M''_1(S_1)$, $M''_2(S_2)$, and $M'_1(s_1)$ all exist and are all nonzero. Then
$$
\sqrt n\big(\hat s_1 - s_1,\ \hat S_1 - S_1,\ \hat s_2 - s_2,\ \hat S_2 - S_2\big) \rightsquigarrow N(0, \Sigma)
$$
as $n \to \infty$, for a covariance matrix $\Sigma \in \mathbb R^{4\times 4}$ where $\Sigma(i,j)$ is the covariance of the $i$th and $j$th influence functions in $\{\varphi_{s_1}, \varphi_{S_1}, \varphi_{s_2}, \varphi_{S_2}\}$. Under mild conditions, the covariance matrix can be consistently estimated (see EC.3.8 and EC.3.14).
Theorem 6, together with the consistent estimator of the covariance matrix, allows us to conduct hypothesis testing on the structure of the optimal policy. For example, we can test the null hypothesis $(s_1, S_1) = (s_2, S_2)$, i.e., whether the optimal policy should vary across periods or remain constant. A time-homogeneous $(s, S)$-policy could be more accessible to practitioners and easier to implement.

The above procedure can also be used to establish the asymptotic normality of the optimal expected cost itself. In the two-period problem with starting inventory level $x_1 < s_1$, by (2.6) the optimal expected cost is $C^* = K_1 + M_1(S_1)$. It is natural to estimate $C^*$ by replacing $M_1(S_1)$ with $\hat M_{1,n}(\hat S_1)$ in (2.8) to obtain the estimator
$$
\hat C^* = K_1 + \frac{1}{n}\sum_{i=1}^n C_1(\hat S_1, d_1^i) + \frac{1}{n^2}\sum_{i,j} \Big[\big(C_2(\hat S_2, d_2^j) + K_2\big)\, I(\hat S_1 - d_1^i < \hat s_2) + C_2(\hat S_1 - d_1^i, d_2^j)\, I(\hat S_1 - d_1^i \ge \hat s_2)\Big].
$$
The following theorem gives the asymptotics of the estimator $\hat C^*$. Theorem 7.
Consider problem (2.2) for T = 2. Suppose Assumptions 1 and 2 hold, that M_1''(S_1), M_2''(S_2), and M_1'(s_1) all exist and are all nonzero, and that x_1 < s_1. Then Ĉ* is a consistent estimator of C*. Furthermore, √n(Ĉ* − C*) is asymptotically normal with influence function

  (M_2(S_2) + K_2) I(S_1 − d_1 < s_2) + M_2(S_1 − d_1) I(S_1 − d_1 ≥ s_2) − E[ (M_2(S_2) + K_2) I(S_1 − D_1 < s_2) + M_2(S_1 − D_1) I(S_1 − D_1 ≥ s_2) ]
  + C(S_1, d_1) − E C(S_1, D_1) + {1 − F_1(S_1 − s_2)}(C(S_2, d_2) + K_2) + E C(S_1 − D_1, d_2) I(S_1 − D_1 ≥ s_2) − E[ (M_2(S_2) + K_2) I(S_1 − D_1 < s_2) + M_2(S_1 − D_1) I(S_1 − D_1 ≥ s_2) ].

Under mild conditions, the asymptotic variance of Ĉ* can be consistently estimated (see EC.3.19).

Theorem 7 can be used to construct a confidence interval (CI) for the optimal expected cost, and the variance of the estimator Ĉ* determines the width of the CI. Suppose the retailer has a maximum threshold C̄ for acceptable expected costs. The CI can be used to test the null hypothesis that C* > C̄ (that the optimal expected cost exceeds the threshold). We can also use the consistent estimator of the asymptotic variance of Ĉ* to determine the minimum sample size needed to reach a target CI width. If the null hypothesis is not rejected, then this system is not profitable enough for the retailer at the desired confidence level.

We remark that the above theorem assumes x_1 < s_1, but this assumption is not essential, since the proof extends naturally to settings where x_1 ≥ s_1. For x_1 ≥ s_1, the optimal cost is M_1(x_1) by (2.6), which we can estimate with M̂_{1,n}(x_1). The analysis of the asymptotics of √n(M̂_{1,n}(x_1) − M_1(x_1)) is similar to the above reasoning.
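The sample-size calculation mentioned above is mechanical once a consistent estimate of the asymptotic standard deviation of Ĉ* is available. A minimal sketch (the function name and inputs are our own; sigma_hat stands for any consistent estimate, such as the one from EC.3.19): choose the smallest n for which the CI half-widths sum to no more than a target width w.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma_hat: float, target_width: float, alpha: float = 0.05) -> int:
    """Smallest n for which the asymptotic (1 - alpha) CI,
    C_hat +/- z_{alpha/2} * sigma_hat / sqrt(n), has total width <= target_width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z * sigma_hat / target_width) ** 2)
```

Note that halving the target width quadruples the required sample size, the usual √n trade-off.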
7. Asymptotic Normality: The Multi-Period Problem
In this section, we extend our analysis of the two-period problem to the general T-period problem for T ≥ 3. We first discuss the special case without setup costs, where the asymptotic influence functions can be obtained in closed form, and then we extend the discussion to the general case with setup costs, where the asymptotic influence functions must be obtained recursively.
The following assumption is in force throughout this subsection.
Assumption 3 (No setup costs). K_t = 0 for all t ∈ T.

In a base-stock policy, the retailer orders up to the optimal base-stock level S_t whenever the starting inventory level satisfies x_t ≤ S_t. Under Assumption 3, a base-stock policy is optimal for problem (2.2). In period t ∈ T, the (exact) optimal base-stock level S_t is found by minimizing y ↦ M_t(y) as given in (2.4). This policy is fully determined by the order-up-to levels {S_t}_{t=1}^T, and so we can view it as a special case of the (s, S)-policy with s_t = S_t for all t ∈ T. In this light, the statement of consistency in Theorem 1 and the asymptotic expansion in Lemma 2 still hold in the present setting.

There are several reasons to give special attention to this case before developing the general one. First, a base-stock policy is itself of interest in operations management since it is easy to implement and widely used in practice (Nandakumar and Morton 1993, Zhang et al. 2018). Second, base-stock policies have fewer parameters to estimate than general (s, S)-policies, and the asymptotic influence functions for {Ŝ_t}_{t=1}^T in the base-stock policy can be written in closed form. In contrast, the asymptotic influence functions for the general case with setup costs have to be written recursively (see Theorem 10). Last but not least, elements of our proof of asymptotic normality for the special case without setup costs carry over to the general case with setup costs.

In period t ∈ T, the (estimated) optimal base-stock level Ŝ_t is found by minimizing y ↦ M̂_{t,n}(y), as given in (2.9). We will find the influence functions for {Ŝ_t}_{t=1}^T, and then establish the joint asymptotic normality of these parameters.
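For intuition, in the final period T the empirical objective reduces to a newsvendor cost with no continuation value, so its minimizer is a sample quantile at the critical fractile b_T/(b_T + h_T). A minimal sketch (our own helper, assuming linear unit backorder cost b and unit holding cost h):

```python
import math

def empirical_base_stock(demands, b, h):
    """Minimizer of the empirical newsvendor cost
    (1/n) * sum_i [ b*max(d_i - y, 0) + h*max(y - d_i, 0) ]:
    the smallest order statistic d_(k) with k/n >= b/(b + h)."""
    d = sorted(demands)
    k = math.ceil(len(d) * b / (b + h))
    return d[max(k, 1) - 1]
```

With a high backorder-to-holding ratio the estimated level sits near the upper tail of the observed demands, as expected.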
By Lemma 2, this amounts to studying the asymptotics of √n(M̂^r_{t,n}(Ŝ_t) − M^r_{t,n}(Ŝ_t)), where y ↦ M̂^r_{t,n}(y) is defined in (5.1), M^r_{t,n}(y) = (1/n) Σ_{i=1}^n m^r_{t,y}(d_{it}), and m^r_{t,y}(·) is defined in (5.2). To account for the variation in Ŝ_t, we will establish uniform convergence of √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) over y ∈ Θ. We will repeatedly use arguments by induction here, based on the intuition we developed for the two-period problem. The following lemma serves as a starting point for the induction; it is a combination of (6.2) and Lemma 4.

Lemma 9.
Suppose Assumptions 1–3 hold. Fix t ∈ T and suppose that √n(Ŝ_{t+1} − S_{t+1}) = O_P(1). Then

  √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) = (1/√n) Σ_{i=1}^n ( M̂^r_{t+1,n}(y − d_{it}) − M_{t+1}'(y − d_{it}) ) I(y − d_{it} ≥ S_{t+1}) + R_n(y),

where sup_{y∈Θ} |R_n(y)| = o_P(1).

Lemma 9 shows that the asymptotic distribution of y ↦ M̂^r_{t,n}(y) depends on the asymptotic distribution of y ↦ M̂^r_{t+1,n}(y). By the same reasoning, the asymptotic distribution of y ↦ M̂^r_{t+1,n}(y) depends on that of y ↦ M̂^r_{t+2,n}(y), and so on. By repeating this argument, we obtain an expansion of √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) which explicitly accounts for the error propagation from future periods. To simplify the notation, we use two equivalent forms of the indicator function, I(x ∈ E) ≡ I_E(x). For each pair (t, τ) with 1 ≤ t < τ ≤ T, we define the functions

  φ_{t,τ,y}(d_t, ..., d_τ) ≜ I(y − d_t − ··· − d_{τ−1} ≥ s_τ) × ··· × I(y − d_t − d_{t+1} ≥ s_{t+2}) × I(y − d_t ≥ s_{t+1}) × ( {M_{τ+1}' I_{[s_{τ+1},∞)}}(y − d_t − ··· − d_τ) − E {M_{τ+1}' I_{[s_{τ+1},∞)}}(y − d_t − ··· − d_{τ−1} − D_τ) ),

  ψ_{t,τ,y}(d_t, ..., d_τ) ≜ I(y − d_t − ··· − d_{τ−1} ≥ s_τ) × ··· × I(y − d_t − d_{t+1} ≥ s_{t+2}) × I(y − d_t ≥ s_{t+1}) × (b_τ + h_τ) ( I(y − d_t − ··· − d_τ ≥ 0) − F_τ(y − d_t − ··· − d_{τ−1}) ),   (7.1)

indexed by y ∈ Θ. When there are no setup costs, we have s_t = S_t. However, we include s_t in the above definition because these functions will appear again in the next subsection to analyze the problem with setup costs. By default, M_{T+1}'(y) = 0, and so we set φ_{t,T,y} = 0.
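The centering of these kernels follows from a one-step conditioning (tower property) argument; for ψ_{t,τ,y}, with demands independent across periods,

```latex
\begin{aligned}
\mathbb{E}\,\psi_{t,\tau,y}(D_t,\dots,D_\tau)
  &= (b_\tau+h_\tau)\,\mathbb{E}\Big[\prod_{\iota=t}^{\tau-1} I\big(y-D_t-\cdots-D_\iota \ge s_{\iota+1}\big)\\
  &\qquad \times \underbrace{\mathbb{E}\big[\,I(y-D_t-\cdots-D_\tau\ge 0)-F_\tau(y-D_t-\cdots-D_{\tau-1})\;\big|\;D_t,\dots,D_{\tau-1}\big]}_{=\,0}\Big] \;=\; 0,
\end{aligned}
```

since conditionally on D_t, ..., D_{τ−1} the last indicator has mean F_τ(y − D_t − ··· − D_{τ−1}). The same argument, applied to the centered last factor of φ_{t,τ,y}, gives E φ_{t,τ,y} = 0.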
It is easy to check that E φ_{t,τ,y}(D_t, ..., D_τ) = 0 and E ψ_{t,τ,y}(D_t, ..., D_τ) = 0 for all t and τ, both with and without setup costs. Using the shorthand in (7.1), the following lemma gives the asymptotic representation of √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)).

Lemma 10.
Suppose Assumptions 1–3 hold. Fix t ∈ T and suppose that √n(Ŝ_τ − S_τ) = O_P(1) for all t+1 ≤ τ ≤ T. Then

  √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) = Σ_{τ=t+1}^T (1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} { ψ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) + φ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) } + R_n(y),

where sup_{y∈Θ} |R_n(y)| = o_P(1).

It follows that √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) is a summation of 2(T − t) U-processes, including two (τ − t + 1)-sample U-processes for each t+1 ≤ τ ≤ T. For fixed y ∈ Θ, the convergence of the above multi-sample U-statistics is well studied (van der Vaart 1998). However, it is challenging to demonstrate convergence of these multi-sample U-processes as the index y varies. Thanks to the special structure of ψ_{t,τ,y} and φ_{t,τ,y}, we can use induction to approximate each of the (τ − t + 1)-sample U-processes by an i.i.d. sum plus a remainder term that is uniformly negligible, as summarized in Lemma EC.5. This transformation enables the application of standard empirical process theory in the subsequent analysis. For the next lemma, we define the function classes

  H_{t,τ} ≜ { ψ_{t,τ,y}(d_t, d_{t+1}, ..., d_τ) : y ∈ Θ },   (7.2)
  H̃_{t,τ} ≜ { φ_{t,τ,y}(d_t, d_{t+1}, ..., d_τ) : y ∈ Θ }.   (7.3)

Lemma 11.
Suppose Assumptions 1–3 hold. Fix t ∈ T and suppose that √n(Ŝ_τ − S_τ) = O_P(1) for all t+1 ≤ τ ≤ T. Then, for any t+1 ≤ τ ≤ T, both H_{t,τ} and H̃_{t,τ} are uniformly bounded Euclidean classes. In addition,

  (1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} φ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) = (1/√n) Σ_{i=1}^n E φ_{t,τ,y}(D_t, ..., D_{τ−1}, d_{iτ}) + R_n(y),   (7.4)
  (1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} ψ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) = (1/√n) Σ_{i=1}^n E ψ_{t,τ,y}(D_t, ..., D_{τ−1}, d_{iτ}) + R_n(y),   (7.5)

where both remainder terms satisfy sup_{y∈Θ} |R_n(y)| = o_P(1).

As a consequence of Lemma 11, we have

  √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) = (1/√n) Σ_{i=1}^n Σ_{τ=t+1}^T E { ψ_{t,τ,y}(D_t, ..., D_{τ−1}, d_{iτ}) + φ_{t,τ,y}(D_t, ..., D_{τ−1}, d_{iτ}) } + R_n(y),

where sup_{y∈Θ} |R_n(y)| = o_P(1). The first term on the RHS above is an empirical process. Similar to Lemmas 6 and 8, we can establish the asymptotic continuity of this process around y = S_t through routine calculation. This asymptotic continuity, together with Lemmas 2 and 11, gives the asymptotic expansion of Ŝ_t as summarized below, where d ↦ m^r_{t,S_t}(d) is defined in (5.2).

Theorem 8.
Suppose Assumptions 1–3 hold. Fix t ∈ T and suppose that M_t is twice differentiable at S_t and that √n(Ŝ_τ − S_τ) = O_P(1) for all t+1 ≤ τ ≤ T. Then

  √n M_t''(S_t)(Ŝ_t − S_t) = −(1/√n) Σ_{i=1}^n Σ_{τ=t+1}^T E { ψ_{t,τ,S_t}(D_t, ..., D_{τ−1}, d_{iτ}) + φ_{t,τ,S_t}(D_t, ..., D_{τ−1}, d_{iτ}) } − (1/√n) Σ_{i=1}^n m^r_{t,S_t}(d_{it}) + o_P( √n(Ŝ_t − S_t) ).

With Theorem 8 in hand, we can now establish joint asymptotic normality of {Ŝ_t}_{t∈T}. To this end, define the influence functions

  ϕ_{S_T}(d_1, d_2, ..., d_T) ≜ −(1/M_T''(S_T)) ( (b_T + h_T) I(S_T − d_T ≥ 0) − b_T ),   (7.6)
  ϕ_{S_{T−1}}(d_1, d_2, ..., d_T) ≜ −(1/M_{T−1}''(S_{T−1})) ( m^r_{T−1,S_{T−1}}(d_{T−1}) + E ψ_{T−1,T,S_{T−1}}(D_{T−1}, d_T) ),   (7.7)
and

  ϕ_{S_t}(d_1, d_2, ..., d_T) ≜ −(1/M_t''(S_t)) Σ_{τ=t+1}^{T−1} E { ψ_{t,τ,S_t} + φ_{t,τ,S_t} }(D_t, ..., D_{τ−1}, d_τ) − (1/M_t''(S_t)) ( m^r_{t,S_t}(d_t) + E ψ_{t,T,S_t}(D_t, ..., D_{T−1}, d_T) ),

for t = 1, 2, ..., T − 2 (recall that m^r_{t,S_t}(d) is defined in (5.2) for all t ∈ T). Our main result on the asymptotic joint normality of {Ŝ_t}_{t∈T} for the base-stock policy without setup costs is next.

Theorem 9.
Suppose Assumptions 1–3 hold, and that M_t''(S_t) exists and is nonzero for all t ∈ T. Then √n(Ŝ_t − S_t) = O_P(1) for all t ∈ T, and

  ( √n(Ŝ_1 − S_1), √n(Ŝ_2 − S_2), ..., √n(Ŝ_T − S_T) ) ⇝ N_T(0, Σ_T),

where the RHS is a T-dimensional normal distribution with mean zero and covariance matrix Σ_T ∈ R^{T×T} given entry-wise by

  Σ_T(i, j) = cov( ϕ_{S_i}(D_1, D_2, ..., D_T), ϕ_{S_j}(D_1, D_2, ..., D_T) ),  i, j ∈ {1, 2, ..., T}.

Theorem 9 has practical implications as a tool for hypothesis testing. For example, we can test the null hypothesis that S_1 ≤ S_2 ≤ ··· ≤ S_T. If this null hypothesis is not rejected, then the optimal policy is myopic (Ignall and Veinott 1969) and substantially easier to compute.

We now consider the general case of problem (2.2) with setup costs for T ≥ 3, where the data-driven policy is determined by the estimated parameters {(ŝ_t, Ŝ_t)}_{t∈T} defined by (2.9)-(2.10). Lemmas 2 and 3 serve as the starting point for establishing the asymptotic representations of Ŝ_t and ŝ_t in the general case. Mirroring the insight from the two-period problem, our analysis here will depend on the behavior of the functions M̂^r_{t,n}(Ŝ_t), M̂_{t,n}(ŝ_t), and M̂_{t,n}(Ŝ_t). It is hard to derive the asymptotic influence functions for ŝ_t and Ŝ_t in closed form when there are setup costs, but we can derive these influence functions in recursive form.

We first consider √n(Ŝ_t − S_t). The analysis of its asymptotics is similar to the case without setup costs. We leverage Lemma 2 and examine the expression √n(M̂^r_{t,n}(Ŝ_t) − M^r_{t,n}(Ŝ_t)). Compared with the previous subsection, the setup costs result in some additional terms due to the nonzero derivative M_t'(s_t). To capture these terms, for all 1 ≤ t ≤ τ ≤ T we define

  χ_{t,τ,y,x}(d_t, ..., d_τ) ≜ I(y − d_t − ··· − d_τ ≥ x) × Π_{ι=t}^{τ−1} I(y − d_t − d_{t+1} − ··· − d_ι ≥ s_{ι+1}).   (7.8)

The second term on the RHS above is defined to be equal to one when t = τ.
The following lemma gives the asymptotic representation of √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) in terms of the above functions. The idea of the proof is similar to Lemma 10, except that now we have an additional term resulting from the nonzero derivative M_t'(s_t). In the previous subsection without setup costs, this term is simply zero because s_t = S_t and the optimality condition for S_t is M_t'(S_t) = 0.

Lemma 12.
Suppose Assumptions 1 and 2 hold. Fix t ∈ T and suppose that √n(ŝ_{τ+1} − s_{τ+1}) = O_P(1) for all t ≤ τ ≤ T − 1. Then

  √n(M̂^r_{t,n}(y) − M^r_{t,n}(y)) = Σ_{τ=t}^{T−1} Σ_{i_t,...,i_τ} (M_{τ+1}'(s_{τ+1}) / n^{τ−t+1/2}) { χ_{t,τ,y,ŝ_{τ+1}}(d_{i_t t}, ..., d_{i_τ τ}) − χ_{t,τ,y,s_{τ+1}}(d_{i_t t}, ..., d_{i_τ τ}) } + Σ_{τ=t+1}^T (1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} { ψ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) + φ_{t,τ,y}(d_{i_t t}, ..., d_{i_τ τ}) } + R_n(y),   (7.9)

where sup_{y∈Θ} |R_n(y)| = o_P(1).

Lemma 12 allows us to obtain the asymptotic representation of √n(M̂^r_{t,n}(Ŝ_t) − M^r_{t,n}(Ŝ_t)) by replacing y in the above display with Ŝ_t. Afterwards, Lemma 11 can be applied to the second term on the RHS of the above display (it is straightforward to verify that Lemma 11 still holds with setup costs) and combined with an asymptotic continuity argument.

We now examine the first term on the RHS of (7.9), which is a summation of differences of multi-sample U-processes. Unlike φ_{t,τ,y}(D_t, ..., D_τ) or ψ_{t,τ,y}(D_t, ..., D_τ), the mean of χ_{t,τ,y,x}(D_t, ..., D_τ) is generally nonzero (and positive). Define the map (x, y) ↦ χ̃_{t,τ}(x, y), where

  χ̃_{t,τ}(x, y) ≜ M_{τ+1}'(s_{τ+1}) E χ_{t,τ,y,x}(D_t, ..., D_τ),  τ = t, ..., T − 1,   (7.10)

and denote the partial derivative ∂χ̃_{t,τ}(x, y)/∂y by χ̃˙_{t,τ}(x, y). The lemma below decomposes each of these multi-sample U-processes into a summation of empirical processes.

Lemma 13.
Suppose Assumptions 1 and 2 hold. Then, for all (t, τ) satisfying 1 ≤ t ≤ τ ≤ T − 1,

  (1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} χ_{t,τ,y,x}(d_{i_t t}, ..., d_{i_τ τ}) = (1/√n) Σ_{i=1}^n Σ_{ι=t}^τ { E χ_{t,τ,y,x}(D_t, ..., D_{ι−1}, d_{iι}, D_{ι+1}, ..., D_τ) − χ̃_{t,τ}(x, y)/M_{τ+1}'(s_{τ+1}) } + √n χ̃_{t,τ}(x, y)/M_{τ+1}'(s_{τ+1}) + R_n(x, y),

where sup_{x,y} |R_n(x, y)| = o_P(1). Further, if √n(ŝ_{τ+1} − s_{τ+1}) = O_P(1) for all t ≤ τ ≤ T − 1, then

  Σ_{τ=t}^{T−1} Σ_{i_t,...,i_τ} (M_{τ+1}'(s_{τ+1}) / n^{τ−t+1/2}) { χ_{t,τ,Ŝ_t,ŝ_{τ+1}}(d_{i_t t}, ..., d_{i_τ τ}) − χ_{t,τ,Ŝ_t,s_{τ+1}}(d_{i_t t}, ..., d_{i_τ τ}) } = Σ_{τ=t}^{T−1} χ̃˙_{t,τ}(S_t, s_{τ+1}) · √n(ŝ_{τ+1} − s_{τ+1}) + o_P( √n(Ŝ_t − S_t) ).   (7.11)
Lemma 13 shows that the estimation error in ŝ_τ for all τ = t+1, ..., T will affect the asymptotic distribution of Ŝ_t. By combining it with Lemmas 2 and 11, we get the asymptotic representation of √n(Ŝ_t − S_t), which is similar to the one in Theorem 8 but now carries the additional term from (7.11). As a result, the influence function of Ŝ_t will depend on the influence functions of ŝ_τ for t+1 ≤ τ ≤ T. We shall first derive the asymptotic representations of ŝ_τ for t+1 ≤ τ ≤ T, and then return to write the influence functions of Ŝ_t recursively. According to Lemma 3, the asymptotics of ŝ_t depend on √n(M̂_{t,n}(y) − M_{t,n}(y)). The following lemma extends Lemma 7 from T = 2 to T ≥ 3.

Lemma 14.
Suppose Assumptions 1 and 2 hold. Fix t ≤ T − 1 and suppose that ŝ_{t+1} − s_{t+1} = O_P(n^{−1/2}). Then

  √n(M̂_{t,n}(y) − M_{t,n}(y)) = {1 − F_t(y − s_{t+1})} √n( M̂_{t+1,n}(ŝ_{t+1}) − M_{t+1}(s_{t+1}) ) + (1/√n) Σ_{i=1}^n ( M̂_{t+1,n}(y − d_{it}) − M_{t+1}(y − d_{it}) ) I(y − d_{it} ≥ s_{t+1}) + R_n(y),

where sup_{y∈Θ} |R_n(y)| = o_P(1).

As in the analysis of Ŝ_t, we can repeatedly use Lemma 14 to express √n(M̂_{t,n}(y) − M_{t,n}(y)) as an i.i.d. sum plus a remainder term. For a pair (t, τ) satisfying 1 ≤ t ≤ τ ≤ T, define

  Ψ_{t,τ,y}(d_t, ..., d_τ) ≜ { M_{τ+1}(y − d_t − d_{t+1} − ··· − d_τ) I(y − d_t − d_{t+1} − ··· − d_τ ≥ s_{τ+1}) + M_{τ+1}(s_{τ+1}) I(y − d_t − ··· − d_τ < s_{τ+1}) + b_τ(d_t + ··· + d_τ − y)^+ + h_τ(y − d_t − ··· − d_τ)^+ } × Π_{ι=t}^{τ−1} I(y − d_t − d_{t+1} − ··· − d_ι ≥ s_{ι+1}),   (7.12)

where we stipulate that M_{T+1} = 0, and that the product in the last line is defined to be one if τ = t. The following lemma gives the final asymptotic representation of M̂_{t,n}(y), where we recall that d ↦ m_{t,s_t}(d) is defined in (4.1).

Lemma 15.
Suppose Assumptions 1 and 2 hold. Fix t ∈ T and suppose that M̂_{τ+1,n}(ŝ_{τ+1}) − M_{τ+1}(s_{τ+1}) = O_P(n^{−1/2}) and ŝ_{τ+1} − s_{τ+1} = O_P(n^{−1/2}) for all t ≤ τ ≤ T − 1. Then

  √n(M̂_{t,n}(y) − M_{t,n}(y)) = (1/√n) Σ_{i=1}^n Σ_{τ=t+1}^T ( E Ψ_{t,τ,y}(D_t, ..., D_{τ−1}, d_{iτ}) − E Ψ_{t,τ,y}(D_t, ..., D_τ) )
  + Σ_{τ=t}^{T−1} [ √n( M̂_{τ+1,n}(ŝ_{τ+1}) − M_{τ+1,n}(ŝ_{τ+1}) ) + G_n m_{τ+1,s_{τ+1}} + √n M_{τ+1}'(s_{τ+1})(ŝ_{τ+1} − s_{τ+1}) ] × E[ I(y − D_t − ··· − D_τ < s_{τ+1}) Π_{ι=t}^{τ−1} I(y − D_t − ··· − D_ι ≥ s_{ι+1}) ] + R_n(y),   (7.13)

where sup_{y∈Θ} |R_n(y)| = o_P(1).
If the influence functions of √n(ŝ_{τ+1} − s_{τ+1}) and √n( M̂_{τ+1,n}(ŝ_{τ+1}) − M_{τ+1,n}(ŝ_{τ+1}) ) are known for all τ = t, t+1, ..., T − 1, then Lemma 15 yields the influence functions of √n( M̂_{t,n}(ŝ_t) − M_{t,n}(ŝ_t) ) and √n( M̂_{t,n}(Ŝ_t) − M_{t,n}(Ŝ_t) ). The asymptotic representation of √n(ŝ_t − s_t) is then immediate from Lemma 3. Finally, asymptotic normality of √n(ŝ_τ − s_τ) for all τ = t+1, t+2, ..., T also implies asymptotic normality of √n(Ŝ_t − S_t) through Lemma 12 and Lemma 13.

This line of reasoning summarizes our induction argument, which is encapsulated in the following theorem. We note that the influence functions of Ŝ_t and ŝ_t need to be written recursively because the induction hypothesis includes √n( M̂_{t,n}(Ŝ_t) − M_{t,n}(Ŝ_t) ) and √n( M̂_{t,n}(ŝ_t) − M_{t,n}(ŝ_t) ).

Theorem 10.
Suppose Assumptions 1 and 2 hold, and that M_t''(S_t) and M_t'(s_t) exist and are nonzero for all t ∈ T. Then for all t ∈ T, there exist functions κ_{t,s_t}, κ_{t,S_t}, ϕ_{s_t}, ϕ_{S_t} : R^{T−t+1} → R such that

  √n( M̂_{t,n}(ŝ_t) − M_{t,n}(ŝ_t) ) = G_n κ_{t,s_t} + o_P(1),  √n( M̂_{t,n}(Ŝ_t) − M_{t,n}(Ŝ_t) ) = G_n κ_{t,S_t} + o_P(1),
  √n(ŝ_t − s_t) = G_n ϕ_{s_t} + o_P(1),  √n(Ŝ_t − S_t) = G_n ϕ_{S_t} + o_P(1).

The recursive expressions for κ_{t,s_t}, κ_{t,S_t}, ϕ_{s_t}, and ϕ_{S_t} are given in (EC.2.53)-(EC.2.55). Furthermore,

  ( √n(ŝ_1 − s_1), √n(Ŝ_1 − S_1), ..., √n(ŝ_T − s_T), √n(Ŝ_T − S_T) ) ⇝ N_{2T}(0, Σ_{2T}),

where Σ_{2T}(i, j) is the covariance of the i-th and j-th influence functions in {ϕ_{s_1}, ϕ_{S_1}, ..., ϕ_{s_T}, ϕ_{S_T}}.

A natural estimator for the optimal total expected cost C* = M_1(S_1) + K_1 is Ĉ* = M̂_{1,n}(Ŝ_1) + K_1. By combining the above theorem and Lemma 3, it is readily seen that this estimator is asymptotically normal.
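For the two-period case, the plug-in estimator Ĉ* from Section 6 is a two-sample U-statistic that can be computed directly. A sketch (the helper is our own, and it assumes, as a simplification, the linear one-period cost C(y, d) = b·(d − y)^+ + h·(y − d)^+ with the same b and h in both periods; the paper's C may differ):

```python
def plug_in_cost(d1, d2, s2_hat, S1_hat, S2_hat, K1, K2, b, h):
    """Two-sample U-statistic estimate of the two-period optimal cost:
    C_hat = K1 + (1/n) sum_i C(S1, d1_i) + (1/n^2) sum_{i,j} [
        (C(S2, d2_j) + K2) * 1{S1 - d1_i <  s2}     (reorder in period 2)
      +  C(S1 - d1_i, d2_j) * 1{S1 - d1_i >= s2} ]  (no reorder)."""
    C = lambda y, d: b * max(d - y, 0.0) + h * max(y - d, 0.0)
    n = len(d1)
    first = sum(C(S1_hat, di) for di in d1) / n
    inner = 0.0
    for di in d1:
        x = S1_hat - di  # starting inventory entering period 2
        for dj in d2:
            inner += (C(S2_hat, dj) + K2) if x < s2_hat else C(x, dj)
    return K1 + first + inner / (n * n)
```

The double loop makes the O(n²) cost of the U-statistic explicit; for large n the inner sum can be vectorized.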
8. Numerical Experiments
Our results on the asymptotic distribution of the data-driven policy can be used to construct two-sided equal-tailed confidence intervals using the large-sample normal approximation. This section reports comprehensive simulations for the two-period problem to validate the accuracy of the resulting confidence intervals for the data-driven policy and the optimal expected cost.

Following Ban (2020), we consider truncated normal demand distributions D_1 ∼ N(20, ·) ∩ [0, 40] and D_2 ∼ N(10, ·) ∩ [0, ·]. The unit backorder costs are b_1 = 1.· and b_2 = 20, the unit
holding costs are h_1 = 0.· and h_2 = 20, and the setup costs are K_1 = K_2 = 2. We consider several different sample sizes n to get a sense of how the coverage accuracy improves with increasing n. To keep the Monte Carlo simulation error small, we run 10,000 replications for each sample size. That is, for each sample size n, we generate a sample {d_{i1}, d_{i2}}_{i=1}^n, construct confidence intervals based on this sample, and then repeat the procedure 10,000 times to compute the empirical coverage probabilities of the confidence intervals. We also perform hypothesis tests by converting the testing problem into an interval estimation problem, and then evaluate the Type-I error rates and the power of the tests.

We first evaluate the performance of the 95% confidence intervals for (s_1, S_1, s_2, S_2) in terms of the empirical coverage probabilities. Our confidence intervals are constructed based on the asymptotic normality results in Theorem 6 (including the consistent covariance estimator given in (EC.3.8) and (EC.3.14)). For comparison, we also construct confidence intervals for (s_1, S_1, s_2, S_2) based on Theorem 4 in Ban (2020), referred to as the "existing method" in the following discussion. These results are shown in Figure 1. Additionally, we give the empirical coverage probabilities of the 90% and the 95% confidence intervals for the optimal expected cost based on Theorem 7, as shown in Figure 2.

It is apparent from Figure 1 that the empirical coverage probabilities of our confidence intervals for (s_1, S_1, s_2, S_2) tend to converge to the nominal value (95%) as the sample size n increases. As explained in Section 5.2, the error propagation for the final period is negligible, so for s_2 and S_2 the curves of the proposed and existing methods coincide in Figure 1. On the other hand, the empirical coverage probabilities of the existing method for s_1 and S_1 are systematically smaller than the nominal value.
This is because the existing method overlooks the error propagation (which is not negligible), resulting in underestimation of the asymptotic variance. Specifically, the asymptotic variances of ŝ_1 and Ŝ_1 based on Theorem 6 are 86.7 and 116.9, respectively. In contrast, the corresponding asymptotic variances computed by the existing method are 54.6 and 75.8. This observation suggests that the error propagation accounts for a significant portion of the asymptotic variance of ŝ_1 (approximately 37%) and of the asymptotic variance of Ŝ_1 (approximately 35%).

We also consider confidence intervals for the optimal expected cost. Figure 2 shows that the empirical coverage probabilities of both the 90% and the 95% data-driven confidence intervals for the optimal expected cost quickly converge to the nominal values as the sample size increases. The coverage probability is generally satisfactory when n ≥ 30. In contrast, the marginal analysis in Ban (2020) does not lend itself to interval estimation of this important parameter.
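The replication scheme described above is easy to reproduce in miniature. The sketch below estimates empirical coverage of the normal-approximation CI for a plain mean, a simplified stand-in for the (s, S) estimators, with far fewer replications than the 10,000 used here:

```python
import math
import random
from statistics import NormalDist

def empirical_coverage(truth, sampler, n, reps, alpha=0.05, seed=0):
    """Fraction of replications in which the normal-approximation CI,
    mean +/- z_{alpha/2} * sd / sqrt(n), covers the true value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        m = sum(x) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        half = z * sd / math.sqrt(n)
        hits += m - half <= truth <= m + half
    return hits / reps
```

For a standard normal sample with n = 50 the empirical coverage sits slightly below the nominal 95%, the familiar small-sample effect of plugging in an estimated standard deviation.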
Figure 1: Empirical coverage probabilities of the 95% confidence intervals for the optimal order-up-to levels and the reorder points using our results and the existing method.
We can construct a prediction interval for the relative error of the estimated optimal cost, denoted ε_RE(Ĉ*, C*) ≜ (Ĉ* − C*)/C*. From an operational standpoint, the relative error is much more meaningful than the absolute error (the absolute error is really only useful when the true optimal expected cost is also known). Based on Theorem 7, we know that √n ε_RE(Ĉ*, C*) ⇝ N(0, σ_C²/(C*)²), where σ_C² is the asymptotic variance of Ĉ*, which can be consistently estimated by σ̂_C², as given in (EC.3.19). With a simple continuous mapping argument, it is readily seen that

  P{ |ε_RE(Ĉ*, C*)| ≤ z_{α/2} σ̂_C/(√n Ĉ*) } → 1 − α,   (8.1)

where z_α is the α-upper quantile of the standard normal distribution. In other words, z_{α/2} σ̂_C/(√n Ĉ*) is a (1 − α) upper bound of the prediction interval for |ε_RE(Ĉ*, C*)|.
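The prediction bound in (8.1) is a one-liner; a sketch (the function is our own, with σ̂_C from EC.3.19 supplied by the caller):

```python
import math
from statistics import NormalDist

def rel_error_bound(sigma_c_hat, c_hat, n, alpha=0.05):
    """(1 - alpha) asymptotic prediction bound on |eps_RE| from (8.1):
    z_{alpha/2} * sigma_c_hat / (sqrt(n) * c_hat)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma_c_hat / (math.sqrt(n) * c_hat)
```

As expected, the bound decays at the canonical 1/√n rate: quadrupling n halves the bound.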
Figure 2: Empirical coverage probabilities of the 90% and 95% confidence intervals for the optimal expected cost.
Figure 3: The relative errors ε_RE(Ĉ*, C*) for n ∈ {15, 30, 50, 100, 150} (left panel), and the 95% quantiles of |ε_RE(Ĉ*, C*)| (right panel). The asymptotic 95% quantile is an average of the 10,000 realizations of z_{0.025} σ̂_C/(√n Ĉ*), and the true quantile is the empirical 95% quantile of the 10,000 realizations of |ε_RE(Ĉ*, C*)|.

The left panel of Figure 3 presents the box plots of ε_RE(Ĉ*, C*) for each n. We can see that the relative errors concentrate around zero as n increases. For n = 150, even the upper and lower whiskers fall within ± z_{0.025} σ̂_C/(√n Ĉ*). The right panel compares the asymptotic 95% quantile of |ε_RE(Ĉ*, C*)|, as estimated by z_{0.025} σ̂_C/(√n Ĉ*), with the true 95% quantile of |ε_RE(Ĉ*, C*)|. The two curves almost overlap, which implies that our asymptotic bound is accurate. This figure also shows that the relative error is about 10% when n = 100. For comparison, the analysis of Levi et al. (2007) and Levi et al. (2015) requires sample sizes greater than n = 20,000 to achieve estimation of similar quality.

Our asymptotic results also allow us to conduct hypothesis tests on the optimal expected cost. Before implementing the data-driven policy, the retailer normally has a target expected cost C̄ in mind. The retailer will only actually go into business if it is confident that the true expected cost of the data-driven policy is not higher than C̄. This question can be formulated as the hypothesis test:

  H_0 : C* > C̄,  H_1 : C* ≤ C̄.

Under the null hypothesis, the retailer should not go into business because the system is not profitable enough. If the information from the data is strong enough to reject the null hypothesis, then the retailer can have high confidence that the alternative hypothesis H_1 is true (i.e., that the inventory system is sufficiently profitable). Testing the above one-sided hypothesis can be done with the help of Theorem 7. We know from Theorem 7 that √n{Ĉ* − C̄ − (C* − C̄)}/σ_C ⇝ N(0, 1). Under the null hypothesis (i.e., C* − C̄ >
0), we can use the continuous mapping theorem to see that

  P( √n(Ĉ* − C̄)/σ̂_C ≤ −z_α ) < P( {√n(Ĉ* − C̄) − √n(C* − C̄)}/σ̂_C ≤ −z_α ) → α.

Therefore, the test that rejects when Ĉ* falls in the region (−∞, C̄ − σ̂_C z_α/√n] is asymptotically of level α.

For the parameter settings used in our experiments, the optimal expected cost is C* = 103.3. We fix α = 0.05 and examine the Type-I error as well as the test power for C̄ ∈ {0.8C*, 0.9C*, 1.15C*, 1.35C*}. The first two values correspond to the case where the true optimal expected cost is larger than the target, and a small Type-I error implies good performance. The last two values correspond to the case where the true optimal expected cost is smaller than the target, and a high power is desired. Figure 4 gives the Type-I error curves and the power curves as a function of the sample size n. The left panel shows that the Type-I error rates are well below 0.05 and vanish quickly. This is expected because both values of C̄ are smaller than C*. The right panel shows that the power is good even when C̄ is only slightly larger than C*.

Next, we perform hypothesis testing for the optimality of a myopic policy. According to Ignall and Veinott (1969) and under Assumption 3 (no setup costs), nondecreasing optimal order-up-to levels
Figure 4: Type-I error curves for C̄ = 0.8 × C* and C̄ = 0.9 × C* (left panel), and power curves for C̄ = 1.15 × C* and C̄ = 1.35 × C* (right panel).

S_1 ≤ S_2 ≤ ··· ≤ S_T are a sufficient condition for the optimality of a myopic policy. We set K_1 = K_2 = 0 for this part, and keep all other parameter settings the same. For computational considerations, this question can be formulated as a one-sided hypothesis test:

  H_0 : S_1 ≤ S_2,  H_1 : S_1 > S_2.

We prefer to use a myopic policy unless there is strong evidence that S_1 > S_2 (in which case a myopic policy is not optimal).

We need the following result on the asymptotics of Ŝ_1 − Ŝ_2, which is a direct consequence of Theorem 9. Recall that the influence functions ϕ_{S_1} and ϕ_{S_2} for S_1 and S_2 in the base-stock policy are defined in (7.6)-(7.7).

Corollary 1.
Consider problem (2.2) for T = 2 . Suppose Assumptions 1 and 2 hold, and that M (cid:48)(cid:48) t ( S t ) exists and is nonzero for t = 1 , . Then, √ n (cid:8) (cid:98) S − (cid:98) S − ( S − S ) (cid:9) is asymptotically normalwith influence function ϕ S − ϕ S . Under mild conditions, the variance can be consistently estimatedby (EC.3.25) . The asymptotic variance of (cid:98) S − (cid:98) S , which we denote as σ S − S , can be consistently estimatedby ˆ σ S − S , as given in (EC.3.25). Thus, under the null hypothesis that S − S ≤
0, we can use Corollary 1 and the continuous mapping theorem to see that

P( √n( Ŝ₁ − Ŝ₂ ) / σ̂_{S₁−S₂} ≥ z_α ) ≤ P( { √n( Ŝ₁ − Ŝ₂ ) − √n( S₁ − S₂ ) } / σ̂_{S₁−S₂} ≥ z_α ) → α.

Figure 5
The power curves for testing the myopic policy under α = 0.05 and α = 0.10.

As a result, the test with rejection region [ z_α σ̂_{S₁−S₂}/√n, +∞ ) is asymptotically of level α. For our parameter settings, we have S₁ = 14.67 and S₂ = 10. Thus the alternative hypothesis is true, and high power implies good performance of the test. The power curves of the test for significance levels α = 0.05 and α = 0.10 are shown in Figure 5. For both α = 0.05 and α = 0.10, the power increases quickly with the sample size n.
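The level-α decision rule above is straightforward to implement. A minimal sketch in Python (the estimates, standard error, and sample size below are hypothetical placeholders, not values from our experiments):

```python
from math import sqrt
from statistics import NormalDist

def myopic_test(S1_hat, S2_hat, sigma_hat, n, alpha=0.05):
    """One-sided test of H0: S1 <= S2 against H1: S1 > S2.

    Rejects when sqrt(n) * (S1_hat - S2_hat) / sigma_hat exceeds the upper
    alpha-quantile z_alpha of the standard normal; by Corollary 1 this test
    is asymptotically of level alpha.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    statistic = sqrt(n) * (S1_hat - S2_hat) / sigma_hat
    return statistic, statistic >= z_alpha

# Hypothetical estimates in the spirit of the experiment (S1 = 14.67, S2 = 10):
statistic, reject = myopic_test(S1_hat=14.5, S2_hat=10.2,
                                sigma_hat=8.0, n=200, alpha=0.05)
```

With these placeholder numbers the statistic is about 7.6, far above z_{0.05} ≈ 1.645, so the myopic policy would be rejected.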
9. Conclusion
Solving the empirical cost-to-go functions to compute the data-driven (s, S)-policy is a simple and practical way to solve real inventory problems. In this work, we have established consistency and asymptotic normality of this canonical data-driven policy for the general multi-period inventory control problem. The main challenge in this analysis comes from the backwards error propagation of ŝ_{t+1} and M̂_{t+1,n} into the estimation of S_t and s_t. From a high-level viewpoint, ŝ_{t+1} and M̂_{t+1,n} can be regarded as estimated nuisance parameters when we derive the influence functions of Ŝ_t and ŝ_t. There is some existing work on the problem of how to handle nuisance parameters; most of it is in the literature on semiparametric inference, e.g. (Kosorok 2007, Theorem 21.1). Nevertheless, the existing work requires a convergence rate of the estimated nuisance parameters. It is hard to establish a convergence rate in our inventory problem because: (i) M̂_{t+1,n} is infinite-dimensional; and (ii) ŝ_{t+1} appears in the argument of M̂_{t+1,n} (and this leads to the hard problem of bundled parameters in statistical inference, see e.g. Zhao et al. (2017)). Since these results require conditions that are hard to verify, we take a different approach and explicitly express the estimated nuisance parameters as U-statistics. Then, we develop a multi-sample U-process theory to capture their
contribution to the influence functions of Ŝ_t and ŝ_t. In this way, our methodological innovation sheds light on the larger problem of handling nuisance parameters.

From the standpoint of operations management, our work is the first to derive the joint asymptotic distribution of a data-driven policy while taking into account the transmission of estimation error from future time periods. The problem of backwards error propagation is ubiquitous in data-driven stochastic DP: essentially, any data-driven DP method must deal with this issue. As we have shown for the multi-period inventory problem, failing to account for this backwards error propagation leads to incorrect conclusions about the statistical properties of data-driven policies (in our experiments, it led to underestimation of the asymptotic variance of the data-driven policy). The techniques developed in this work are general and widely relevant to other stochastic DP problems for exactly this reason. In future work, we will extend our methods to more general classes of data-driven DP problems.

References
Ban GY (2020) Confidence intervals for data-driven inventory policies with demand censoring. Operations Research.
Management Science.
Manufacturing & Service Operations Management.
Management Science.
Operations Research.
Operations Research.
Retail Supply Chain Management: Quantitative Models and Empirical Studies, 79–112 (Springer).
Ding X, Puterman ML, Bisi A (2002) The censored newsvendor and the optimal acquisition of information. Operations Research.
Management Science.
Goldberg DA, Katz-Rogozhnikov DA, Lu Y, Sharma M, Squillante MS (2016) Asymptotic optimality of constant-order policies for lost sales inventory models with large lead times. Mathematics of Operations Research.
Mathematics of Operations Research.
Mathematics of Operations Research.
Management Science.
Operations Research.
Mathematics of Operations Research.
Management Science.
Kosorok MR (2007) Introduction to Empirical Processes and Semiparametric Inference (Springer Science & Business Media).
Lariviere MA, Porteus EL (1999) Stalking information: Bayesian inventory management with unobserved lost sales. Management Science.
Operations Research.
Mathematics of Operations Research.
Management Science.
Neumeyer N (2004) A central limit theorem for two-sample U-processes. Statistics & Probability Letters.
Nolan D, Pollard D (1987) U-processes: Rates of convergence. The Annals of Statistics.
Nolan D, Pollard D (1988) Functional limit theorems for U-processes. The Annals of Probability.
NSF-CBMS Regional Conference Series in Probability and Statistics.
Mathematics of Operations Research.
SSRN Electronic Journal.
Scarf H (1960) The optimality of (s, S) policies in the dynamic inventory problem. Arrow KJ, Karlin S, Suppes P, eds., Mathematical Methods in the Social Sciences, chapter 13 (Stanford University Press).
Sethi SP, Cheng F (1997) Optimality of (s, S) policies in inventory models with Markovian demand. Operations Research.
The Annals of Statistics.
Fundamentals of Supply Chain Theory (John Wiley & Sons, Ltd).
van der Vaart AW (1998) Asymptotic Statistics (Cambridge University Press).
van der Vaart AW, Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics (Springer).
Zhang H, Chao X, Shi C (2018) Perishable inventory systems: Convexity results for base-stock policies and learning algorithms under censored demand. Operations Research.
Bernoulli.
Foundations of Inventory Management (McGraw-Hill).
Electronic Companion to “Reassessing Confidence Intervals for Data-Driven Inventory Policies”
This e-companion consists of three parts. Section EC.1 collects the statements of some technical lemmas that will be used repeatedly; all proofs of these lemmas are deferred to Section EC.3. Section EC.2 contains the proofs of all the results from the main body of the paper. We have assumed that for every t ∈ T, s_t, S_t ∈ Θ and D_t has bounded support [d̲_t, d̄_t]. For notational convenience in this e-companion, we enlarge Θ to be the convex closure of the union of the original Θ and all [d̲_t, d̄_t]. This Θ contains the support of D_t and the parameter space of (s_t, S_t), for all t ∈ T. It is slightly larger than the original spaces, but this does not affect the theoretical analysis.

EC.1. Technical Lemmas
EC.1.1. P-Donsker classes

We single out two P-Donsker classes of functions that play an important role. First recall the function classes M_t = { d ↦ m_{t,y}(d) : y ∈ Θ } in (4.1), where

m_{T,y}(d) ≜ b_T (d − y)⁺ + h_T (y − d)⁺

and

m_{t,y}(d) ≜ b_t (d − y)⁺ + h_t (y − d)⁺ + M_{t+1}(s_{t+1}) I( y − d < s_{t+1} ) + M_{t+1}(y − d) I( y − d ≥ s_{t+1} ),

for t < T. This class of functions is P-Donsker, as summarized in the following lemma.

Lemma EC.1. (See Section EC.3.1 for the proof) Suppose Assumptions 1 and 2 hold. Then for all t ∈ T: (i) there exists L_t such that |m_{t,y₁}(d) − m_{t,y₂}(d)| ≤ L_t |y₁ − y₂| for all d, y₁, y₂ ∈ Θ; (ii) the class M_t is P-Donsker.

Recall the right derivative m^r_{t,y} of m_{t,y}, defined in (5.2). For t = T it is m^r_{T,y}(d) = (b_T + h_T) I(d ≤ y) − b_T; and for t < T it is

m^r_{t,y}(d) = (b_t + h_t) I(d ≤ y) − b_t + M′_{t+1}(y − d) I( y − d ≥ s_{t+1} ).

For all t ∈ T, we define the function classes

M^r_t ≜ { d ↦ m^r_{t,y}(d) : y ∈ Θ }.   (EC.1.1)

The mapping d ↦ m^r_{t,y}(d) involves the function d ↦ M′_{t+1}(y − d). In order to show that M^r_t is P-Donsker, we need y ↦ M′_{t+1}(y − d) to be Lipschitz for each fixed d. The P-Donsker property of M^r_t can then be shown with some more effort based on the stability properties of Donsker classes (van der Vaart and Wellner 1996). These results are summarized below.

Lemma EC.2. (See Section EC.3.3 for the proof) Suppose Assumptions 1 and 2 hold. Then, for all t ∈ T:
(i) There exist constants B′_t and L′_t that only depend on t such that |M′_t(y)| ≤ B′_t and |M′_t(y₁) − M′_t(y₂)| ≤ L′_t |y₁ − y₂|, for all y₁, y₂ ∈ Θ.
(ii) For any m^r_{t,y₁}, m^r_{t,y₂} ∈ M^r_t, there exist constants c_{t,1}, c_{t,2} that only depend on t, such that E( m^r_{t,y₁} − m^r_{t,y₂} )² ≤ c_{t,1} |y₁ − y₂| + c_{t,2} |y₁ − y₂|², for all y₁, y₂ ∈ Θ.
(iii) All functions in M^r_t are uniformly bounded by (b_t ∨ h_t) + B′_{t+1}, and M^r_t is P-Donsker.
(iv) sup_{y∈Θ} | M^r_{t,n}(y) − M′_t(y) | = o_P(1).

The following lemma from van der Vaart (1998) on P-Donsker classes is used frequently in the sequel to account for the estimated parameters.

Lemma EC.3 (Lemma 19.24 of van der Vaart (1998)). Suppose that F is a P-Donsker class of measurable functions and f_{θ̂_n} is a sequence of random functions that take their values in F such that ∫ ( f_{θ̂_n}(x) − f_{θ₀}(x) )² dP(x) →_P 0 for some f_{θ₀} ∈ L²(P). Then G_n f_{θ̂_n} − G_n f_{θ₀} →_P 0.

The L²(P)-convergence condition above is usually shown by using Lipschitz continuity of the class of functions (with respect to θ for every fixed x). In the proofs of several results in this e-companion, e.g. Lemma EC.2, we need the following lemma to establish Lipschitz continuity. Specifically, consider a continuous random variable D with support Θ, and two functions g : Θ ↦ R and h : Θ × Θ ↦ R. We have the following lemma.

Lemma EC.4. (See Section EC.3.2 for the proof) Suppose there exist positive constants h̄ and f̄ such that sup_{y,d∈Θ} |h(y, d)| ≤ h̄ and the PDF of D is upper bounded by f̄. Further suppose that there exist positive constants L_h and L_g such that for any y₁, y₂ ∈ Θ, E|h(y₁, D) − h(y₂, D)| ≤ L_h |y₁ − y₂| and |g(y₁) − g(y₂)| ≤ L_g |y₁ − y₂|. Then

E| h(y₁, D) I{D ≤ g(y₁)} − h(y₂, D) I{D ≤ g(y₂)} | ≤ L_h |y₁ − y₂| + h̄ f̄ L_g |y₁ − y₂|.
If we additionally assume E| h(y₁, D) − h(y₂, D) |² ≤ L_h |y₁ − y₂|, then

E| h(y₁, D) I{D ≤ g(y₁)} − h(y₂, D) I{D ≤ g(y₂)} |² ≤ L_h |y₁ − y₂| + h̄² f̄ L_g |y₁ − y₂|.

EC.1.2. Multi-sample U-process

For each j ∈ T, let d^{i_j}_j, i_j = 1, 2, ..., n, be i.i.d. realizations of D_j. As argued in Sections 5.2, 6.1, and 6.2, the terms related to the error propagation of our estimators are multi-sample U-statistics of the form

U_{n,θ̂_n} = (1/n^T) Σ_{i₁,i₂,...,i_T} f_{θ̂_n}( d^{i₁}_1, ..., d^{i_T}_T ),   (EC.1.2)

where θ̂_n is an estimated parameter and f_{θ̂_n} belongs to F = { f_θ : θ ∈ Θ }. For fixed θ ∈ Θ, U_{n,θ} normalized by its mean E f_θ is asymptotically normal. It is asymptotically equivalent to its Hájek
projection

U^H_{n,θ} = Σ_{j=1}^T (1/n) Σ_{i_j=1}^n { E f_θ( D₁, ..., D_{j−1}, d^{i_j}_j, D_{j+1}, ..., D_T ) − E f_θ }

(see van der Vaart (1998)). In order to control the effect from the changing kernels, we need uniform convergence over θ ∈ Θ:

sup_{θ∈Θ} | √n( U_{n,θ} − E f_θ ) − √n U^H_{n,θ} | →_P 0.   (EC.1.3)

If (EC.1.3) holds, then we can reduce √n( U_{n,θ̂_n} − E f_{θ̂_n} ) to √n U^H_{n,θ̂_n}, which can be analyzed with empirical process theory. Neumeyer (2004) has established (EC.1.3) for T = 2 using an exponential inequality developed in Nolan and Pollard (1988) to control the conditional tail probabilities of a symmetrized empirical process. However, as pointed out in Nolan and Pollard (1988), this technique does not extend to kernels with more than two arguments. To the best of our knowledge, there are no general results available for T ≥ 3. For each pair (t, τ) with 1 ≤ t < τ ≤ T, we therefore construct the relevant kernel classes recursively. We start with K_{τ−1,τ} = { k_{τ−1,τ,y,x}(·) : y, x ∈ Θ }, and then recursively define K_{t,τ} = { k_{t,τ,y,x}(·) : y, x ∈ Θ } for each t ≤ τ −
2, where k_{t,τ,y,x}( d_t, ..., d_τ ) = k_{t+1,τ,y−d_t,x}( d_{t+1}, ..., d_τ ). The classes of kernels of interest are F_{t,τ} with elements

f_{t,τ,y,x}( d_t, ..., d_τ ) = k_{t,τ,y,x}( d_t, ..., d_τ ) × Π_{ι=t}^{τ−1} I( y − d_t − d_{t+1} − ··· − d_ι ≥ s_{ι+1} ),   (EC.1.4)

where k_{t,τ,y,x}(·) is in K_{t,τ}. The following lemma is the building block for our theoretical analysis.

Lemma EC.5. (See Section EC.3.4) For each (t, τ) with 1 ≤ t < τ ≤ T, suppose F_{t,τ} defined above is a uniformly bounded Euclidean class. Then, all elements of F_{t,τ} satisfy

(1/n^{τ−t+1/2}) Σ_{i_t,...,i_τ} { f_{t,τ,y,x}( d^{i_t}_t, ..., d^{i_τ}_τ ) − E f_{t,τ,y,x} }
= (1/√n) Σ_{i=1}^n Σ_{ι=t}^τ { E f_{t,τ,y,x}( D_t, ..., D_{ι−1}, d^i_ι, D_{ι+1}, ..., D_τ ) − E f_{t,τ,y,x} } + R_n(x, y),   (EC.1.5)

where sup_{x,y∈Θ} |R_n(x, y)| = o_P(1).

EC.2. Proof of main results
EC.2.1. Consistency of the data-driven policy
Proof of Theorem 1
We prove Theorem 1 by backward induction starting with t = T. By definition, M̂_{T,n}(y) = M_{T,n}(y), which converges to M_T(y) in probability uniformly in y ∈ Θ by Lemma EC.1. Since Ŝ_T minimizes M̂_{T,n} and S_T is well separated by Assumption 2, we can use Theorem 5.7 of van der Vaart (1998) to conclude that Ŝ_T →_P S_T. Theorem 5.9 of van der Vaart (1998) gives the consistency of ŝ_T if

M̂_{T,n}(y) − M̂_{T,n}(Ŝ_T) − K_T →_P M_T(y) − M_T(S_T) − K_T uniformly in y ∈ Θ.
Since M̂_{T,n}(y) →_P M_T(y) uniformly in y, the above display holds if M̂_{T,n}(Ŝ_T) →_P M_T(S_T). To show this, we apply the triangle inequality to bound the difference:

| M̂_{T,n}(Ŝ_T) − M_T(S_T) | ≤ | M̂_{T,n}(Ŝ_T) − M_T(Ŝ_T) | + | M_T(Ŝ_T) − M_T(S_T) |.

The first term on the RHS is o_P(1) because M̂_{T,n}(y) →_P M_T(y) uniformly in y. The second is also o_P(1) because y ↦ M_T(y) is continuous, and the continuous mapping theorem applies. Thus the LHS of the above display is also o_P(1), as desired.

Next, suppose that the statement of the theorem holds for periods T, T−1, ..., t+1; we will show that it holds for period t. To this end, we first show sup_{y∈Θ} | M̂_{t,n}(y) − M_{t,n}(y) | →_P
0. Define

a_n(y) ≜ ( M̂_{t+1,n}(ŝ_{t+1}) / n ) Σ_{i=1}^n I( y − d^i_t ≤ ŝ_{t+1} ) + (1/n) Σ_{i=1}^n M̂_{t+1,n}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ),

b_n(y) ≜ ( M_{t+1}(s_{t+1}) / n ) Σ_{i=1}^n I( y − d^i_t ≤ s_{t+1} ) + (1/n) Σ_{i=1}^n M_{t+1}(y − d^i_t) I( y − d^i_t ≥ s_{t+1} ),

and

c_n(y) ≜ ( M_{t+1}(s_{t+1}) / n ) Σ_{i=1}^n I( y − d^i_t ≤ ŝ_{t+1} ) + (1/n) Σ_{i=1}^n M_{t+1}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ).

Then M̂_{t,n}(y) − M_{t,n}(y) = a_n(y) − b_n(y) and

sup_{y∈Θ} | M̂_{t,n}(y) − M_{t,n}(y) | ≤ sup_{y∈Θ} | a_n(y) − c_n(y) | + sup_{y∈Θ} | c_n(y) − b_n(y) |.

Next, we show that the two terms on the RHS are both o_P(1). For the first term, we have

a_n(y) − c_n(y) = ( ( M̂_{t+1,n}(ŝ_{t+1}) − M_{t+1}(s_{t+1}) ) / n ) Σ_{i=1}^n I( y − d^i_t ≤ ŝ_{t+1} ) + (1/n) Σ_{i=1}^n ( M̂_{t+1,n}(y − d^i_t) − M_{t+1}(y − d^i_t) ) I( y − d^i_t ≥ ŝ_{t+1} )
≤ sup_{ỹ∈Θ} | M̂_{t+1,n}(ỹ) − M_{t+1}(ỹ) | × (1/n) Σ_{i=1}^n I( y − d^i_t ≤ ŝ_{t+1} ) + sup_{ỹ∈Θ} | M̂_{t+1,n}(ỹ) − M_{t+1}(ỹ) | × (1/n) Σ_{i=1}^n I( y − d^i_t ≥ ŝ_{t+1} ),

to see that | a_n(y) − c_n(y) | ≤ 2 sup_{ỹ∈Θ} | M̂_{t+1,n}(ỹ) − M_{t+1}(ỹ) |. The RHS does not depend on y, and so it is o_P(1) by the induction hypothesis. Therefore, a_n(y) − c_n(y) →_P 0 uniformly in y. For the second term, we have

c_n(y) − b_n(y) = M_{t+1}(s_{t+1}) × (1/n) Σ_{i=1}^n ( I( y − d^i_t < ŝ_{t+1} ) − I( y − d^i_t < s_{t+1} ) ) + (1/n) Σ_{i=1}^n M_{t+1}(y − d^i_t) ( I( y − d^i_t ≥ ŝ_{t+1} ) − I( y − d^i_t ≥ s_{t+1} ) ).
Define M̄_{t+1} ≜ sup_{y∈Θ} M_{t+1}(y), which is finite because y ↦ M_{t+1}(y) is continuous on Θ. Note that

I( y − d^i_t ≥ ŝ_{t+1} ) − I( y − d^i_t ≥ s_{t+1} ) = −{ I( y − d^i_t < ŝ_{t+1} ) − I( y − d^i_t < s_{t+1} ) },

and this expression is nonzero only when y − ŝ_{t+1} ∨ s_{t+1} < d^i_t ≤ y − ŝ_{t+1} ∧ s_{t+1}. We can then bound c_n − b_n by

| c_n(y) − b_n(y) | ≤ 2 M̄_{t+1} × (1/n) Σ_{i=1}^n I( y − ŝ_{t+1} ∨ s_{t+1} ≤ d^i_t ≤ y − ŝ_{t+1} ∧ s_{t+1} )   (EC.2.1)
= 2 M̄_{t+1} × (1/n) Σ_{i=1}^n [ I( d^i_t ≤ y − ŝ_{t+1} ∧ s_{t+1} ) − I( d^i_t ≤ y − ŝ_{t+1} ∨ s_{t+1} ) ].

We can decompose the second term of the above display into

F_t( y − ŝ_{t+1} ∧ s_{t+1} ) − F_t( y − ŝ_{t+1} ∨ s_{t+1} )
+ (1/n) Σ_{i=1}^n [ I( d^i_t ≤ y − ŝ_{t+1} ∧ s_{t+1} ) − F_t( y − ŝ_{t+1} ∧ s_{t+1} ) ]
− (1/n) Σ_{i=1}^n [ I( d^i_t ≤ y − ŝ_{t+1} ∨ s_{t+1} ) − F_t( y − ŝ_{t+1} ∨ s_{t+1} ) ].

The second and third terms of the above display are o_P(1) because the indicator functions { I(D_t ≤ s) : s ∈ R } indexed by s are P-Donsker. The first term is also o_P(1) uniformly in y because F_t is uniformly continuous and ( y − ŝ_{t+1} ∧ s_{t+1} ) − ( y − ŝ_{t+1} ∨ s_{t+1} ) = |ŝ_{t+1} − s_{t+1}| →_P 0.

The above results show that a_n(y) − b_n(y) →_P 0 uniformly in y. Then M̂_{t,n}(y) − M_{t,n}(y) →_P 0, and so M̂_{t,n}(y) →_P M_t(y) uniformly in y by Lemma 1. Since Ŝ_t is defined as the minimizer of M̂_{t,n}(y), and S_t is well separated by Assumption 2, we can use Theorem 5.7 of van der Vaart (1998) to conclude that Ŝ_t →_P S_t. Since M̂_{t,n}(y) →_P M_t(y) and s_t is well separated by Assumption 2, the consistency of ŝ_t follows from Theorem 5.9 of van der Vaart (1998).

EC.2.2. Asymptotic normality of the data-driven policy in the single-period problem
Proof of Lemma 1

Proof of (i). First consider period T. The right and left derivatives of M̂_{T,n}(y) in (2.8) are

M̂^r_{T,n}(y) = (1/n) Σ_{i=1}^n C^r_T(y, d^i_T) = (1/n) Σ_{i=1}^n ( (b_T + h_T) I( d^i_T ≤ y ) − b_T ),

M̂^l_{T,n}(y) = (1/n) Σ_{i=1}^n C^l_T(y, d^i_T) = (1/n) Σ_{i=1}^n ( (b_T + h_T) I( d^i_T < y ) − b_T ).

Every summand is bounded by h_T ∨ b_T, and thus both M̂^r_{T,n}(y) and M̂^l_{T,n}(y) are bounded by h_T ∨ b_T. For y ∉ {d^i_T}_{i=1}^n, we have M̂^r_{T,n}(y) = M̂^l_{T,n}(y). For y ∈ {d^i_T}_{i=1}^n, we have M̂^r_{T,n}(y) − M̂^l_{T,n}(y) = (h_T + b_T)/n. Thus, the statement is true for c_T = h_T + b_T and B_T = h_T ∨ b_T.
Now suppose the statement holds for periods
T, T−1, ..., t+1. The right and left derivatives of M̂_{t,n}(y) in (2.8) are

M̂^r_{t,n}(y) = (1/n) Σ_{i=1}^n C^r_t(y, d^i_t) + (1/n) Σ_{i=1}^n M̂^r_{t+1,n}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ),

M̂^l_{t,n}(y) = (1/n) Σ_{i=1}^n C^l_t(y, d^i_t) + (1/n) Σ_{i=1}^n M̂^l_{t+1,n}(y − d^i_t) I( y − d^i_t > ŝ_{t+1} ).

We see that | M̂^r_{t,n}(y) | ≤ b_t + h_t + B_{t+1}, and

| (1/n) Σ_{i=1}^n C^r_t(y, d^i_t) − (1/n) Σ_{i=1}^n C^l_t(y, d^i_t) | ≤ (h_t + b_t)/n

by the same argument as for period t = T. It then suffices to bound

| (1/n) Σ_{i=1}^n M̂^r_{t+1,n}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ) − (1/n) Σ_{i=1}^n M̂^l_{t+1,n}(y − d^i_t) I( y − d^i_t > ŝ_{t+1} ) |.   (EC.2.2)

For y ∉ {ŝ_{t+1} + d^i_t}_{i=1}^n, we have I( y − d^i_t ≥ ŝ_{t+1} ) = I( y − d^i_t > ŝ_{t+1} ) for all i, and so (EC.2.2) equals

| (1/n) Σ_{i=1}^n ( M̂^r_{t+1,n}(y − d^i_t) − M̂^l_{t+1,n}(y − d^i_t) ) I( y − d^i_t ≥ ŝ_{t+1} ) | ≤ c_{t+1}/n,

by the induction hypothesis. For y = ŝ_{t+1} + d^k_t for some k ∈ {1, 2, ..., n}, we have I( y − d^i_t ≥ ŝ_{t+1} ) = I( y − d^i_t > ŝ_{t+1} ) for i ≠ k.
Then, (EC.2.2) can be bounded by

| (1/n) Σ_{i=1}^n M̂^r_{t+1,n}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ) − (1/n) Σ_{i=1}^n M̂^l_{t+1,n}(y − d^i_t) I( y − d^i_t > ŝ_{t+1} ) |
≤ | (1/n) Σ_{i≠k} M̂^r_{t+1,n}(y − d^i_t) I( y − d^i_t ≥ ŝ_{t+1} ) − (1/n) Σ_{i≠k} M̂^l_{t+1,n}(y − d^i_t) I( y − d^i_t > ŝ_{t+1} ) | + M̂^r_{t+1,n}(ŝ_{t+1})/n
≤ (n−1) c_{t+1}/n² + B_{t+1}/n ≤ ( c_{t+1} + B_{t+1} )/n,

where the first inequality is by the triangle inequality, and the second is by the induction hypothesis that | M̂^l_{t+1,n}(ỹ) − M̂^r_{t+1,n}(ỹ) | ≤ c_{t+1}/n. We conclude that the statement holds for period t with B_t = b_t + h_t + B_{t+1} and c_t = b_t + h_t + c_{t+1} + B_{t+1}.

Proof of (ii). Since Ŝ_t is a minimizer, we must have M̂^r_{t,n}(Ŝ_t) ≥ 0 and M̂^l_{t,n}(Ŝ_t) ≤
0. From Lemma 1, we know that M̂^r_{t,n}(Ŝ_t) ≤ M̂^l_{t,n}(Ŝ_t) + c_t/n ≤ c_t/n almost surely. This implies M̂^r_{t,n}(Ŝ_t) = O_P(n⁻¹), whence M̂^r_{t,n}(Ŝ_t) = o_P(n^{−1/2}).
Proof of Lemma 2
By Lemma EC.2, the class M^r_t of functions m^r_{t,y}(d) (see (5.2)) is P-Donsker. Define the empirical process G_n m^r_{t,y} = (1/√n) Σ_{i=1}^n ( m^r_{t,y}(d^i_t) − E m^r_{t,y}(D_t) ), indexed by m^r_{t,y} ∈ M^r_t, or equivalently, indexed by y ∈ Θ. Then G_n m^r_{t,y} converges weakly to a Gaussian process in ℓ^∞(Θ) (the space of bounded functionals on Θ) which is continuous under the semi-metric ρ(y₁, y₂) = [ E( m^r_{t,y₁}(D_t) − m^r_{t,y₂}(D_t) )² ]^{1/2}. Since Ŝ_t →_P S_t by Theorem 1, by Lemma EC.2 there are c_{t,1}, c_{t,2} > 0 such that ρ²(Ŝ_t, S_t) ≤ c_{t,1}|Ŝ_t − S_t| + c_{t,2}|Ŝ_t − S_t|² →_P 0. By Lemma EC.3, we conclude that G_n m^r_{t,Ŝ_t} − G_n m^r_{t,S_t} = o_P(1), i.e.,

(1/√n) Σ_{i=1}^n ( m^r_{t,Ŝ_t}(d^i_t) − E m^r_{t,Ŝ_t}(D_t) ) − (1/√n) Σ_{i=1}^n ( m^r_{t,S_t}(d^i_t) − E m^r_{t,S_t}(D_t) ) →_P 0,

which is equivalent to √n( M^r_{t,n}(Ŝ_t) − M^r_{t,n}(S_t) ) − √n( M′_t(Ŝ_t) − M′_t(S_t) ) →_P 0. We then apply Taylor's expansion to M′_t at S_t and rewrite the above display as

√n M″_t(S_t)( Ŝ_t − S_t ) = −√n M^r_{t,n}(S_t) + √n M^r_{t,n}(Ŝ_t) + o_P( √n( Ŝ_t − S_t ) ) + o_P(1).

Proof of Lemma 3
By Lemma EC.1, the class M_t = { d ↦ m_{t,y}(d) : y ∈ Θ } is P-Donsker. Since y ↦ m_{t,y} is Lipschitz, both E[ m_{t,ŝ_t}(D_t) − m_{t,s_t}(D_t) | ŝ_t ] →_P 0 and E[ m_{t,Ŝ_t}(D_t) − m_{t,S_t}(D_t) | Ŝ_t ] →_P
0. Applying Lemma EC.3 to the empirical process G_n m_{t,y} gives

√n( M_{t,n}(ŝ_t) − M_{t,n}(s_t) ) − √n( M_t(ŝ_t) − M_t(s_t) ) = o_P(1),   (EC.2.3)
√n( M_{t,n}(Ŝ_t) − M_{t,n}(S_t) ) − √n( M_t(Ŝ_t) − M_t(S_t) ) = o_P(1),   (EC.2.4)

where M_{t,n}(y) = (1/n) Σ_{i=1}^n m_{t,y}(d^i_t) and M_t(y) = E m_{t,y}(D_t). Apply Taylor's expansion to the second term on the LHS of (EC.2.3) to see that

√n M′_t(s_t)( ŝ_t − s_t ) = √n( M_{t,n}(ŝ_t) − M_{t,n}(s_t) ) + o_P( √n( ŝ_t − s_t ) ).   (EC.2.5)

As given in (2.10), ŝ_t is the solution of M̂_{t,n}(s) − M̂_{t,n}(Ŝ_t) − K_t = 0, where M̂_{t,n}(y) is defined in (2.8). This means that √n( −M̂_{t,n}(ŝ_t) + M̂_{t,n}(Ŝ_t) + K_t ) = 0, and so we can add zero to the RHS of the above display. By further telescoping with M_{t,n}(Ŝ_t), the RHS of the above display can be augmented as

√n( −M_{t,n}(s_t) + M_{t,n}(Ŝ_t) + K_t ) − √n( M̂_{t,n}(ŝ_t) − M_{t,n}(ŝ_t) ) + √n( M̂_{t,n}(Ŝ_t) − M_{t,n}(Ŝ_t) ) + o_P( √n( ŝ_t − s_t ) ).   (EC.2.6)

Let us look at the first term of the above display. We have M′_t(S_t) = 0 by definition of S_t, so we can apply Taylor's expansion to the second term on the LHS of (EC.2.4) to get

√n( M_{t,n}(Ŝ_t) − M_{t,n}(S_t) ) = √n M′_t(S_t)( Ŝ_t − S_t ) + o_P( √n( Ŝ_t − S_t ) ),

which is o_P(1) because Ŝ_t − S_t = O_P(n^{−1/2}) by assumption. We conclude by replacing M_{t,n}(Ŝ_t) with M_{t,n}(S_t) in the first term of (EC.2.6), and noting that the o_P(1) term is absorbed by the last term in the above display.
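For concreteness, the estimating equations behind Ŝ_t and ŝ_t are easy to compute in the terminal period, where M̂_{T,n} is a plain empirical average. The sketch below is illustrative only: the parameters (b_T = 4, h_T = 1, K_T = 5, Uniform(0, 20) demand) are hypothetical, and a grid search stands in for exact minimization.

```python
import random

def empirical_cost(y, demands, b=4.0, h=1.0):
    # Empirical terminal cost M_hat_{T,n}(y): average newsvendor loss at level y.
    return sum(b * max(d - y, 0.0) + h * max(y - d, 0.0) for d in demands) / len(demands)

random.seed(0)
demands = [random.uniform(0.0, 20.0) for _ in range(2000)]

# S_hat minimises the empirical cost; for Uniform(0, 20) the population
# minimiser is the b/(b+h) = 0.8 quantile, i.e. S_T = 16.
grid = [i * 0.05 for i in range(401)]
S_hat = min(grid, key=lambda y: empirical_cost(y, demands))

# s_hat solves M_hat(s) - M_hat(S_hat) - K = 0, taken here as the smallest
# grid point at which the (convex) empirical cost drops to the target level.
K = 5.0
target = empirical_cost(S_hat, demands) + K
s_hat = next(y for y in grid if empirical_cost(y, demands) <= target)
```

With this demand distribution the estimates should fall near the population values S_T = 16 and s_T ≈ 9.7, with s_hat < S_hat by construction.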
Proof of Theorem 2

Proof of (i): We apply Lemma 2 here. For the terminal period, we know M″_T(S_T) = (b_T + h_T) f_T(S_T) and M_{T,n}(y) = M̂_{T,n}(y). Because Ŝ_T is a near-zero solution of M̂^r_{T,n}(y) = 0 by Lemma 1, M^r_{T,n}(Ŝ_T) = o_P(n^{−1/2}), and so the second term on the RHS of Lemma 2 is o_P(1). Substitute m^r_{T,y}(d) = (b_T + h_T) I( y − d ≥ 0 ) − b_T into Lemma 2 to get

−(1/√n) Σ_{i=1}^n [ (b_T + h_T) I( S_T − d^i_T ≥ 0 ) − b_T ] = (b_T + h_T) √n f_T(S_T)( Ŝ_T − S_T ) + o_P( √n( Ŝ_T − S_T ) ).

The LHS is O_P(1) by the CLT, which implies √n( Ŝ_T − S_T ) = O_P(1). Thus, the remainder term in the above display can be replaced by o_P(1). Divide both sides of the above display by (b_T + h_T) f_T(S_T) to see that Ŝ_T is asymptotically linear with influence function

d ↦ ( I( S_T − d ≥ 0 ) − b_T/(b_T + h_T) ) / f_T(S_T).

Proof of (ii): We apply Lemma 3 here. First, we note that M_{T,n}(y) = M̂_{T,n}(y). Then, the third and fourth terms on the RHS of the expansion in Lemma 3 are zero. Since M_T is convex, we know M′_T(s_T) = (b_T + h_T) F_T(s_T) − b_T ≠ 0; otherwise s_T would be a global minimizer of M_T, which is a contradiction. Use Lemma 3 to get

√n( ŝ_T − s_T ) = ( √n / M′_T(s_T) ) ( M_{T,n}(S_T) − M_T(S_T) ) − ( √n / M′_T(s_T) ) ( M_{T,n}(s_T) − M_T(s_T) ) + o_P( 1 + √n( ŝ_T − s_T ) ).

Then, √n( ŝ_T − s_T ) = O_P(1) by the CLT, and the last term of the above display is negligible.

Proof of Theorem 3
By Theorem 2, we have

( √n( ŝ_T − s_T ), √n( Ŝ_T − S_T ) ) = ( G_n φ_{s_T}, G_n φ_{S_T} ) + o_P(1),   (EC.2.7)

where

φ_{s_T}(D_T) = ( m_{T,S_T}(D_T) − m_{T,s_T}(D_T) + K_T ) / ( (b_T + h_T) F_T(s_T) − b_T ),
φ_{S_T}(D_T) = ( I( S_T − D_T ≥ 0 ) − b_T/(b_T + h_T) ) / f_T(S_T).

By the CLT, the RHS of (EC.2.7) converges to a two-dimensional normal distribution with covariance matrix Σ_T. We then use Slutsky's lemma to conclude normality.

Next, we show Σ̂_T(i, j) →_P Σ_T(i, j) for i, j = 1, 2 as n → +∞. First look at Σ̂_T(1,1). The class of functions { d_T ↦ I( d_T ≤ y ) : y ∈ Θ } is P-Donsker and so F̂_T(y) = (1/n) Σ_{i=1}^n I( d^i_T ≤ y ) →_P F_T(y) uniformly in y. As a result,

(b_T + h_T) F̂_T(ŝ_T) − b_T = (b_T + h_T) F_T(ŝ_T) − b_T + o_P(1).
Then, the continuous mapping theorem gives

(b_T + h_T) F̂_T(ŝ_T) − b_T →_P (b_T + h_T) F_T(s_T) − b_T.   (EC.2.8)

On the other hand, the class of functions { d ↦ ( m_{T,y₁}(d) − m_{T,y₂}(d) + K_T )² : y₁, y₂ ∈ Θ } is P-Donsker by Lemma EC.1, and so

(1/n) Σ_{i=1}^n [ m_{T,Ŝ_T}(d^i_T) − m_{T,ŝ_T}(d^i_T) + K_T ]² = E[ ( m_{T,Ŝ_T}(D_T) − m_{T,ŝ_T}(D_T) + K_T )² | Ŝ_T, ŝ_T ] + o_P(1).

Apply the continuous mapping theorem to the RHS of the above display to see

(1/n) Σ_{i=1}^n [ m_{T,Ŝ_T}(d^i_T) − m_{T,ŝ_T}(d^i_T) + K_T ]² →_P E[ m_{T,S_T}(D_T) − m_{T,s_T}(D_T) + K_T ]² = var( m_{T,S_T}(D_T) − m_{T,s_T}(D_T) ).   (EC.2.9)

Combine (EC.2.8) and (EC.2.9) to see Σ̂_T(1,1) →_P Σ_T(1,1). The consistency of Σ̂_T(2,2) is due to our assumption on f̂_T. We then consider Σ̂_T(1,2) = Σ̂_T(2,1). The class of functions

{ d ↦ ( I( d ≤ y₁ ) − b_T/(b_T + h_T) ) [ m_{T,y₁}(d) − m_{T,y₂}(d) + K_T ] : y₁, y₂ ∈ Θ }

is P-Donsker, and so

(1/n) Σ_{i=1}^n ( I( d^i_T ≤ Ŝ_T ) − b_T/(b_T + h_T) ) [ m_{T,Ŝ_T}(d^i_T) − m_{T,ŝ_T}(d^i_T) + K_T ]
= E[ ( I( D_T ≤ Ŝ_T ) − b_T/(b_T + h_T) ) ( m_{T,Ŝ_T}(D_T) − m_{T,ŝ_T}(D_T) + K_T ) | Ŝ_T, ŝ_T ] + o_P(1).

By the continuous mapping theorem, the above display is equal to

E[ ( I( D_T ≤ S_T ) − b_T/(b_T + h_T) ) ( m_{T,S_T}(D_T) − m_{T,s_T}(D_T) + K_T ) ] + o_P(1)
= cov( I( S_T − D_T ≥ 0 ), m_{T,S_T}(D_T) − m_{T,s_T}(D_T) ) + o_P(1).   (EC.2.10)

Combine the above display, (EC.2.8), and the condition f̂_T(Ŝ_T) →_P f_T(S_T) to conclude that Σ̂_T(1,2) = Σ̂_T(2,1) →_P Σ_T(1,2) = Σ_T(2,1).

EC.2.3. Asymptotic normality of the data-driven policy in the two-period problem
Proof of Lemma 4
For the first part, recall

M̂^r_{1,n}(y) = (1/n) Σ_{i=1}^n ( (b₁ + h₁) I( y − d^i_1 ≥ 0 ) − b₁ + M̂^r_{2,n}(y − d^i_1) I( y − d^i_1 ≥ ŝ₂ ) ),

M̃^r_{1,n}(y) = (1/n) Σ_{i=1}^n ( (b₁ + h₁) I( y − d^i_1 ≥ 0 ) − b₁ + M̂^r_{2,n}(y − d^i_1) I( y − d^i_1 ≥ s₂ ) ).
Take the difference to see

√n( M̂^r_{1,n}(y) − M̃^r_{1,n}(y) ) = (1/√n) Σ_{i=1}^n M̂^r_{2,n}(y − d^i_1) { I( y − d^i_1 ≥ ŝ₂ ) − I( y − d^i_1 ≥ s₂ ) }.

To decouple the sources of variation in the above display, we define:

I_n(y) ≜ (1/√n) Σ_{i=1}^n M′_2(y − d^i_1) ( I( y − d^i_1 ≥ ŝ₂ ) − I( y − d^i_1 ≥ s₂ ) ),
II_n(y) ≜ (1/√n) Σ_{i=1}^n M′_2(s₂) { I( y − d^i_1 ≥ ŝ₂ ) − I( y − d^i_1 ≥ s₂ ) }.

We will show √n{ M̂^r_{1,n}(y) − M̃^r_{1,n}(y) } − I_n(y) = o_P(1) and I_n(y) − II_n(y) = o_P(1), uniformly in y. It will follow that √n( M̂^r_{1,n}(y) − M̃^r_{1,n}(y) ) is asymptotically equal to II_n(y) (the first part of this lemma). Consider the difference between √n{ M̂^r_{1,n}(y) − M̃^r_{1,n}(y) } and I_n(y):

| √n( M̂^r_{1,n}(y) − M̃^r_{1,n}(y) ) − I_n(y) |
= (1/√n) | Σ_{i=1}^n { M̂^r_{2,n}(y − d^i_1) − M′_2(y − d^i_1) } { I( y − d^i_1 ≥ ŝ₂ ) − I( y − d^i_1 ≥ s₂ ) } |
≤ (1/√n) Σ_{i=1}^n | M̂^r_{2,n}(y − d^i_1) − M′_2(y − d^i_1) | I( y − ŝ₂ ∨ s₂ < d^i_1 ≤ y − ŝ₂ ∧ s₂ )
≤ sup_u | M̂^r_{2,n}(u) − M′_2(u) | × (1/√n) Σ_{i=1}^n I( y − ŝ₂ ∨ s₂ < d^i_1 ≤ y − ŝ₂ ∧ s₂ ).

The first term on the last line is o_P(1) by Lemma EC.2. If the second term is O_P(1), then the entirety of the last line is o_P(1).
The second term can be rewritten as
\begin{align*}
\frac{1}{\sqrt n}\sum_{i=1}^{n} I(y - \hat s \vee s < d_i \le y - \hat s \wedge s)
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} I(d_i \le y - \hat s \wedge s) - \frac{1}{\sqrt n}\sum_{i=1}^{n} I(d_i \le y - \hat s \vee s) \\
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( I(d_i \le y - \hat s \wedge s) - F(y - \hat s \wedge s) \big) - \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( I(d_i \le y - \hat s \vee s) - F(y - \hat s \vee s) \big) \\
&\quad + \sqrt n\,\big\{ F(y - \hat s \wedge s) - F(y - \hat s \vee s) \big\}.
\end{align*}
The first two terms in the last equality are each equal to $\frac{1}{\sqrt n}\sum_{i=1}^{n} \big( I(d_i \le y - s) - F(y - s) \big) + o_P(1)$ by Lemma EC.3, and thus their difference is $o_P(1)$. On the other hand, $\sqrt n\,\big( F(y - \hat s \wedge s) - F(y - \hat s \vee s) \big) \le \sqrt n\, f\, |\hat s - s|$ is of order $O_P(1)$ due to Theorem 3. Then, the above display is $O_P(1)$ and $\big| \sqrt n\,\big(\hat M_{r,n}(y) - \tilde M_{r,n}(y)\big) - I_n(y) \big| \to_P 0$, uniformly in $y$. Next, we show $I_n(y) - II_n(y)$ is $o_P(1)$, starting with:
\begin{align*}
| I_n(y) - II_n(y) |
&= \Big| \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( M'(y-d_i) - M'(s) \big)\,\big( I(y-d_i \ge \hat s) - I(y-d_i \ge s) \big) \Big| \\
&\le \frac{1}{\sqrt n}\sum_{i=1}^{n} \big| M'(y-d_i) - M'(s) \big|\, I(y - \hat s \vee s < d_i \le y - \hat s \wedge s) \\
&\le \sup_{\hat s \wedge s \le u < \hat s \vee s} | M'(u) - M'(s) | \times \frac{1}{\sqrt n}\sum_{i=1}^{n} I(y - \hat s \vee s < d_i \le y - \hat s \wedge s), \tag{EC.2.11}
\end{align*}
where the last inequality follows because the summand is nonzero only when $\hat s \vee s > y - d_i \ge \hat s \wedge s$. The analysis for $I_n$ above shows that $\frac{1}{\sqrt n}\sum_{i=1}^{n} I(y - \hat s \vee s < d_i \le y - \hat s \wedge s)$ is asymptotically tight and so it is $O_P(1)$, uniformly in $y$. On the other hand, since $M'$ is Lipschitz with constant $L'$ by Lemma EC.2, we can bound the first term of the last line by
\[
\sup_{\hat s \wedge s \le u < \hat s \vee s} | M'(u) - M'(s) | \le L'\, |\hat s - s|,
\]
which is $o_P(1)$ because $\hat s$ is a consistent estimator by Theorem 1. Thus, $I_n(y) - II_n(y) = o_P(1)$, uniformly in $y$. With these results in hand, the triangle inequality yields
\[
\Big| \sqrt n\,\big(\hat M_{r,n}(y) - \tilde M_{r,n}(y)\big) - II_n(y) \Big| \le \Big| \sqrt n\,\big(\hat M_{r,n}(y) - \tilde M_{r,n}(y)\big) - I_n(y) \Big| + | I_n(y) - II_n(y) |,
\]
and both terms on the right are $o_P(1)$, uniformly in $y$, concluding the first part of the lemma. To prove the second part, it suffices to consider $II_n(\hat S)$. By Example 19.6 of van der Vaart (1998), the class $\mathcal G = \{ d \mapsto I(d \le y) \mid y \in \mathbb R \}$ is $P$-Donsker, and
\[
\rho(y, \tilde y)^2 \triangleq \mathbb E\big[ I(D \le y) - I(D \le \tilde y) \big]^2 = F(y \vee \tilde y) - F(y \wedge \tilde y).
\]
By the consistency of $\hat S$ and $\hat s$, both $\rho(\hat S - \hat s, S - s)$ and $\rho(\hat S - s, S - s)$ converge to zero in probability due to the continuity of $F$ in Assumption 1. Lemma EC.3 gives
\begin{align*}
\frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( I(\hat S - d_i \ge \hat s) - I(\hat S - d_i \ge s) \Big)
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( I(\hat S - d_i \ge \hat s) - I(S - d_i \ge s) \Big) - \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( I(\hat S - d_i \ge s) - I(S - d_i \ge s) \Big) \\
&= \sqrt n\,\Big( F(\hat S - \hat s) - F(S - s) \Big) - \sqrt n\,\Big( F(\hat S - s) - F(S - s) \Big) + o_P(1) \\
&= \sqrt n\, f(S - s)\,\big( \hat S - \hat s - S + s \big) - \sqrt n\, f(S - s)\,\big( \hat S - S \big) + o_P\big( \sqrt n\,(\hat S - S) \big) \\
&= -\sqrt n\, f(S - s)\,(\hat s - s) + o_P\big( \sqrt n\,(\hat S - S) \big),
\end{align*}
where the third equality uses Taylor's expansion and $\sqrt n\,(\hat s - s) = O_P(1)$ by Theorem 3. Then
\[
\sqrt n\,\big( \hat M_{r,n}(\hat S) - \tilde M_{r,n}(\hat S) \big) = -\sqrt n\, M'(s)\, f(S - s)\,(\hat s - s) + o_P\big( \sqrt n\,(\hat S - S) \big),
\]
where $M'(s) = (b+h)F(s) - b$, as in (5.3), concluding the lemma.
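The identity $M'(y) = (b+h)F(y) - b$ used above can be sanity-checked numerically. The sketch below (our own illustration, with an assumed Exp(1) demand and hypothetical cost parameters) exploits that the one-period cost $M(y) = \mathbb E[\, h (y-D)^+ + b (D-y)^+ \,]$ has the closed form $h(y - 1 + e^{-y}) + b e^{-y}$ for Exp(1) demand:

```python
import math

# Compare a central finite difference of M with (b + h) F(y) - b.
b, h = 3.0, 1.0                                  # hypothetical cost rates
F = lambda y: 1.0 - math.exp(-y)                 # Exp(1) demand cdf
M = lambda y: h * (y - 1.0 + math.exp(-y)) + b * math.exp(-y)

for y in (0.5, 1.0, 2.0):
    fd = (M(y + 1e-6) - M(y - 1e-6)) / 2e-6      # numerical derivative
    print(round(fd, 4), round((b + h) * F(y) - b, 4))
```

The two columns agree to the accuracy of the finite difference.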
Proof of Lemma 5. Proof of (i). This is a special case of Lemma 11 for the multi-period setting, which is proved later.
Proof of (ii). By the definition of $U_n(y)$ and $U^H_n(y)$ in (6.6) and (6.7), we have
\[
\sqrt n\,\big( U_n(y) - U^H_n(y) \big) = \frac{1}{n\sqrt n}\sum_{i=1}^{n}\sum_{j=1}^{n} \big( g_y(d_i, d_j) - g^{(2)}_y(d_j) \big).
\]
Consider the function class $\mathcal F^{(3)} \triangleq \{ (d_1, d_2) \mapsto g_y(d_1, d_2) - g^{(2)}_y(d_2) : y \in \Theta \}$. Since $\mathcal F$ and $\mathcal F^{(2)}$ are both Euclidean by the first part of this lemma, $\mathcal F^{(3)}$ is also Euclidean. Furthermore, we have
\[
\mathbb E\big[ g_y(D, d_2) - g^{(2)}_y(d_2) \big] = g^{(2)}_y(d_2) - g^{(2)}_y(d_2) = 0
\quad \text{and} \quad
\mathbb E\big[ g_y(d_1, D) - g^{(2)}_y(D) \big] = g^{(1)}_y(d_1) = 0
\]
for all $d_1$, $d_2$, and $y$. It follows that $\mathcal F^{(3)}$ is a degenerate Euclidean class with envelope $2(h+b)$. We then apply Theorem 2.5 of Neumeyer (2004) to conclude that
\[
\sup_{y \in \Theta} \sqrt n\,\big( U_n(y) - U^H_n(y) \big) = O_P\big( n^{-1/2} \big) = o_P(1).
\]
Proof of Lemma 6
Define $x \mapsto \ell_\theta(x)$ by $\ell_\theta(x) \triangleq \mathbb E\big[ \big( I(\theta - D - x \ge 0) - F(\theta - D) \big)\, I(\theta - D \ge s) \big]$, and define the function class $\mathcal G \triangleq \{ x \mapsto \ell_\theta(x) : \theta \in \Theta \}$. Every $\ell_\theta \in \mathcal G$ is uniformly bounded by 1. Then, by Lemma EC.4, the mapping $\theta \mapsto \ell_\theta(x)$ is Lipschitz for each fixed $x$. All the conditions of Lemma EC.3 hold and so
\[
\frac{b+h}{\sqrt n}\sum_{j=1}^{n} \mathbb E\Big[ \big( I(\hat S - D - d_j \ge 0) - F(\hat S - D) \big)\, I(\hat S - D \ge s) \Big]
= \frac{b+h}{\sqrt n}\sum_{j=1}^{n} \mathbb E\big[ \big( I(S - D - d_j \ge 0) - F(S - D) \big)\, I(S - D \ge s) \big] + o_P(1).
\]
Proof of Theorem 4
Combining Lemmas 4 and 6 shows that
\begin{align*}
\sqrt n\, M_{r,n}(\hat S) &= -\sqrt n\,\big( \hat M_{r,n}(\hat S) - M_{r,n}(\hat S) \big) + o_P(1) \\
&= -\sqrt n\,\big( \hat M_{r,n}(\hat S) - \tilde M_{r,n}(\hat S) + \tilde M_{r,n}(\hat S) - M_{r,n}(\hat S) \big) + o_P(1) \\
&= \sqrt n\, M'(s)\, f(S - s)\,(\hat s - s) - \sqrt n\, U^H_n(S) + o_P\big( \sqrt n\,(\hat S - S) \big).
\end{align*}
By the above display and Lemma 2, we have
\begin{align*}
\sqrt n\, M''(S)\,\big( \hat S - S \big)
&= -\frac{1}{\sqrt n}\sum_{i=1}^{n} \big( (b+h)\, I(S - d_i \ge 0) - b + M'(S - d_i)\, I(S - d_i \ge s) \big) - \sqrt n\,\big( \hat M_{r,n}(\hat S) - M_{r,n}(\hat S) \big) + o_P\big( \sqrt n\,(\hat S - S) \big) \\
&= -\frac{1}{\sqrt n}\sum_{i=1}^{n} m_{r,S}(d_i) + \sqrt n\, M'(s)\, f(S - s)\,(\hat s - s) - \sqrt n\, U^H_n(S) + o_P\big( \sqrt n\,(\hat S - S) \big). \tag{EC.2.12}
\end{align*}
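The Hájek-projection term $U^H_n$ entering (EC.2.12) replaces a double-indexed $U$-process by an i.i.d. sum. How accurate such a projection is can be illustrated with a toy kernel (our own example with an assumed Exp(1) demand, for which the projection $h(d) = \mathbb E\, g(D, d)$ and the mean $\theta$ have closed forms; this is not the paper's kernel $g_y$):

```python
import math, random

# For g(d1, d2) = I(d1 + d2 <= y), compare the U-statistic U_n with its
# Hajek projection U_n^H = (2/n) sum_i h(d_i) - theta.  The scaled
# remainder sqrt(n) * (U_n - U_n^H) is O_P(n^{-1/2}), mirroring the
# degenerate-U-process bound used in the proofs.
def projection_remainder(n, y=2.0, seed=1):
    rng = random.Random(seed)
    d = [rng.expovariate(1.0) for _ in range(n)]
    h = lambda x: 1.0 - math.exp(-(y - x)) if x < y else 0.0
    theta = 1.0 - math.exp(-y) * (1.0 + y)       # P(D1 + D2 <= y), Erlang-2
    U = sum(1.0 for a in d for b in d if a + b <= y) / n**2
    UH = 2.0 * sum(h(x) for x in d) / n - theta
    return math.sqrt(n) * (U - UH)

print(abs(projection_remainder(500)))
print(abs(projection_remainder(1500)))
```

Both values are small, and they shrink (in probability) as $n$ grows.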
By Theorem 2 (on the asymptotics of $\hat s$) and the fact that $M'(s) = (b+h)F(s) - b$,
\[
\big( (b+h)F(s) - b \big)\,\sqrt n\,(\hat s - s) = \frac{1}{\sqrt n}\sum_{i=1}^{n} \big[ C(S, d_i) - C(s, d_i) + K \big] + o_P(1).
\]
We then obtain the equality $\sqrt n\, M''(S)(\hat S - S) = O_P(1) + o_P(\sqrt n\,(\hat S - S))$ by plugging the above display into (EC.2.12). We can divide both sides by $M''(S) \ne 0$ to conclude $\sqrt n\,(\hat S - S) = O_P(1)$, and replace $o_P(\sqrt n\,(\hat S - S))$ in (EC.2.12) by $o_P(1)$ to conclude the theorem.

Proof of Lemma 7. Proof of (i). Recall
\[
\tilde M_{1,n}(y) = \frac{1}{n}\sum_{i=1}^{n} \Big\{ C(y, d_i) + \hat M_{2,n}(s)\, I(y - d_i < s) + \hat M_{2,n}(y - d_i)\, I(y - d_i \ge s) \Big\}, \tag{EC.2.13}
\]
from (6.9). We can then decompose $\hat M_{1,n}(y) - \tilde M_{1,n}(y)$ by writing
\[
\sqrt n\,\big( \hat M_{1,n}(y) - \tilde M_{1,n}(y) \big) = I_n(y) + II_n(y) + III_n(y),
\]
where
\begin{align*}
I_n(y) &= \frac{\hat M_{2,n}(\hat s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i < \hat s) - \frac{\hat M_{2,n}(s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i < \hat s), \\
II_n(y) &= \frac{\hat M_{2,n}(s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i < \hat s) - \frac{\hat M_{2,n}(s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i < s), \\
III_n(y) &= \frac{1}{\sqrt n}\sum_{i=1}^{n} \hat M_{2,n}(y - d_i)\, I(y - d_i \ge \hat s) - \frac{1}{\sqrt n}\sum_{i=1}^{n} \hat M_{2,n}(y - d_i)\, I(y - d_i \ge s).
\end{align*}
We first show that $II_n(y) + III_n(y) = o_P(1)$. Rewrite $II_n(y)$ as
\[
II_n(y) = \frac{\hat M_{2,n}(s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i \ge s) - \frac{\hat M_{2,n}(s)}{\sqrt n}\sum_{i=1}^{n} I(y - d_i \ge \hat s),
\]
to see that
\[
| II_n(y) + III_n(y) | = \Big| \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( \hat M_{2,n}(y - d_i) - \hat M_{2,n}(s) \big)\,\big( I(y - d_i \ge \hat s) - I(y - d_i \ge s) \big) \Big|.
\]
Each summand on the right is only nonzero when $\hat s \wedge s \le y - d_i \le \hat s \vee s$.
Therefore, the above display is upper bounded by
\[
\sup_{\hat s \wedge s \le u \le \hat s \vee s} \big| \hat M_{2,n}(u) - \hat M_{2,n}(s) \big| \cdot \frac{1}{\sqrt n}\sum_{i=1}^{n} I(y - \hat s \vee s \le d_i \le y - \hat s \wedge s).
\]
Similar to the analysis of (EC.2.11), the above display is $o_P(1)$, and so $\sup_{y \in \Theta} | II_n(y) + III_n(y) | = o_P(1)$. We then decompose $I_n(y)$ into
\begin{align*}
I_n(y) &= \sqrt n\,\{ 1 - F(y - \hat s) \}\,\big( \hat M_{2,n}(\hat s) - \hat M_{2,n}(s) \big) \\
&\quad + \big( \hat M_{2,n}(\hat s) - \hat M_{2,n}(s) \big)\,\bigg( \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( I(y - d_i < \hat s) - \{ 1 - F(y - \hat s) \} \big) \bigg).
\end{align*}
Because $\{ I_{u-}(x) = I(x < u) : u \in \mathbb R \}$ is $P$-Donsker, the second term on the second line of the above display is $O_P(1)$, uniformly in $y \in \Theta$, and so the second line of the above display is $o_P(1)$. Then Lemma EC.3 shows $\sqrt n\,\big( \hat M_{2,n}(\hat s) - \hat M_{2,n}(s) \big) = \sqrt n\, M'(s)\,(\hat s - s) + o_P(1)$, and so
\[
I_n(y) = \sqrt n\,\{ 1 - F(y - \hat s) \}\, M'(s)\,(\hat s - s) + o_P(1).
\]
Since $|\hat s - s| \to_P 0$ and $F$ is uniformly continuous, $F(y - \hat s)$ can be replaced by $F(y - s)$ in the above display with error $o_P(1)$, which is absorbed by the remainder term. Combine the analysis for $I_n(y)$ and $II_n(y) + III_n(y)$ to conclude.

Proof of (ii). Since $\hat M_{2,n} = M_{2,n}$, we have the decomposition
\begin{align*}
\frac{1}{\sqrt n}\sum_{i=1}^{n} \big\{ \hat M_{2,n}(s) - M(s) \big\}\, I(y - d_i < s)
&= \sqrt n\,\big( M_{2,n}(s) - M(s) \big)\,\{ 1 - F(y - s) \} \\
&\quad + \big( M_{2,n}(s) - M(s) \big)\,\frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( I(y - d_i < s) - \{ 1 - F(y - s) \} \Big).
\end{align*}
Because $\{ I_{u-}(x) = I(x < u) : u \in \mathbb R \}$ is $P$-Donsker, the sum on the second line of the above display is $O_P(1)$, uniformly in $y \in \Theta$, which implies that the second line of the above display is $o_P(1)$, uniformly in $y \in \Theta$.

Proof of Lemma 8
The first and second parts are a special case of Lemma 15, which is proved later. For the third part, $\tilde{\mathcal F}^{(2)}$ is $P$-Donsker by the first part. Since $\hat s \to_P s$ and $\hat S \to_P S$, by Lemma EC.4 we have
\[
\mathbb E\Big[ \big( \tilde g^{(2)}_{\hat s}(D) - \tilde g^{(2)}_{s}(D) \big)^2 \,\Big|\, \hat s \Big] \to_P 0,
\qquad
\mathbb E\Big[ \big( \tilde g^{(2)}_{\hat S}(D) - \tilde g^{(2)}_{S}(D) \big)^2 \,\Big|\, \hat S \Big] \to_P 0. \tag{EC.2.14}
\]
Apply Lemma EC.3 to conclude
\begin{align*}
n^{1/2}\, \tilde U^H_n(\hat s) &= \frac{1}{\sqrt n}\sum_{j=1}^{n} \tilde g^{(2)}_{\hat s}(d_j) = \frac{1}{\sqrt n}\sum_{j=1}^{n} \tilde g^{(2)}_{s}(d_j) + o_P(1) = n^{1/2}\, \tilde U^H_n(s) + o_P(1), \\
n^{1/2}\, \tilde U^H_n(\hat S) &= \frac{1}{\sqrt n}\sum_{j=1}^{n} \tilde g^{(2)}_{\hat S}(d_j) = \frac{1}{\sqrt n}\sum_{j=1}^{n} \tilde g^{(2)}_{S}(d_j) + o_P(1) = n^{1/2}\, \tilde U^H_n(S) + o_P(1).
\end{align*}
Proof of Theorem 5
Rewrite the first equation of Lemma 7 using $\sqrt n\,\big( M_{2,n}(\hat s) - M_{2,n}(s) \big) = \sqrt n\, M'(s)\,(\hat s - s) + o_P(1)$ by (EC.2.5). Then combine (6.10) and Lemmas 7 and 8 to see:
\[
\sqrt n\,\big( \hat M_{1,n}(y) - M_{1,n}(y) \big) = \sqrt n\, \tilde U^H_n(y) - \sqrt n\,\{ 1 - F(y - s) \}\,\{ M_{2,n}(\hat s) - M(s) \} + R_n(y), \tag{EC.2.15}
\]
where $\sup_{y \in \Theta} | R_n(y) | = o_P(1)$. On the other hand, $\hat s$ is the solution of $y \mapsto M_{2,n}(y) - M_{2,n}(\hat S) - K = 0$, using the fact that $\hat M_{2,n} = M_{2,n}$. Also, $s$ is the solution of $y \mapsto M(y) - M(S) - K = 0$, and so $\sqrt n\,\big( M_{2,n}(\hat s) - M(s) \big) = \sqrt n\,\big( M_{2,n}(\hat S) - M(S) \big)$. Then apply Lemma EC.3 using $M'(S) = 0$ to see $\sqrt n\, M_{2,n}(\hat S) = \sqrt n\, M_{2,n}(S) + o_P(1)$. This reasoning gives the following representation of $\hat M_{1,n}(y)$:
\[
\sqrt n\,\big( \hat M_{1,n}(y) - M_{1,n}(y) \big) = \sqrt n\, \tilde U^H_n(y) - \sqrt n\,\{ 1 - F(y - s) \}\,\{ M_{2,n}(S) - M(S) \} + R_n(y),
\]
where $\sup_{y \in \Theta} | R_n(y) | = o_P(1)$. Substitute the above result into Lemma 3 to see:
\begin{align*}
\sqrt n\, M'(s)\,(\hat s - s)
&= -\frac{1}{\sqrt n}\sum_{i=1}^{n} \big( m_{1,S}(d_i) - m_{1,s}(d_i) + K \big) + \frac{1}{\sqrt n}\sum_{i=1}^{n} \big( \tilde g^{(2)}_{S}(d_i) - \tilde g^{(2)}_{s}(d_i) \big) \\
&\quad - \sqrt n\,\big( F(\hat S - s) - F(\hat s - s) \big)\,\big( M_{2,n}(S) - M(S) \big) + o_P\big( \sqrt n\,(\hat s - s) \big).
\end{align*}
Since $|\hat S - S| \to_P 0$ and $|\hat s - s| \to_P 0$, and $F$ is uniformly continuous, we can replace $F(\hat S - s)$ and $F(\hat s - s)$ in the above display by $F(S - s)$ and $F(s - s)$, with the error $o_P(1)$ absorbed into the remainder term. Afterwards, the first three terms on the RHS are all $O_P(1)$ by the CLT, and so $\sqrt n\,(\hat s - s)$ is also $O_P(1)$. Subsequently, the remainder term in the above display is $o_P(1)$, concluding the theorem.

Proof of Theorem 6
This follows from the multivariate CLT, together with Slutsky's lemma. The variance estimates are given explicitly in Section EC.3.5.
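To illustrate how asymptotic normality results of this kind get used: in the simplest no-setup, one-period special case, the base-stock level is the critical fractile $S = F^{-1}(b/(b+h))$ and its data-driven counterpart is the sample quantile, whose influence-function asymptotics predict $\sqrt n(\hat S - S) \Rightarrow N\big(0,\, q(1-q)/f(S)^2\big)$ with $q = b/(b+h)$. The Monte Carlo sketch below (assumed Exp(1) demand and hypothetical costs; a special case for illustration, not the paper's general estimator) checks that the standardized errors have roughly zero mean and unit standard deviation:

```python
import math, random, statistics

b, h = 3.0, 1.0
q = b / (b + h)                      # critical fractile, here 0.75
S = -math.log(1.0 - q)               # F^{-1}(q) for Exp(1) demand
fS = math.exp(-S)                    # density at S
sigma = math.sqrt(q * (1 - q)) / fS  # asymptotic standard deviation

rng = random.Random(5)
n, reps = 400, 300
z = []
for _ in range(reps):
    d = sorted(rng.expovariate(1.0) for _ in range(n))
    S_hat = d[int(q * n)]            # empirical q-quantile
    z.append(math.sqrt(n) * (S_hat - S) / sigma)

# Standardized errors should look approximately standard normal.
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```

Such standardized quantities are exactly what confidence intervals and hypothesis tests on the policy parameters are built from.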
EC.2.4. Asymptotic Normality: Multi-Period without Setup Cost
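Before turning to the proofs, it may help to see the kind of plug-in backward recursion the subsequent lemmas analyze. The sketch below (hypothetical two-period costs, demand samples, and thresholds; a simplified stand-in in the spirit of (EC.2.16), not the paper's implementation) makes visible the two structural facts the proofs exploit: the recursion bottoms out at the horizon, where the estimate is a plain i.i.d. sum, and each earlier period re-uses the estimated threshold of the next period, which is how estimation error propagates backward:

```python
import random

# Empirical derivative-type recursion over periods t = 0, ..., T-1.
def make_Mhat(d, bh, s_hat):
    T = len(d)
    def Mhat(t, y):
        if t == T:                        # past the horizon: no continuation
            return 0.0
        b, h = bh[t]
        total = 0.0
        for di in d[t]:
            total += (b + h) * (y - di >= 0.0) - b
            if t + 1 < T and y - di >= s_hat[t + 1]:
                total += Mhat(t + 1, y - di)   # estimated threshold feeds back
        return total / len(d[t])
    return Mhat

rng = random.Random(2)
d = [[rng.expovariate(1.0) for _ in range(50)] for _ in range(2)]
Mhat = make_Mhat(d, bh=[(3.0, 1.0), (3.0, 1.0)], s_hat={1: 0.8})
# At the terminal period the recursion is a plain i.i.d. sum:
direct = sum(4.0 * (2.0 - x >= 0.0) - 3.0 for x in d[1]) / 50
print(round(Mhat(1, 2.0), 3), round(direct, 3))
```

The terminal-period agreement is the empirical counterpart of the fact $\hat M_{T,n} \equiv M_{T,n}$ used repeatedly below.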
Proof of Lemma 9
Write
\[
\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big) = \sqrt n\,\big( \hat M^r_{t,n}(y) - \tilde M^r_{t,n}(y) \big) + \sqrt n\,\big( \tilde M^r_{t,n}(y) - M^r_{t,n}(y) \big),
\]
where $\tilde M^r_{t,n}(y)$ extends the definition of $\tilde M_{r,n}(y)$ in (6.1) to:
\[
\tilde M^r_{t,n}(y) \triangleq \frac{1}{n}\sum_{i=1}^{n} \Big( (b_t + h_t)\, I(y - d_{it} \ge 0) - b_t + \hat M^r_{t+1,n}(y - d_{it})\, I(y - d_{it} \ge s_{t+1}) \Big). \tag{EC.2.16}
\]
By Lemma EC.2, the observation that $s_{t+1} = S_{t+1}$ and $M'_{t+1}(S_{t+1}) = 0$, and the assumption that $\sqrt n\,(\hat S_{t+1} - S_{t+1}) = O_P(1)$, we can repeat the proof for the first part of Lemma 4 to show that
\[
\sqrt n\,\big( \hat M^r_{t,n}(y) - \tilde M^r_{t,n}(y) \big) = \frac{1}{\sqrt n}\sum_{i=1}^{n} M'_{t+1}(S_{t+1})\,\Big( I(y - d_{it} \ge \hat S_{t+1}) - I(y - d_{it} \ge S_{t+1}) \Big) + R_n(y),
\]
where $R_n(y)$ is uniformly $o_P(1)$. Since $M'_{t+1}(S_{t+1}) = 0$, the RHS above is uniformly $o_P(1)$, and thus $\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big) = \sqrt n\,\big( \tilde M^r_{t,n}(y) - M^r_{t,n}(y) \big) + R_n(y)$, where $\sup_y | R_n(y) | = o_P(1)$. Substitute (EC.2.16) and $M^r_{t,n}(y) = \frac{1}{n}\sum_{i=1}^{n} m^r_{t,y}(d_{it})$ with $d_t \mapsto m^r_{t,y}(d_t)$ defined in (EC.1.1) to conclude.

Proof of Lemma 10
By using Lemma 9 and telescoping with $M^r_{t+1,n}(y - d_{it})$, we can write
\begin{align*}
\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big)
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big\{ \hat M^r_{t+1,n}(y - d_{it}) - M^r_{t+1,n}(y - d_{it}) \Big\}\, I(y - d_{it} \ge S_{t+1}) \\
&\quad + \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big\{ M^r_{t+1,n}(y - d_{it}) - M'_{t+1}(y - d_{it}) \Big\}\, I(y - d_{it} \ge S_{t+1}) + o_P(1), \tag{EC.2.17}
\end{align*}
where the remainder is uniformly small over $y$. Recall that $M^r_{t+1,n}(y) = \frac{1}{n}\sum_{i=1}^{n} m^r_{t+1,y}(d_{i,t+1})$ and $M'_{t+1}(y) = \mathbb E\, m^r_{t+1,y}(D_{t+1})$, where
\[
m^r_{t+1,y}(d) = (b_{t+1} + h_{t+1})\, I(y - d \ge 0) - b_{t+1} + M'_{t+2}(y - d)\, I(y - d \ge S_{t+2}).
\]
We can then rewrite the second term of the RHS of (EC.2.17) as
\begin{align*}
&\frac{1}{n\sqrt n}\sum_{i_t, i_{t+1}} \Big\{ \{ M'_{t+2}\, I_{[S_{t+2}, \infty)} \}(y - d^{i_t}_t - d^{i_{t+1}}_{t+1}) - \mathbb E\, \{ M'_{t+2}\, I_{[S_{t+2}, \infty)} \}(y - d^{i_t}_t - D_{t+1}) \Big\}\, I(y - d^{i_t}_t \ge S_{t+1}) \\
&\quad + \frac{b_{t+1} + h_{t+1}}{n\sqrt n}\sum_{i_t, i_{t+1}} \Big\{ I(y - d^{i_t}_t - d^{i_{t+1}}_{t+1} \ge 0) - F_{t+1}(y - d^{i_t}_t) \Big\}\, I(y - d^{i_t}_t \ge S_{t+1}), \tag{EC.2.18}
\end{align*}
which is equal to $\frac{1}{n\sqrt n}\sum_{i_t, i_{t+1}} \{ \psi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) + \phi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) \}$. To get the full expansion of (EC.2.17), we use backward induction. First we verify the lemma holds when $t = T-1$. Since $\hat M_{T,n} \equiv M_{T,n}$ by definition, the first term on the RHS of (EC.2.17) is zero when $t = T-1$. Combine this with (EC.2.18) to see the lemma holds for period $T-1$. Next, suppose the lemma holds for periods $\tilde t = t+1, t+2, \cdots, T-1$. Then for period $\tilde t = t$, the second term on the RHS of (EC.2.17) is given by (EC.2.18), and so we only need to consider the first term. By the induction hypothesis, every summand of the first term, after multiplying by $\sqrt n$, can be factorized as
\[
\sqrt n\,\big( \hat M^r_{t+1,n}(y) - M^r_{t+1,n}(y) \big) = \sum_{\tau = t+2}^{T} \frac{1}{n^{\tau - t - 1/2}}\sum_{i_{t+1}, \cdots, i_\tau} \big\{ \psi_{t+1,\tau,y}(d^{i_{t+1}}_{t+1}, \cdots, d^{i_\tau}_\tau) + \phi_{t+1,\tau,y}(d^{i_{t+1}}_{t+1}, \cdots, d^{i_\tau}_\tau) \big\},
\]
up to a remainder term that is bounded above by $\sup_y | R_n(y) |$ (since $I(y - d^{i_t}_t \ge S_{t+1}) \le 1$). Therefore the first term on the RHS of (EC.2.17) equals
\[
\sum_{\tau = t+2}^{T} \frac{1}{n^{\tau - t + 1/2}}\sum_{i_t, \cdots, i_\tau} \big\{ \psi_{t+1,\tau,y - d^{i_t}_t}(d^{i_{t+1}}_{t+1}, \cdots, d^{i_\tau}_\tau) + \phi_{t+1,\tau,y - d^{i_t}_t}(d^{i_{t+1}}_{t+1}, \cdots, d^{i_\tau}_\tau) \big\}\, I(y - d^{i_t}_t \ge S_{t+1}) + R_n(y), \tag{EC.2.19}
\]
where $\sup_y | R_n(y) | = o_P(1)$. By definition, we readily find the relations
\[
\psi_{t+1,\tau,y - d_t}(d_{t+1}, \cdots, d_\tau)\, I(y - d_t \ge S_{t+1}) = \psi_{t,\tau,y}(d_t, \cdots, d_\tau),
\qquad
\phi_{t+1,\tau,y - d_t}(d_{t+1}, \cdots, d_\tau)\, I(y - d_t \ge S_{t+1}) = \phi_{t,\tau,y}(d_t, \cdots, d_\tau),
\]
for all $\tau \ge t+2$. Use these relations to simplify (EC.2.19), and combine the simplified (EC.2.19) with (EC.2.18) and (EC.2.17) to see that the lemma holds for period $t$.

Proof of Lemma 11
Here, we show the classes $\tilde{\mathcal H}_{t,\tau}$ are Euclidean and the conditions in Lemma EC.5 hold. Showing that $\mathcal H_{t,\tau}$ is Euclidean follows similarly, so we omit it here. Note that $| M'_{\tau+1} | \le \tilde B_{\tau+1}$ and $M'_{\tau+1}$ is Lipschitz by Lemma EC.2. We can decompose $M'_{\tau+1}$ into $M'^{+}_{\tau+1} - M'^{-}_{\tau+1}$, where $M'^{+}_{\tau+1} = M'_{\tau+1} \vee 0$ and $M'^{-}_{\tau+1} = (-M'_{\tau+1}) \vee 0$. Both $M'^{+}_{\tau+1}$ and $M'^{-}_{\tau+1}$ are bounded, Lipschitz, and of bounded variation. Consider the functions $\psi^{(1)}_{t,\tau,y} : \mathbb R^{\tau - t + 1} \mapsto \mathbb R$, $\psi^{(2)}_{t,\tau,y} : \mathbb R^{\tau - t + 1} \mapsto \mathbb R$, and $\psi^{(3)}_{t,\tau,y} : \mathbb R^{\tau - t} \mapsto \mathbb R$ given by
\begin{align*}
\psi^{(1)}_{t,\tau,y}(d_t, \cdots, d_\tau) &= \{ M'^{+}_{\tau+1}\, I_{[S_{\tau+1}, +\infty)} \}(y - d_t - \cdots - d_\tau) \times I(y - d_t - \cdots - d_{\tau-1} \ge S_\tau) \cdots I(y - d_t \ge S_{t+1}), \\
\psi^{(2)}_{t,\tau,y}(d_t, \cdots, d_\tau) &= \{ M'^{-}_{\tau+1}\, I_{[S_{\tau+1}, +\infty)} \}(y - d_t - \cdots - d_\tau) \times I(y - d_t - \cdots - d_{\tau-1} \ge S_\tau) \cdots I(y - d_t \ge S_{t+1}), \\
\psi^{(3)}_{t,\tau,y}(d_t, \cdots, d_{\tau-1}) &= \mathbb E\, \psi^{(1)}_{t,\tau,y}(d_t, \cdots, d_{\tau-1}, D_\tau) - \mathbb E\, \psi^{(2)}_{t,\tau,y}(d_t, \cdots, d_{\tau-1}, D_\tau).
\end{align*}
Based on $\psi^{(1)}_{t,\tau,y}$, $\psi^{(2)}_{t,\tau,y}$, and $\psi^{(3)}_{t,\tau,y}$, define the following three function classes:
\[
\tilde{\mathcal H}^{(1)}_{t,\tau} \triangleq \big\{ \psi^{(1)}_{t,\tau,y} : y \in \Theta \big\},
\qquad
\tilde{\mathcal H}^{(2)}_{t,\tau} \triangleq \big\{ \psi^{(2)}_{t,\tau,y} : y \in \Theta \big\},
\qquad
\tilde{\mathcal H}^{(3)}_{t,\tau} \triangleq \big\{ \psi^{(3)}_{t,\tau,y} : y \in \Theta \big\}.
\]
To show $\tilde{\mathcal H}^{(1)}_{t,\tau}$ is Euclidean, we observe that $\psi^{(1)}_{t,\tau,y}(d_t, \cdots, d_\tau)$ is the minimum of $M'^{+}_{\tau+1}(y - d_t - \cdots - d_\tau)$ and the product of $\tau - t + 1$ indicator functions multiplied by $\tilde B_{\tau+1}$. The product of these indicator functions, as indexed by $y$, is a VC class. By Lemma 22 of Nolan and Pollard (1987), the class of functions $(d_t, \cdots, d_\tau) \mapsto M'^{+}_{\tau+1}(y - d_t - \cdots - d_\tau)$ indexed by $y \in \Theta$, as a horizontal translation of $M'^{+}_{\tau+1}(-d_t - \cdots - d_\tau)$, is Euclidean. Therefore, $\tilde{\mathcal H}^{(1)}_{t,\tau}$ is Euclidean with constant envelope $\tilde B_{\tau+1}$ by Lemma 5.3 of Pollard (1990). By the same argument, we can show that $\tilde{\mathcal H}^{(2)}_{t,\tau}$ is Euclidean with constant envelope $\tilde B_{\tau+1}$. As a result, $\tilde{\mathcal H}^{(3)}_{t,\tau}$ is Euclidean by Corollary 21 of Nolan and Pollard (1987). Finally, observe that $\tilde{\mathcal H}_{t,\tau}$ is a subset of $\tilde{\mathcal H}^{(1)}_{t,\tau} - \tilde{\mathcal H}^{(2)}_{t,\tau} - \tilde{\mathcal H}^{(3)}_{t,\tau}$, and thus is Euclidean.
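The shift relation for $\psi_{t,\tau,y}$ invoked next (and already used in Lemma 10) can be checked mechanically. The sketch below codes a $\psi$ of the form in (7.1) with 0-based periods, an assumed Exp(1) cdf standing in for $F_\tau$, and hypothetical thresholds $S_\iota$, then verifies $\psi_{t+1,\tau,y-d_t}(d_{t+1}, \ldots, d_\tau)\, I(y - d_t \ge S_{t+1}) = \psi_{t,\tau,y}(d_t, \ldots, d_\tau)$ on random inputs:

```python
import math, random

F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0   # stand-in for F_tau
S = {1: 0.5, 2: 0.7}                                 # hypothetical thresholds
bh = 4.0                                             # stands for b_tau + h_tau

def psi(t, tau, y, d):            # d = (d_t, ..., d_tau)
    prod, c = 1.0, y
    for iota in range(t, tau):    # product of threshold indicators
        c -= d[iota - t]
        prod *= (c >= S[iota + 1])
    tail = y - sum(d)             # y - d_t - ... - d_tau
    return bh * ((tail >= 0.0) - F(tail + d[-1])) * prod

rng = random.Random(4)
for _ in range(200):
    y = rng.uniform(0, 3)
    d = tuple(rng.expovariate(1.0) for _ in range(3))
    lhs = psi(1, 2, y - d[0], d[1:]) * (y - d[0] >= S[1])
    rhs = psi(0, 2, y, d)
    assert abs(lhs - rhs) < 1e-12
print("shift relation verified")
```

Shifting the inventory level by $d_t$ and multiplying by one more indicator is exactly how one extra period is absorbed into the function class.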
By the definition of $\psi_{t,\tau,y}$ in (7.1), $\psi_{t,\tau,y}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) = \psi_{t+1,\tau,y - d^{i_t}_t}(d^{i_{t+1}}_{t+1}, \cdots, d^{i_\tau}_\tau)\, I(y - d^{i_t}_t \ge S_{t+1})$ for every $\tau \ge t+2$. Therefore, all the conditions of Lemma EC.5 are met and the desired result follows.

Proof of Theorem 8
To establish the theorem, we first present the following lemma.
Lemma EC.6.
Suppose Assumptions 1 and 2 hold, and fix $t \in \mathcal T$. Then there exist constants $\check L_\tau$ for $t < \tau \le T$ such that for all $d_\tau$ and $y_1, y_2 \in \Theta$, we have:
\[
\big| \mathbb E\, \psi_{t,\tau,y_1}(D_t, \cdots, D_{\tau-1}, d_\tau) - \mathbb E\, \psi_{t,\tau,y_2}(D_t, \cdots, D_{\tau-1}, d_\tau) \big| \le \check L_\tau\, | y_1 - y_2 |,
\qquad
\big| \mathbb E\, \phi_{t,\tau,y_1}(D_t, \cdots, D_{\tau-1}, d_\tau) - \mathbb E\, \phi_{t,\tau,y_2}(D_t, \cdots, D_{\tau-1}, d_\tau) \big| \le \check L_\tau\, | y_1 - y_2 |.
\]
Proof.
The proof is based on Lemma EC.4. Recall from (7.1) that
\[
\psi_{t,\tau,y}(d_t, \cdots, d_\tau) = (b_\tau + h_\tau)\,\big( I(y - d_t - \cdots - d_\tau \ge 0) - F_\tau(y - d_t - \cdots - d_{\tau-1}) \big)\, g_{y, d_{t+1}, \cdots, d_\tau}(d_t),
\]
where $g_{y, d_{t+1}, \cdots, d_\tau}(d_t) = \prod_{\iota = t}^{\tau - 1} I(y - d_t - d_{t+1} - \cdots - d_\iota \ge S_{\iota+1})$. It is not difficult to see that $g_{y, d_{t+1}, \cdots, d_\tau}(d_t) = I\big( d_t \le \tilde g_{d_{t+1}, \cdots, d_\tau}(y) \big)$, where
\[
\tilde g_{d_{t+1}, \cdots, d_\tau}(y) = \min\big\{ y - S_{t+1},\; y - d_{t+1} - S_{t+2},\; \cdots,\; y - d_{t+1} - \cdots - d_{\tau-1} - S_\tau \big\}.
\]
For any functions $x \mapsto f(x)$ and $x \mapsto g(x)$, we have
\[
\Big| \min_x f(x) - \min_x g(x) \Big| \le \max_x | f(x) - g(x) |. \tag{EC.2.20}
\]
See for instance Remark 2.1 in Haskell et al. (2016). By this fact, for all fixed $d_{t+1}, \cdots, d_\tau$ we have
\[
| \tilde g_{d_{t+1}, \cdots, d_\tau}(y_1) - \tilde g_{d_{t+1}, \cdots, d_\tau}(y_2) | \le | y_1 - y_2 |, \tag{EC.2.21}
\]
for all $y_1, y_2$. On the other hand, $I(y - d_t - \cdots - d_\tau \ge 0) - F_\tau(y - d_t - \cdots - d_{\tau-1})$ is bounded by 2 and
\begin{align*}
\mathbb E\, \big| I(y_1 - D_t - d_{t+1} - \cdots - d_\tau \ge 0) - I(y_2 - D_t - d_{t+1} - \cdots - d_\tau \ge 0) \big| &\le f\, | y_1 - y_2 |, \\
\mathbb E\, \big| F_\tau(y_1 - D_t - d_{t+1} - \cdots - d_{\tau-1}) - F_\tau(y_2 - D_t - d_{t+1} - \cdots - d_{\tau-1}) \big| &\le f\, | y_1 - y_2 |,
\end{align*}
by Assumption 1. The conditions in Lemma EC.4 hold and so there exists $\check L_\tau$ such that
\[
\mathbb E\, \big| \psi_{t,\tau,y_1}(D_t, d_{t+1}, \cdots, d_\tau) - \psi_{t,\tau,y_2}(D_t, d_{t+1}, \cdots, d_\tau) \big| \le \check L_\tau\, | y_1 - y_2 |.
\]
The above inequality is valid for all $d_{t+1}, \cdots, d_\tau$, so it still holds if we replace $d_{t+1}, \cdots, d_\tau$ by $D_{t+1}, \cdots, D_\tau$. Now take the expectation of the LHS over $D_{t+1}, \cdots, D_\tau$, and use Jensen's inequality to conclude the first inequality of the lemma; the second follows by the same reasoning. $\square$

Returning to the proof of Theorem 8, we combine Lemmas 10 and 11 to see that
\[
\sqrt n\,\big( \hat M^r_{t,n}(\hat S_t) - M^r_{t,n}(\hat S_t) \big) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{\tau = t+1}^{T} \mathbb E\,\Big\{ \psi_{t,\tau,\hat S_t}(D_t, \cdots, D_{\tau-1}, d_{i\tau}) + \phi_{t,\tau,\hat S_t}(D_t, \cdots, D_{\tau-1}, d_{i\tau}) \Big\} + o_P(1). \tag{EC.2.22}
\]
Consider the function classes $\{ d_\tau \mapsto \phi^{(\tau)}_{t,\tau,y}(d_\tau) : y \in \Theta \}$ and $\{ d_\tau \mapsto \psi^{(\tau)}_{t,\tau,y}(d_\tau) : y \in \Theta \}$, where $\phi^{(\tau)}_{t,\tau,y}(d_\tau) \triangleq \mathbb E\, \phi_{t,\tau,y}(D_t, \cdots, D_{\tau-1}, d_\tau)$ and $\psi^{(\tau)}_{t,\tau,y}(d_\tau) \triangleq \mathbb E\, \psi_{t,\tau,y}(D_t, \cdots, D_{\tau-1}, d_\tau)$. In view of Lemma EC.6, we use Example 19.7 of van der Vaart (1998) to see that both classes are $P$-Donsker. Lemma EC.6 also implies
\[
\rho(y_1, y_2)^2 \le \sup_{d_\tau} \big( \psi^{(\tau)}_{t,\tau,y_1}(d_\tau) - \psi^{(\tau)}_{t,\tau,y_2}(d_\tau) \big)^2 \le \check L_\tau^2\, (y_1 - y_2)^2
\quad \text{for} \quad
\rho(y_1, y_2)^2 \triangleq \mathbb E\,\big( \psi^{(\tau)}_{t,\tau,y_1}(D_\tau) - \psi^{(\tau)}_{t,\tau,y_2}(D_\tau) \big)^2.
\]
Since $\hat S_t \to_P S_t$ by Theorem 1, the above display implies $\rho(\hat S_t, S_t) \to_P 0$. Repeat this argument for $\{ d_\tau \mapsto \phi^{(\tau)}_{t,\tau,y}(d_\tau) : y \in \Theta \}$, then apply Lemma EC.3 to the RHS of (EC.2.22) to see
\[
\sqrt n\,\big( \hat M^r_{t,n}(\hat S_t) - M^r_{t,n}(\hat S_t) \big) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{\tau = t+1}^{T} \mathbb E\,\Big\{ \psi_{t,\tau,S_t}(D_t, \cdots, D_{\tau-1}, d_{i\tau}) + \phi_{t,\tau,S_t}(D_t, \cdots, D_{\tau-1}, d_{i\tau}) \Big\} + o_P(1).
\]
Finally, plug the above display back into the RHS of Lemma 2 to conclude.
Proof of Theorem 9
When there is no setup cost, we have $s_t = S_t$ and so Theorem 4 applies to periods $T$ and $T-1$. By assumption, $M''_{T-1}(S_{T-1}) \ne 0$ and $M''_T(S_T) \ne 0$. The asymptotic joint normality of $\sqrt n\,(\hat S_{T-1} - S_{T-1})$ and $\sqrt n\,(\hat S_T - S_T)$ has been shown in Theorem 4 for influence functions $\varphi^S_{T-1}$ and $\varphi^S_T$, respectively, which implies $\sqrt n\,(\hat S_t - S_t) = O_P(1)$ for $t = T-1$ and $t = T$. Next, suppose $\sqrt n\,(\hat S_\tau - S_\tau)$ is asymptotically normal with influence function $\varphi^S_\tau$ for $\tau = t+1, t+2, \cdots, T$. We have $\sqrt n\,(\hat S_\tau - S_\tau) = O_P(1)$ for $\tau = t+1, t+2, \cdots, T$, and so we can apply Theorem 8 to see that
\begin{align*}
\sqrt n\, M''_t(S_t)\,\big( \hat S_t - S_t \big)
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} M''_t(S_t)\, \varphi^S_t(d_{it}, d_{i,t+1}, \cdots, d_{iT}) + o_P\big( \sqrt n\,(\hat S_t - S_t) \big) \\
&= M''_t(S_t)\, \mathbb G_n \varphi^S_t + o_P\big( \sqrt n\,(\hat S_t - S_t) \big).
\end{align*}
Since $M''_t(S_t) \ne 0$ by assumption, we can apply the CLT to conclude that $\sqrt n\,(\hat S_t - S_t) = O_P(1)$. Plug this into the above to see $\sqrt n\,(\hat S_t - S_t)$ is asymptotically normal with influence function $\varphi^S_t$. The function class $\{ \varphi^S_1, \varphi^S_2, \cdots, \varphi^S_T \}$ is trivially $P$-Donsker, and thus the theorem holds.

EC.2.5. Asymptotic Normality: Multi-Period with Setup Cost
Proof of Lemma 12
We begin with the following preliminary result.
Lemma EC.7.
Suppose Assumptions 1 and 2 hold, and $\sqrt n\,(\hat s_{t+1} - s_{t+1}) = O_P(1)$. Then
\begin{align*}
\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big)
&= \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( \hat M^r_{t+1,n}(y - d_{it}) - M'_{t+1}(y - d_{it}) \Big)\, I(y - d_{it} \ge s_{t+1}) \\
&\quad + \frac{M'_{t+1}(s_{t+1})}{\sqrt n}\sum_{i=1}^{n} \big( I(y - d_{it} \ge \hat s_{t+1}) - I(y - d_{it} \ge s_{t+1}) \big) + R_n(y),
\end{align*}
where $\sup_{y \in \Theta} | R_n(y) | = o_P(1)$.

Proof.
Telescope using $\tilde M^r_{t,n}(y)$ in (EC.2.16) to see
\[
\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big) = \sqrt n\,\big\{ \hat M^r_{t,n}(y) - \tilde M^r_{t,n}(y) \big\} + \sqrt n\,\big\{ \tilde M^r_{t,n}(y) - M^r_{t,n}(y) \big\}. \tag{EC.2.23}
\]
Since $\sqrt n\,(\hat s_{t+1} - s_{t+1}) = O_P(1)$, we can repeat the proof for the first part of Lemma 4 to get
\[
\sqrt n\,\big( \hat M^r_{t,n}(y) - \tilde M^r_{t,n}(y) \big) - \frac{M'_{t+1}(s_{t+1})}{\sqrt n}\sum_{i=1}^{n} \big( I(y - d_{it} \ge \hat s_{t+1}) - I(y - d_{it} \ge s_{t+1}) \big) \to_P 0,
\]
uniformly in $y$. On the other hand, we have
\[
\sqrt n\,\big( \tilde M^r_{t,n}(y) - M^r_{t,n}(y) \big) = \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big( \hat M^r_{t+1,n}(y - d_{it}) - M'_{t+1}(y - d_{it}) \Big)\, I(y - d_{it} \ge s_{t+1}),
\]
by using the expressions of $\tilde M^r_{t,n}(y)$ in (EC.2.16) and $m^r_{t,y}(d_t)$ in (EC.1.1), and the equality $M^r_{t,n}(y) = \frac{1}{n}\sum_{i=1}^{n} m^r_{t,y}(d_{it})$. This concludes the lemma. $\square$

Returning to the proof of Lemma 12, we see the RHS of the above lemma has an additional term
\[
\frac{M'_{t+1}(s_{t+1})}{\sqrt n}\sum_{i=1}^{n} \big( I(y - d_{it} \ge \hat s_{t+1}) - I(y - d_{it} \ge s_{t+1}) \big), \tag{EC.2.24}
\]
compared to Lemma 9. This is because the derivative $M'_{t+1}(s_{t+1})$ is zero without setup costs, but is generally nonzero with setup costs. As with Lemma EC.7, we recursively expand $\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big)$ as a sum of $U$-processes. Now, an additional term similar to (EC.2.24) surfaces in every recursion of the expansion. For the functions $\chi_{t,\tau,y,x}$ defined in (7.8), we define the following function, which captures the additional error from the nonzero derivatives:
\[
\Delta_{t,T}(y) \triangleq \sum_{\tau = t}^{T-1} \sum_{i_t, \cdots, i_\tau} \frac{M'_{\tau+1}(s_{\tau+1})}{n^{\tau - t + 1/2}}\,\big\{ \chi_{t,\tau,y,\hat s_{\tau+1}}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) - \chi_{t,\tau,y,s_{\tau+1}}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) \big\}. \tag{EC.2.25}
\]
Compared to Lemma 10, there is one additional term $\Delta_{t,T}(y)$ in the representation of $\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big)$ with setup costs. The proof of this lemma is similar to the proof of Lemma 10. First, use Lemma EC.7 and telescope with $M^r_{t+1,n}$ to see
\[
\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big) = I_{n,t}(y) + II_{n,t}(y) + III_{n,t}(y) + R_n(y),
\]
where $\sup_{y \in \Theta} | R_n(y) | = o_P(1)$, and
\begin{align*}
I_{n,t}(y) &= \frac{1}{\sqrt n}\sum_{i=1}^{n} \Big\{ \hat M^r_{t+1,n}(y - d_{it}) - M^r_{t+1,n}(y - d_{it}) \Big\}\, I(y - d_{it} \ge s_{t+1}), \\
II_{n,t}(y) &= \frac{1}{\sqrt n}\sum_{i=1}^{n} \big\{ M^r_{t+1,n}(y - d_{it}) - M'_{t+1}(y - d_{it}) \big\}\, I(y - d_{it} \ge s_{t+1}), \\
III_{n,t}(y) &= \frac{M'_{t+1}(s_{t+1})}{\sqrt n}\sum_{i=1}^{n} \big\{ I(y - d_{it} \ge \hat s_{t+1}) - I(y - d_{it} \ge s_{t+1}) \big\}.
\end{align*}
Using the definition of $M^r_{t+1,n}(y) = \frac{1}{n}\sum_{i_{t+1} = 1}^{n} m^r_{t+1,y}(d^{i_{t+1}}_{t+1})$ with $m^r_{t+1,y}$ in (EC.1.1), as well as the definition of $M'_{t+1}$ in (5.3), we can rewrite $II_{n,t}(y)$ above as
\begin{align*}
&\frac{1}{n\sqrt n}\sum_{i_t, i_{t+1}} \Big( \{ M'_{t+2}\, I_{[s_{t+2}, +\infty)} \}(y - d^{i_t}_t - d^{i_{t+1}}_{t+1}) - \mathbb E\, \{ M'_{t+2}\, I_{[s_{t+2}, +\infty)} \}(y - d^{i_t}_t - D_{t+1}) \Big)\, I(y - d^{i_t}_t \ge s_{t+1}) \\
&\quad + \frac{b_{t+1} + h_{t+1}}{n\sqrt n}\sum_{i_t, i_{t+1}} \Big\{ I(y - d^{i_t}_t - d^{i_{t+1}}_{t+1} \ge 0) - F_{t+1}(y - d^{i_t}_t) \Big\}\, I(y - d^{i_t}_t \ge s_{t+1}), \tag{EC.2.26}
\end{align*}
which is equal to $\frac{1}{n\sqrt n}\sum_{i_t, i_{t+1}} \{ \psi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) + \phi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) \}$. Next, induction is used. First consider period $\tilde t = T-1$. Since $\hat M_{T,n} \equiv M_{T,n}$ by definition, $I_{n,t}(y)$ is zero when $t = T-1$. In addition, $III_{n,t}(y)$ is equal to the first term on the RHS of (7.9) by definition when $t = T-1$. Combine these observations with (EC.2.26) to see that the lemma holds for period $T-1$. Next, suppose the lemma holds for periods $\tilde t = t+1, t+2, \cdots, T-1$, and consider period $\tilde t = t$. For period $t \le T-2$, the computations in (EC.2.26) still hold. Then we use (EC.2.26) to rewrite $\sqrt n\,\big( \hat M^r_{t,n}(y) - M^r_{t,n}(y) \big)$ as
\[
I_{n,t}(y) + \frac{1}{n\sqrt n}\sum_{i_t, i_{t+1}} \big( \psi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) + \phi_{t,t+1,y}(d^{i_t}_t, d^{i_{t+1}}_{t+1}) \big) + III_{n,t}(y) + R_n(y). \tag{EC.2.27}
\]
The term $I_{n,t}(y)$ can be further expanded by induction (the procedure is similar to that of Lemma 10; see e.g. (EC.2.19)). We then arrive at the following result:
\begin{align*}
I_{n,t}(y) &= \sum_{\tau = t+1}^{T-1} \frac{1}{n^{\tau - t + 1/2}}\sum_{i_t, \cdots, i_\tau} M'_{\tau+1}(s_{\tau+1})\,\big( \chi_{t,\tau,y,\hat s_{\tau+1}}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) - \chi_{t,\tau,y,s_{\tau+1}}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) \big) \\
&\quad + \sum_{\tau = t+2}^{T} \frac{1}{n^{\tau - t + 1/2}}\sum_{i_t, \cdots, i_\tau} \big( \psi_{t,\tau,y}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) + \phi_{t,\tau,y}(d^{i_t}_t, \cdots, d^{i_\tau}_\tau) \big) + R_n(y),
\end{align*}
where $\sup_{y \in \Theta} | R_n(y) | = o_P(1)$. The sum of the first term on the RHS above and $III_{n,t}(y)$ is equal to $\Delta_{t,T}(y)$ in (EC.2.25). We plug the above display into the first term of (EC.2.27) to see that (7.9) holds.

Proof of Lemma 13
We apply Lemma EC.5 here. Recall from (7.8) that
\[
\chi_{t,\tau,y,x}(d_t,\cdots,d_\tau)=I(y-d_t-\cdots-d_\tau\ge x)\times\prod_{\iota=t}^{\tau-1}I(y-d_t-d_{t+1}-\cdots-d_\iota\ge s_{\iota+1}).
\]
The function class $\{I(y-d_t-\cdots-d_\tau\ge x):y,x\in\Theta\}$ is a bounded Euclidean class for all $(t,\tau)$ pairs. In addition, $I(y-d_t-\cdots-d_\tau\ge x)=I\{(y-d_t)-d_{t+1}-\cdots-d_\tau\ge x\}$. Therefore, the conditions of Lemma EC.5 are satisfied and the first part holds.

For the second part, recall from (EC.2.25) that
\[
\Delta_{t,T}(\hat S_t)=\sum_{\tau=t}^{T-1}\frac{M'_{\tau+1}(s_{\tau+1})}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big\{\chi_{t,\tau,\hat S_t,\hat s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-\chi_{t,\tau,\hat S_t,s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})\big\}. \tag{EC.2.28}
\]
There are $T-t$ summands on the RHS. Consider the first summand, which corresponds to $\tau=t$. By the definition of $\chi_{t,\tau,y,x}$ in (7.8), it is
\[
\frac{M'_{t+1}(s_{t+1})}{\sqrt n}\sum_{i_t=1}^{n}\big(I(\hat S_t-d_t^{i_t}\ge\hat s_{t+1})-I(\hat S_t-d_t^{i_t}\ge s_{t+1})\big).
\]
Under the condition $\sqrt n(\hat s_\tau-s_\tau)=O_P(1)$ for all $t+1\le\tau\le T$, we can repeat the proof of the second part of Lemma 4 to see that the above display equals
\[
-\sqrt n\,M'_{t+1}(s_{t+1})f_t(S_t-s_{t+1})(\hat s_{t+1}-s_{t+1})+o_P(1+\sqrt n(\hat S_t-S_t)).
\]
Using $\tilde\chi_{t,\tau}(x,y)=M'_{\tau+1}(s_{\tau+1})E\chi_{t,\tau,y,x}$ defined in (7.10), the above display also equals $\tilde{\dot\chi}_{t,t}(S_t,s_{t+1})\sqrt n(\hat s_{t+1}-s_{t+1})$, where we recall $\tilde{\dot\chi}_{t,t}(x,y)=\frac{\partial}{\partial y}\tilde\chi_{t,t}(x,y)$. The other $T-t-1$ summands correspond to $t+1\le\tau\le T-1$. For each $t+1\le\tau\le T-1$, the summand is
\[
\frac{M'_{\tau+1}(s_{\tau+1})}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big\{\chi_{t,\tau,\hat S_t,\hat s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-\chi_{t,\tau,\hat S_t,s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})\big\}. \tag{EC.2.29}
\]
For all $t\le j\le\tau$, define the function class $\{d_j\mapsto\bar\chi^{(j)}_{x,y}(d_j):x,y\in\Theta\}$, where
\[
\bar\chi^{(j)}_{x,y}(d_j)\triangleq M'_{\tau+1}(s_{\tau+1})\,E\chi_{t,\tau,y,x}(D_t,D_{t+1},\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau). \tag{EC.2.30}
\]
In view of the first part of Lemma 13, (EC.2.29) can be expressed as
\[
\frac{M'_{\tau+1}(s_{\tau+1})}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big\{\chi_{t,\tau,\hat S_t,\hat s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-\chi_{t,\tau,\hat S_t,s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})\big\}=\sum_{j=t}^{\tau}\big(\mathbb G_n\bar\chi^{(j)}_{\hat S_t,\hat s_{\tau+1}}-\mathbb G_n\bar\chi^{(j)}_{\hat S_t,s_{\tau+1}}\big)+\sqrt n\big(\tilde\chi_{t,\tau}(\hat S_t,\hat s_{\tau+1})-\tilde\chi_{t,\tau}(\hat S_t,s_{\tau+1})\big)+o_P(1), \tag{EC.2.31}
\]
where $\tilde\chi_{t,\tau}(x,y)=M'_{\tau+1}(s_{\tau+1})E\chi_{t,\tau,y,x}$ is defined in (7.10). By telescoping with $\tilde\chi_{t,\tau}(S_t,s_{\tau+1})$, the last term on the RHS of (EC.2.31) can be expressed as
\[
\sqrt n\big(\tilde\chi_{t,\tau}(\hat S_t,\hat s_{\tau+1})-\tilde\chi_{t,\tau}(S_t,s_{\tau+1})\big)-\sqrt n\big(\tilde\chi_{t,\tau}(\hat S_t,s_{\tau+1})-\tilde\chi_{t,\tau}(S_t,s_{\tau+1})\big). \tag{EC.2.32}
\]
Using Taylor's expansion for the first term in the above display gives
\[
\sqrt n\big(\tilde\chi_{t,\tau}(\hat S_t,\hat s_{\tau+1})-\tilde\chi_{t,\tau}(S_t,s_{\tau+1})\big)=\sqrt n\Big(\tfrac{\partial}{\partial x}\tilde\chi_{t,\tau}(S_t,s_{\tau+1}),\ \tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})\Big)\begin{pmatrix}\hat S_t-S_t\\ \hat s_{\tau+1}-s_{\tau+1}\end{pmatrix}+o_P\big(\sqrt n|\hat S_t-S_t|+\sqrt n|\hat s_{\tau+1}-s_{\tau+1}|\big),
\]
where we recall $\tilde{\dot\chi}_{t,\tau}(x,y)=\frac{\partial}{\partial y}\tilde\chi_{t,\tau}(x,y)$. Similarly, we can perform Taylor's expansion on the second term of (EC.2.32). Take the difference of these two expansions and use $\sqrt n(\hat s_{\tau+1}-s_{\tau+1})=O_P(1)$ to write (EC.2.32) as
\[
\sqrt n\,\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})(\hat s_{\tau+1}-s_{\tau+1})+o_P(1+\sqrt n(\hat S_t-S_t)). \tag{EC.2.33}
\]
Next, we show that the first $\tau-t+1$ summands on the RHS of (EC.2.31) are of order $o_P(1)$. Recall that
\[
\big\{\bar\chi^{(j)}_{x,y}(d_j)=M'_{\tau+1}(s_{\tau+1})E\chi_{t,\tau,y,x}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau):x,y\in\Theta\big\}
\]
is defined in (EC.2.30) for $t\le j\le\tau$.
Since $\{\chi_{t,\tau,y,x}:y,x\in\Theta\}$ is Euclidean by the first part of this lemma, this class is Euclidean and bounded by $|M'_{\tau+1}(s_{\tau+1})|$ by Corollary 21 of Nolan and Pollard (1987). Thus, it is also $P$-Donsker (bounded by $|M'_{\tau+1}(s_{\tau+1})|$). By (7.8), we have
\[
\chi_{t,\tau,y,x}(d_t,\cdots,d_\tau)=I(y-d_t-\cdots-d_\tau\ge x)\,g_{y,d_{t+1},\cdots,d_\tau}(d_t), \tag{EC.2.34}
\]
where $g_{y,d_{t+1},\cdots,d_\tau}(d_t)=\prod_{\iota=t}^{\tau-1}I(y-d_t-\cdots-d_\iota\ge s_{\iota+1})$. Similar to (EC.2.21), we know $g_{y,d_{t+1},\cdots,d_\tau}(d_t)=I\big(d_t\le\tilde g_{d_{t+1},\cdots,d_\tau}(y)\big)$, where $\tilde g_{d_{t+1},\cdots,d_\tau}(y)=\min\{y-s_{t+1},\,y-d_{t+1}-s_{t+2},\,\cdots,\,y-d_{t+1}-\cdots-d_{\tau-1}-s_\tau\}$ and $|\tilde g_{d_{t+1},\cdots,d_\tau}(y_1)-\tilde g_{d_{t+1},\cdots,d_\tau}(y_2)|\le|y_1-y_2|$. On the other hand, Assumption 1 implies $E|I(y_1-D_t-d_{t+1}-\cdots-d_\tau\ge x_1)-I(y_2-D_t-d_{t+1}-\cdots-d_\tau\ge x_2)|\le\bar f(|y_1-y_2|+|x_1-x_2|)$, where $\bar f$ is an upper bound of the PDF of $D$. In view of (EC.2.34) and arguing as in Lemma EC.4, there exists a constant $L_\chi$ such that
\[
E|\chi_{t,\tau,y_1,x_1}(D_t,d_{t+1},\cdots,d_\tau)-\chi_{t,\tau,y_2,x_2}(D_t,d_{t+1},\cdots,d_\tau)|\le L_\chi(|y_1-y_2|+|x_1-x_2|).
\]
The above inequality is valid for all $d_{t+1},\cdots,d_\tau$, and still holds if we replace $d_{t+1},\cdots,d_\tau$ by $D_{t+1},\cdots,D_\tau$. Recall $\bar\chi^{(j)}_{x,y}(d_j)=M'_{\tau+1}(s_{\tau+1})E\chi_{t,\tau,y,x}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau)$ and use Jensen's inequality to get
\[
\big|\bar\chi^{(j)}_{x_1,y_1}(d_j)-\bar\chi^{(j)}_{x_2,y_2}(d_j)\big|/|M'_{\tau+1}(s_{\tau+1})|=\big|E\big(\chi_{t,\tau,y_1,x_1}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau)-\chi_{t,\tau,y_2,x_2}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau)\big)\big|\le E\big|\chi_{t,\tau,y_1,x_1}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau)-\chi_{t,\tau,y_2,x_2}(D_t,\cdots,D_{j-1},d_j,D_{j+1},\cdots,D_\tau)\big|,
\]
for all $t\le j\le\tau$. Take expectation over $D_j$ on the LHS to see
\[
E\big(\bar\chi^{(j)}_{x_1,y_1}(D_j)-\bar\chi^{(j)}_{x_2,y_2}(D_j)\big)\le|M'_{\tau+1}(s_{\tau+1})|\,E|\chi_{t,\tau,y_1,x_1}(D_t,\cdots,D_\tau)-\chi_{t,\tau,y_2,x_2}(D_t,\cdots,D_\tau)|\le|M'_{\tau+1}(s_{\tau+1})|\,L_\chi(|y_1-y_2|+|x_1-x_2|).
\]
Since $\hat S_t\to_P S_t$ and $\hat s_{\tau+1}\to_P s_{\tau+1}$, the above display implies that both $\rho_j(\hat S_t,\hat s_{\tau+1},S_t,s_{\tau+1})$ and $\rho_j(\hat S_t,s_{\tau+1},S_t,s_{\tau+1})$ converge to zero in probability, where $\rho_j(x_1,y_1,x_2,y_2)\triangleq E\big(\bar\chi^{(j)}_{x_1,y_1}(D_j)-\bar\chi^{(j)}_{x_2,y_2}(D_j)\big)$ for all $t\le j\le\tau$. Telescope with $\mathbb G_n\bar\chi^{(j)}_{S_t,s_{\tau+1}}$ and use Lemma EC.3 to see
\[
\mathbb G_n\bar\chi^{(j)}_{\hat S_t,\hat s_{\tau+1}}-\mathbb G_n\bar\chi^{(j)}_{\hat S_t,s_{\tau+1}}=\mathbb G_n\bar\chi^{(j)}_{\hat S_t,\hat s_{\tau+1}}-\mathbb G_n\bar\chi^{(j)}_{S_t,s_{\tau+1}}+\mathbb G_n\bar\chi^{(j)}_{S_t,s_{\tau+1}}-\mathbb G_n\bar\chi^{(j)}_{\hat S_t,s_{\tau+1}}\to_P 0,
\]
for all $t\le j\le\tau$.
Combining (EC.2.31) and (EC.2.33), for arbitrary $t+1\le\tau\le T-1$ we have
\[
\frac{M'_{\tau+1}(s_{\tau+1})}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big\{\chi_{t,\tau,\hat S_t,\hat s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-\chi_{t,\tau,\hat S_t,s_{\tau+1}}(d_t^{i_t},\cdots,d_\tau^{i_\tau})\big\}=\sqrt n\,\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})(\hat s_{\tau+1}-s_{\tau+1})+o_P\big(1+\sqrt n(\hat S_t-S_t)\big).
\]
We conclude by plugging the above display into the summands on the RHS of (EC.2.28).
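The step showing that the $\mathbb G_n$ terms vanish rests on asymptotic equicontinuity of the empirical process over a Donsker class evaluated at a consistent plug-in (Lemma EC.3). The following is a minimal numerical sketch of that phenomenon for the simplest indicator class; the Uniform(0,1) demand law, the value $\theta=0.5$, and the artificial $\sqrt n$-consistent sequence `theta_hat` are all illustrative assumptions, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def G_n(indicator_vals, prob):
    """Empirical process sqrt(n) * (P_n - P) applied to an indicator function."""
    n = indicator_vals.size
    return np.sqrt(n) * (indicator_vals.mean() - prob)

# For the Donsker class {d -> I(d <= theta)}, the drift term
# G_n I(d <= theta_hat) - G_n I(d <= theta) is o_P(1) when theta_hat -> theta,
# while G_n I(d <= theta) itself stays of constant order.
theta, n, reps = 0.5, 40000, 300
drift_terms, base_terms = [], []
for _ in range(reps):
    d = rng.uniform(0.0, 1.0, size=n)
    theta_hat = theta + rng.normal(0.0, 1.0) / np.sqrt(n)  # sqrt(n)-consistent plug-in
    drift = (G_n((d <= theta_hat).astype(float), theta_hat)
             - G_n((d <= theta).astype(float), theta))
    drift_terms.append(drift)
    base_terms.append(G_n((d <= theta).astype(float), theta))
```

Across replications, the standard deviation of `drift_terms` is an order of magnitude below that of `base_terms`, mirroring how the telescoped $\mathbb G_n\bar\chi^{(j)}$ differences above are absorbed into the $o_P(1)$ remainder.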
Proof of Lemma 14
We define $\tilde M_{t,n}$, extending the definition in (6.9):
\[
\tilde M_{t,n}(y)=\frac1n\sum_{i=1}^{n}\big\{C_t(y,d_t^i)+\hat M_{t+1,n}(s_{t+1})I(y-d_t^i<s_{t+1})+\hat M_{t+1,n}(y-d_t^i)I(y-d_t^i\ge s_{t+1})\big\},
\]
and we decompose $\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)$ as
\[
\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)=\sqrt n\big(\hat M_{t,n}(y)-\tilde M_{t,n}(y)\big)+\sqrt n\big(\tilde M_{t,n}(y)-M_{t,n}(y)\big).
\]
Under the assumption that $\sqrt n(\hat s_{t+1}-s_{t+1})=O_P(1)$, the proof of Lemma 7 extends here; it is only necessary to replace the $\hat s$ there with $\hat s_{t+1}$. Use $\sqrt n\big(M_{t+1,n}(\hat s_{t+1})-M_{t+1,n}(s_{t+1})\big)=\sqrt n\,M'_{t+1}(s_{t+1})(\hat s_{t+1}-s_{t+1})+o_P(1)$ in (EC.2.5) to rewrite the first equation of Lemma 7. Then, combine the two equations of Lemma 7 and use the explicit expression of $\tilde M_{t,n}(y)$ above and $M_{t,n}$ in (4.1) to see
\[
\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)=\{1-F_t(y-s_{t+1})\}\sqrt n\big(\hat M_{t+1,n}(\hat s_{t+1})-M_{t+1}(s_{t+1})\big)+\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(\hat M_{t+1,n}(y-d_t^i)-M_{t+1}(y-d_t^i)\big)I(y-d_t^i\ge s_{t+1})+R_n(y),
\]
where $\sup_{y\in\Theta}|R_n(y)|=o_P(1)$.
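The objects manipulated in these lemmas, the empirical cost-to-go $\hat M_{t,n}$ and the estimated parameters $(\hat s_t,\hat S_t)$, can be made concrete for the terminal period. Below is a minimal sketch that reads $\hat S$ off the minimizer of an empirical newsvendor cost curve and $\hat s$ off the crossing of $\hat M(\hat S)+K$; the cost parameters $b,h,K$, the Uniform(0,10) demand, and the grid search are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_newsvendor_cost(y, demands, b=2.0, h=1.0):
    """Empirical single-period cost: (1/n) * sum of backlog and holding costs."""
    return np.mean(b * np.maximum(demands - y, 0.0) + h * np.maximum(y - demands, 0.0))

n = 5000
demands = rng.uniform(0.0, 10.0, size=n)          # illustrative demand sample
grid = np.linspace(0.0, 10.0, 2001)
costs = np.array([empirical_newsvendor_cost(y, demands) for y in grid])

K = 1.5                                            # illustrative fixed ordering cost
S_hat = grid[np.argmin(costs)]                     # order-up-to level: minimizer of the curve
M_S = costs.min()
# reorder point: first y (left of S_hat) where the empirical cost drops below M_hat(S_hat) + K
left = grid <= S_hat
s_hat = grid[left][np.argmax(costs[left] <= M_S + K)]
```

For this demand law the population values are $S=F^{-1}(b/(b+h))\approx 6.67$ and $s\approx 3.50$, and the estimates concentrate around them at the $\sqrt n$ rate studied above.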
Proof of Lemma 15
Define the following functions:
\[
\Phi_{t,\tau,y}(d_t,\cdots,d_\tau)\triangleq M_{\tau+1}(s_{\tau+1})\,I(y-d_t-d_{t+1}-\cdots-d_\tau<s_{\tau+1})\times I(y-d_t-\cdots-d_{\tau-1}\ge s_\tau)\times\cdots\times I(y-d_t-d_{t+1}\ge s_{t+2})\times I(y-d_t\ge s_{t+1}), \tag{EC.2.35}
\]
\[
\Lambda_{t,\tau,y}(d_t,\cdots,d_\tau)\triangleq M_{\tau+1}(y-d_t-d_{t+1}-\cdots-d_\tau)\times I(y-d_t-\cdots-d_\tau\ge s_{\tau+1})\times I(y-d_t-\cdots-d_{\tau-1}\ge s_\tau)\times\cdots\times I(y-d_t\ge s_{t+1}), \tag{EC.2.36}
\]
\[
\Pi_{t,\tau,y}(d_t,\cdots,d_\tau)\triangleq\big(b_\tau(d_t+\cdots+d_\tau-y)^++h_\tau(y-d_t-\cdots-d_\tau)^+\big)\times I(y-d_t-\cdots-d_{\tau-1}\ge s_\tau)\times\cdots\times I(y-d_t\ge s_{t+1}), \tag{EC.2.37}
\]
\[
\delta_{t,T}(y)\triangleq\{1-F_t(y-s_{t+1})\}\sqrt n\big(\hat M_{t+1,n}(\hat s_{t+1})-M_{t+1}(s_{t+1})\big)+\frac{E\Phi_{t,t+1,y}(D_t,D_{t+1})}{M_{t+2}(s_{t+2})}\sqrt n\big(\hat M_{t+2,n}(\hat s_{t+2})-M_{t+2}(s_{t+2})\big)+\cdots+\frac{E\Phi_{t,T-1,y}(D_t,\cdots,D_{T-1})}{M_T(s_T)}\sqrt n\big(\hat M_{T,n}(\hat s_T)-M_T(s_T)\big). \tag{EC.2.38}
\]
By definition, we have $\Psi_{t,\tau,y}=\Phi_{t,\tau,y}+\Pi_{t,\tau,y}+\Lambda_{t,\tau,y}$, where $\Psi_{t,\tau,y}$ is defined in (7.12). In view of the definition of $\Phi_{t,\tau,y}$ above, $\delta_{t,T}(y)$ also satisfies
\[
\delta_{t,T}(y)=\sum_{\tau=t}^{T-1}\Big\{\sqrt n\big(\hat M_{\tau+1,n}(\hat s_{\tau+1})-M_{\tau+1}(s_{\tau+1})\big)\,E\Big[I(y-D_t-\cdots-D_\tau<s_{\tau+1})\prod_{\iota=t}^{\tau-1}I(y-D_t-\cdots-D_\iota\ge s_{\iota+1})\Big]\Big\}. \tag{EC.2.39}
\]
The product in the last line is defined to be one if $t=\tau$. Returning to the lemma, the proof is decomposed into three steps.

Step one.
In this step, we establish the following equality for all $t\in\mathcal T$:
\[
\sum_{\tau=t+1}^{T}\frac{1}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big(\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-E\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_{\tau-1}^{i_{\tau-1}},D_\tau)\big)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{\tau=t+1}^{T}\big(E\Psi_{t,\tau,y}(D_t,\cdots,D_{\tau-1},d_\tau^i)-E\Psi_{t,\tau,y}(D_t,\cdots,D_\tau)\big)+R_n(y).
\]
It suffices to show that the three function classes $\check{\mathcal F}^{(1)}_{t,\tau}\triangleq\{\Pi_{t,\tau,y}(d_t^{i_t},d_{t+1}^{i_{t+1}},\cdots,d_\tau^{i_\tau}):y\in\Theta\}$, $\check{\mathcal F}^{(2)}_{t,\tau}\triangleq\{\Phi_{t,\tau,y}(d_t^{i_t},d_{t+1}^{i_{t+1}},\cdots,d_\tau^{i_\tau}):y\in\Theta\}$, and $\check{\mathcal F}^{(3)}_{t,\tau}\triangleq\{\Lambda_{t,\tau,y}(d_t^{i_t},d_{t+1}^{i_{t+1}},\cdots,d_\tau^{i_\tau}):y\in\Theta\}$ are Euclidean. Take $\check{\mathcal F}^{(3)}_{t,\tau}$ for example, and recall
\[
\Lambda_{t,\tau,y}(d_t,\cdots,d_\tau)=M_{\tau+1}(y-d_t-\cdots-d_\tau)\times I(y-d_t-\cdots-d_\tau\ge s_{\tau+1})\times\cdots\times I(y-d_t\ge s_{t+1})
\]
from (EC.2.36). Both the parameter $y$ and the demand $d$ lie within a compact set by Assumption 1, and thus there exists a constant $\bar M_{\tau+1}$ such that $0\le M_{\tau+1}(\cdot)\le\bar M_{\tau+1}$. To show $\check{\mathcal F}^{(3)}_{t,\tau}$ is Euclidean,
note that by definition $\Lambda_{t,\tau,y}(d_t,\cdots,d_\tau)$ is the minimum of $M_{\tau+1}(y-d_t-\cdots-d_\tau)$ and the product of $\tau-t+1$ indicator functions multiplied by $\bar M_{\tau+1}$. The product of these indicator functions, as indexed by $y$, is a VC class. By Lemma 22 of Nolan and Pollard (1987), the class of functions $(d_t,\cdots,d_\tau)\mapsto M_{\tau+1}(y-d_t-\cdots-d_\tau)$ indexed by $y\in\Theta$, as a horizontal translation of $M_{\tau+1}(-d_t-\cdots-d_\tau)$, is Euclidean. Therefore, $\check{\mathcal F}^{(3)}_{t,\tau}$ is Euclidean with envelope $\bar M_{\tau+1}$ by Lemma 5.3 of Pollard (1990). The analysis for $\check{\mathcal F}^{(2)}_{t,\tau}$ and $\check{\mathcal F}^{(1)}_{t,\tau}$ is similar and is omitted, and the rest of the argument follows Lemma 11.

Step two.
Next, we establish the following equality for all $t\in\mathcal T$:
\[
\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)=\sum_{\tau=t+1}^{T}\frac{1}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big(\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-E\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_{\tau-1}^{i_{\tau-1}},D_\tau)\big)+\delta_{t,T}(y)+R_n(y). \tag{EC.2.40}
\]
The first term on the RHS has been analyzed in the first step and the second term is $\delta_{t,T}$ in (EC.2.39). Use Lemma 14 and telescope with $M_{t+1,n}$ to write
\[
\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)=I_{n,t}(y)+II_{n,t}(y)+III_{n,t}(y)+R_n(y), \tag{EC.2.41}
\]
where $\sup_{y\in\Theta}|R_n(y)|=o_P(1)$, and
\[
I_{n,t}(y)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(\hat M_{t+1,n}(y-d_t^i)-M_{t+1,n}(y-d_t^i)\big)I(y-d_t^i\ge s_{t+1}),\qquad II_{n,t}(y)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(M_{t+1,n}(y-d_t^i)-M_{t+1}(y-d_t^i)\big)I(y-d_t^i\ge s_{t+1}),
\]
\[
III_{n,t}(y)=\{1-F_t(y-s_{t+1})\}\sqrt n\big(\hat M_{t+1,n}(\hat s_{t+1})-M_{t+1}(s_{t+1})\big).
\]
Using the expression of $\Psi_{t,t+1}$ in (7.12) and the expressions of $M_{t+1,n}$ and $M_{t+1}$ given in (4.1) and (2.7), $II_{n,t}(y)$ satisfies
\[
II_{n,t}(y)=\frac{1}{n\sqrt n}\sum_{i_t,i_{t+1}}\big(\Psi_{t,t+1,y}(d_t^{i_t},d_{t+1}^{i_{t+1}})-E\Psi_{t,t+1,y}(d_t^{i_t},D_{t+1})\big). \tag{EC.2.42}
\]
To get the full expansion of $\sqrt n\big(\hat M_{t,n}(y)-M_{t,n}(y)\big)$, we use backward induction and first establish (EC.2.40) for period $\tilde t=T-1$. Since $\hat M_{T,n}\equiv M_{T,n}$ by definition, $I_{n,t}(y)$ is zero when $t=T-1$, and $III_{n,t}(y)$ is equal to $\delta_{t,T}$ when $t=T-1$. Combine this with (EC.2.42) to see that (EC.2.40) holds for period $T-1$. Next, suppose (EC.2.40) holds for periods $t+1,\cdots,T-1$ and consider period $t$. When $t\le T-2$, $II_{n,t}(y)$ can be replaced by the RHS of (EC.2.42). On the other hand, $I_{n,t}(y)$ can be expanded by induction (similar to Lemma 10; see, e.g., (EC.2.19)). We arrive at
\[
I_{n,t}(y)=\sum_{\tau=t+2}^{T}\frac{1}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big(\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-E\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_{\tau-1}^{i_{\tau-1}},D_\tau)\big)+\frac1n\sum_{i_t=1}^{n}\delta_{t+1,T}(y-d_t^{i_t})I(y-d_t^{i_t}\ge s_{t+1})+R_n(y), \tag{EC.2.43}
\]
where $\sup_y|R_n(y)|=o_P(1)$. Combining (EC.2.42) and (EC.2.43), we can rewrite the RHS of (EC.2.41) as
\[
\sum_{\tau=t+1}^{T}\frac{1}{n^{\tau-t+1/2}}\sum_{i_t,\cdots,i_\tau}\big(\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_\tau^{i_\tau})-E\Psi_{t,\tau,y}(d_t^{i_t},\cdots,d_{\tau-1}^{i_{\tau-1}},D_\tau)\big)+\frac1n\sum_{i_t=1}^{n}\delta_{t+1,T}(y-d_t^{i_t})I(y-d_t^{i_t}\ge s_{t+1})+III_{n,t}(y)+R_n(y). \tag{EC.2.44}
\]
To conclude this part, we show the sum of the second and third terms above is $\delta_{t,T}(y)$ in (EC.2.39). Recall from (EC.2.38) that $\delta_{t+1,T}(y)$ is equal to
\[
\{1-F_{t+1}(y-s_{t+2})\}\sqrt n\big(\hat M_{t+2,n}(\hat s_{t+2})-M_{t+2}(s_{t+2})\big)+\sum_{\tau=t+2}^{T-1}\frac{E\Phi_{t+1,\tau,y}(D_{t+1},\cdots,D_\tau)}{M_{\tau+1}(s_{\tau+1})}\sqrt n\big(\hat M_{\tau+1,n}(\hat s_{\tau+1})-M_{\tau+1}(s_{\tau+1})\big),
\]
and thus $\frac1n\sum_{i_t=1}^{n}\delta_{t+1,T}(y-d_t^{i_t})I(y-d_t^{i_t}\ge s_{t+1})$ is equal to
\[
\Big[\frac1n\sum_{i_t=1}^{n}\{1-F_{t+1}(y-d_t^{i_t}-s_{t+2})\}I(y-d_t^{i_t}\ge s_{t+1})\Big]\sqrt n\big(\hat M_{t+2,n}(\hat s_{t+2})-M_{t+2}(s_{t+2})\big)+\sum_{\tau=t+2}^{T-1}\Big[\frac1n\sum_{i_t=1}^{n}E\Phi_{t+1,\tau,y-d_t^{i_t}}(D_{t+1},\cdots,D_\tau)I(y-d_t^{i_t}\ge s_{t+1})\Big]\frac{\sqrt n\big(\hat M_{\tau+1,n}(\hat s_{\tau+1})-M_{\tau+1}(s_{\tau+1})\big)}{M_{\tau+1}(s_{\tau+1})}. \tag{EC.2.45}
\]
By step one of the proof, the function class $\{d_t\mapsto E\Phi_{t,\tau,y}(d_t,D_{t+1},\cdots,D_\tau):y\in\Theta\}$ is Euclidean by Corollary 21 of Nolan and Pollard (1987), and so it is also $P$-Donsker. It follows that
\[
\frac1n\sum_{i_t=1}^{n}E\Phi_{t+1,\tau,y-d_t^{i_t}}(D_{t+1},\cdots,D_\tau)I(y-d_t^{i_t}\ge s_{t+1})=\frac1n\sum_{i_t=1}^{n}E\Phi_{t,\tau,y}(d_t^{i_t},D_{t+1},\cdots,D_\tau)\to_P E\Phi_{t,\tau,y}(D_t,\cdots,D_\tau), \tag{EC.2.46}
\]
uniformly in $y$. The first equality is by the definition of $\Phi_{t,\tau,y}$ in (EC.2.35).
On the other hand, by the definition of $\Phi_{t,t+1,y}$ in (EC.2.35), we have
\[
\frac1n\sum_{i_t=1}^{n}\{1-F_{t+1}(y-d_t^{i_t}-s_{t+2})\}I(y-d_t^{i_t}\ge s_{t+1})=\frac1n\sum_{i_t=1}^{n}\frac{E\Phi_{t,t+1,y}(d_t^{i_t},D_{t+1})}{M_{t+2}(s_{t+2})}\to_P\frac{E\Phi_{t,t+1,y}(D_t,D_{t+1})}{M_{t+2}(s_{t+2})}, \tag{EC.2.47}
\]
uniformly in $y$. Combine the above display and (EC.2.46), and then use $\sqrt n\big(\hat M_{\tau,n}(\hat s_\tau)-M_\tau(s_\tau)\big)=O_P(1)$ for all $\tau\ge t+2$ to express the RHS of (EC.2.45) as
\[
\sum_{\tau=t+1}^{T-1}\frac{E\Phi_{t,\tau,y}(D_t,\cdots,D_\tau)}{M_{\tau+1}(s_{\tau+1})}\sqrt n\big(\hat M_{\tau+1,n}(\hat s_{\tau+1})-M_{\tau+1}(s_{\tau+1})\big)+R_n(y),
\]
where $\sup_y|R_n(y)|=o_P(1)$. The sum of the above display and $III_{n,t}(y)$ is equal to $\delta_{t,T}(y)$ in (EC.2.39) by definition. In view of (EC.2.44), this establishes (EC.2.40).

Step three.
In this step, we establish the following equality for all $t\in\mathcal T$:
\[
\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_t(s_t)\big)=\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)+\mathbb G_n m_{t,s_t}+\sqrt n\,M'_t(s_t)(\hat s_t-s_t)+o_P\big(\sqrt n(\hat s_t-s_t)\big).
\]
We can add and subtract $M_{t,n}(\hat s_t)$ and $M_t(\hat s_t)$ to see
\[
\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_t(s_t)\big)=\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)+\sqrt n\big(M_{t,n}(\hat s_t)-M_t(\hat s_t)\big)+\sqrt n\big(M_t(\hat s_t)-M_t(s_t)\big)=\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)+\mathbb G_n m_{t,\hat s_t}+\sqrt n\,M'_t(s_t)(\hat s_t-s_t)+o_P\big(\sqrt n(\hat s_t-s_t)\big).
\]
The second term in the second line is due to the definitions of $M_{t,n}$ and $M_t$ in (2.7) and (4.1), and the third term is a Taylor expansion. Use (EC.2.3) to replace the second term of the above display with $\mathbb G_n m_{t,s_t}$, and then combine the above display with the previous two steps to conclude.

Proof of Theorem 10
The proof is by backwards induction. For periods $T$ and $T-1$, we use Theorems 5 and 6 as well as (EC.2.15). To begin, suppose the theorem holds for periods $t+1,\cdots,T$ and consider period $t$. Then, the conditions in Lemma 15 are met and so
\[
\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)=\mathbb G_n\ell_{\hat s_t}+\delta_{t,T}(\hat s_t)+o_P(1), \tag{EC.2.48}
\]
\[
\sqrt n\big(\hat M_{t,n}(\hat S_t)-M_{t,n}(\hat S_t)\big)=\mathbb G_n\ell_{\hat S_t}+\delta_{t,T}(\hat S_t)+o_P(1), \tag{EC.2.49}
\]
where $\delta_{t,T}$ is defined in (EC.2.39) and
\[
\ell_y(d_t,\cdots,d_T)=\sum_{\tau=t+1}^{T}E\Psi_{t,\tau,y}(D_t,\cdots,D_{\tau-1},d_\tau). \tag{EC.2.50}
\]
We define constants
\[
c_{\tau,s_t}\triangleq\frac{E\Phi_{t,\tau,s_t}(D_t,\cdots,D_\tau)}{M_{\tau+1}(s_{\tau+1})},\qquad c_{\tau,S_t}\triangleq\frac{E\Phi_{t,\tau,S_t}(D_t,\cdots,D_\tau)}{M_{\tau+1}(s_{\tau+1})},
\]
for $t\le\tau\le T-1$. By the definition of $\delta_{t,T}$ in (EC.2.39), (7.13), and the induction hypothesis, we have
\[
\delta_{t,T}(\hat s_t)=\sum_{\tau=t}^{T-1}c_{\tau,s_t}\big(\mathbb G_n\kappa_{\tau+1,s_{\tau+1}}+\mathbb G_n m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\mathbb G_n\varphi_{s_{\tau+1}}\big)+o_P(1).
\]
Similarly,
\[
\delta_{t,T}(\hat S_t)=\sum_{\tau=t}^{T-1}c_{\tau,S_t}\big(\mathbb G_n\kappa_{\tau+1,s_{\tau+1}}+\mathbb G_n m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\mathbb G_n\varphi_{s_{\tau+1}}\big)+o_P(1).
\]
As a result, (EC.2.48) and (EC.2.49) become
\[
\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)=\mathbb G_n\ell_{\hat s_t}+\sum_{\tau=t}^{T-1}c_{\tau,s_t}\big(\mathbb G_n\kappa_{\tau+1,s_{\tau+1}}+\mathbb G_n m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\mathbb G_n\varphi_{s_{\tau+1}}\big)+o_P(1), \tag{EC.2.51}
\]
\[
\sqrt n\big(\hat M_{t,n}(\hat S_t)-M_{t,n}(\hat S_t)\big)=\mathbb G_n\ell_{\hat S_t}+\sum_{\tau=t}^{T-1}c_{\tau,S_t}\big(\mathbb G_n\kappa_{\tau+1,s_{\tau+1}}+\mathbb G_n m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\mathbb G_n\varphi_{s_{\tau+1}}\big)+o_P(1). \tag{EC.2.52}
\]
Next, we show that $y\mapsto\ell_y$ is Lipschitz in $y$. By the definition of $\Psi_{t,\tau,y}$ in (7.12), the form of $\Psi_{t,\tau,y}$ is similar to $\psi_{t,\tau,y}$ in Lemma EC.6. As in Lemma EC.6, we can show Lipschitz continuity of $y\mapsto\Psi_{t,\tau,y}(D_t,\cdots,D_{\tau-1},d_\tau)$. Then, Lipschitz continuity of $y\mapsto\ell_y$ follows from (EC.2.50). By Lipschitz continuity, we can use Lemma EC.3 to replace $\mathbb G_n\ell_{\hat s_t}$ and $\mathbb G_n\ell_{\hat S_t}$ above with $\mathbb G_n\ell_{s_t}$ and $\mathbb G_n\ell_{S_t}$, respectively. Define
\[
\kappa_{t,s_t}\triangleq\ell_{s_t}+\sum_{\tau=t}^{T-1}c_{\tau,s_t}\big(\kappa_{\tau+1,s_{\tau+1}}+m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\varphi_{s_{\tau+1}}\big), \tag{EC.2.53}
\]
\[
\kappa_{t,S_t}\triangleq\ell_{S_t}+\sum_{\tau=t}^{T-1}c_{\tau,S_t}\big(\kappa_{\tau+1,s_{\tau+1}}+m_{\tau+1,s_{\tau+1}}+M'_{\tau+1}(s_{\tau+1})\varphi_{s_{\tau+1}}\big); \tag{EC.2.54}
\]
then (EC.2.51) becomes $\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)=\mathbb G_n\kappa_{t,s_t}+o_P(1)$ and (EC.2.52) becomes $\sqrt n\big(\hat M_{t,n}(\hat S_t)-M_{t,n}(\hat S_t)\big)=\mathbb G_n\kappa_{t,S_t}+o_P(1)$. We now consider $\sqrt n(\hat S_t-S_t)$. Define functions $(d_t,\cdots,d_T)\mapsto\tilde\ell_y(d_t,\cdots,d_T)$ via
\[
\tilde\ell_y(d_t,\cdots,d_T)\triangleq\sum_{\tau=t+1}^{T}E\big(\psi_{t,\tau,y}(D_t,\cdots,D_{\tau-1},d_\tau)+\phi_{t,\tau,y}(D_t,\cdots,D_{\tau-1},d_\tau)\big).
\]
Apply Lemma 11 to the second term on the RHS of (7.9) and substitute (7.11) into the first term on the RHS of (7.9) to see
\[
\sqrt n\big(\hat M^r_{t,n}(\hat S_t)-M^r_{t,n}(\hat S_t)\big)=\sum_{\tau=t}^{T-1}\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})\cdot\sqrt n(\hat s_{\tau+1}-s_{\tau+1})+\mathbb G_n\tilde\ell_{\hat S_t}+o_P\big(1+\sqrt n(\hat S_t-S_t)\big),
\]
where $\tilde\chi_{t,\tau}$ is defined in (7.10). By the proof of Theorem 8, $\mathbb G_n\tilde\ell_{\hat S_t}=\mathbb G_n\tilde\ell_{S_t}+o_P(1)$. By the induction hypothesis, $\sqrt n(\hat s_\tau-s_\tau)=\mathbb G_n\varphi_{s_\tau}+o_P(1)$ for $\tau=t+1,\cdots,T$. So, the above display implies
\[
\sqrt n\big(\hat M^r_{t,n}(\hat S_t)-M^r_{t,n}(\hat S_t)\big)=\mathbb G_n\Big(\sum_{\tau=t}^{T-1}\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})\varphi_{s_{\tau+1}}+\tilde\ell_{S_t}\Big)+o_P\big(1+\sqrt n(\hat S_t-S_t)\big).
\]
The above display, together with Lemma 2, implies that
\[
\sqrt n\,M''_t(S_t)(\hat S_t-S_t)=-\mathbb G_n m^r_{t,S_t}+\sqrt n\big(\hat M^r_{t,n}(\hat S_t)-M^r_{t,n}(\hat S_t)\big)+o_P\big(1+\sqrt n(\hat S_t-S_t)\big)=\mathbb G_n\Big(-m^r_{t,S_t}+\sum_{\tau=t}^{T-1}\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})\varphi_{s_{\tau+1}}+\tilde\ell_{S_t}\Big)+o_P\big(1+\sqrt n(\hat S_t-S_t)\big).
\]
Use the CLT and compare the order of both sides to see $\sqrt n(\hat S_t-S_t)=O_P(1)$. Therefore, the asymptotic influence function of $\hat S_t$ is
\[
\varphi_{S_t}\triangleq\Big(-m^r_{t,S_t}+\sum_{\tau=t}^{T-1}\tilde{\dot\chi}_{t,\tau}(S_t,s_{\tau+1})\varphi_{s_{\tau+1}}+\tilde\ell_{S_t}\Big)\Big/M''_t(S_t). \tag{EC.2.55}
\]
Next, since $\sqrt n(\hat S_t-S_t)=O_P(1)$, Lemma 3 implies
\[
\sqrt n\,M'_t(s_t)(\hat s_t-s_t)=\mathbb G_n(-m_{t,s_t}+m_{t,S_t}+K_t)-\sqrt n\big(\hat M_{t,n}(\hat s_t)-M_{t,n}(\hat s_t)\big)+\sqrt n\big(\hat M_{t,n}(\hat S_t)-M_{t,n}(\hat S_t)\big)+o_P\big(\sqrt n(\hat s_t-s_t)\big)=\mathbb G_n(-m_{t,s_t}+m_{t,S_t}+K_t-\kappa_{t,s_t}+\kappa_{t,S_t})+o_P\big(\sqrt n(\hat s_t-s_t)\big).
\]
Use the CLT and compare the order of both sides to see that $\sqrt n(\hat s_t-s_t)=O_P(1)$. Substitute this into the above display to conclude that
\[
\varphi_{s_t}\triangleq(-m_{t,s_t}+m_{t,S_t}+K_t-\kappa_{t,s_t}+\kappa_{t,S_t})\big/M'_t(s_t) \tag{EC.2.56}
\]
is the asymptotic influence function of $\hat s_t$.

Based on (EC.2.53)–(EC.2.55) and the induction hypothesis, we have
\[
\Big(\sqrt n(\hat s_t-s_t),\sqrt n(\hat S_t-S_t),\cdots,\sqrt n(\hat s_T-s_T),\sqrt n(\hat S_T-S_T)\Big)=\big(\mathbb G_n\varphi_{s_t},\mathbb G_n\varphi_{S_t},\cdots,\mathbb G_n\varphi_{s_T},\mathbb G_n\varphi_{S_T}\big)+o_P(1).
\]
Repeating the above induction argument establishes that
\[
\Big(\sqrt n(\hat s_1-s_1),\sqrt n(\hat S_1-S_1),\cdots,\sqrt n(\hat s_T-s_T),\sqrt n(\hat S_T-S_T)\Big)=\big(\mathbb G_n\varphi_{s_1},\mathbb G_n\varphi_{S_1},\cdots,\mathbb G_n\varphi_{s_T},\mathbb G_n\varphi_{S_T}\big)+o_P(1).
\]
With the above influence functions, the joint normality is a direct application of the multivariate CLT, together with Slutsky's lemma.
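The influence-function representation $\sqrt n(\hat\theta-\theta)=\mathbb G_n\varphi_\theta+o_P(1)$ established above can be checked numerically in the simplest case. The sketch below uses the empirical critical fractile (a stand-in for a terminal-period order-up-to level) whose influence function is $\varphi(d)=(q-I(d\le S))/f(S)$; the Uniform(0,1) demand, the fractile $q=2/3$, and all variable names are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# For Uniform(0,1) demand, the q-quantile is S = q and the density at S is 1.
q, S, f_at_S = 2.0 / 3.0, 2.0 / 3.0, 1.0
n, reps = 2000, 1000
errs, projs = [], []
for _ in range(reps):
    d = rng.uniform(0.0, 1.0, size=n)
    S_hat = np.quantile(d, q)                  # plug-in estimator of S
    errs.append(np.sqrt(n) * (S_hat - S))      # sqrt(n)-scaled estimation error
    phi = (q - (d <= S)) / f_at_S              # influence function, mean zero
    projs.append(np.sqrt(n) * phi.mean())      # G_n(phi): the i.i.d.-sum approximation
errs, projs = np.array(errs), np.array(projs)
corr = np.corrcoef(errs, projs)[0, 1]
```

Across replications the scaled errors and their linearizations are strongly correlated with matching spread, which is exactly what the joint-normality conclusion via the multivariate CLT exploits.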
EC.3. Additional technical proofs
EC.3.1. Proof of Lemma EC.1
Proof of (i). Induction is used to show that there exists a constant $L_t$ such that for all $d$ and $y_1,y_2\in\Theta$,
\[
|m_{t,y_1}(d)-m_{t,y_2}(d)|\le L_t|y_1-y_2|. \tag{EC.3.1}
\]
First, consider $t=T$ with $m_{T,y}(d)=b_T(d-y)^++h_T(y-d)^+$. It is obvious that (EC.3.1) holds with $L_T=b_T\vee h_T$. Next, suppose (EC.3.1) holds for periods $t+1,t+2,\cdots,T$. In period $t$, we have
\[
m_{t,y_1}(d)-m_{t,y_2}(d)=C_t(y_1,d)-C_t(y_2,d)+M_{t+1}(s_{t+1})[I(y_1-d<s_{t+1})-I(y_2-d<s_{t+1})]+M_{t+1}(y_1-d)I(y_1-d\ge s_{t+1})-M_{t+1}(y_2-d)I(y_2-d\ge s_{t+1}).
\]
By the induction hypothesis, $m_{t+1,y}(d)$ is $L_{t+1}$-Lipschitz in $y$, and so $M_{t+1}(y)=Em_{t+1,y}(D_{t+1})$ is $L_{t+1}$-Lipschitz by Jensen's inequality. WLOG, assume $y_1<y_2$ and consider the following cases:
• Case 1: $y_1-d\le y_2-d<s_{t+1}$. In this case, $|m_{t,y_1}(d)-m_{t,y_2}(d)|=|C_t(y_1,d)-C_t(y_2,d)|\le(b_t\vee h_t)|y_1-y_2|$.
• Case 2: $y_1-d<s_{t+1}\le y_2-d$. In this case,
\[
|m_{t,y_1}(d)-m_{t,y_2}(d)|=|C_t(y_1,d)-C_t(y_2,d)+M_{t+1}(s_{t+1})-M_{t+1}(y_2-d)|\le(b_t\vee h_t)|y_1-y_2|+L_{t+1}(y_2-d-s_{t+1})\le(b_t\vee h_t)|y_1-y_2|+L_{t+1}(y_2-d-y_1+d)=(b_t\vee h_t+L_{t+1})|y_1-y_2|,
\]
where the first inequality is by Lipschitz continuity of $M_{t+1}$, and the second is due to $y_1-d\le s_{t+1}$.
• Case 3: $s_{t+1}\le y_1-d\le y_2-d$. In this case, $|m_{t,y_1}(d)-m_{t,y_2}(d)|=|C_t(y_1,d)-C_t(y_2,d)+M_{t+1}(y_1-d)-M_{t+1}(y_2-d)|\le(b_t\vee h_t+L_{t+1})|y_1-y_2|$.
Combining all three cases, we see the desired result holds for period $t$ with $L_t=b_t\vee h_t+L_{t+1}$.

Proof of (ii). Based on the first part, $y\mapsto m_{t,y}(d)$ is $L_t$-Lipschitz for every $d$, and $y$ has compact support $\Theta$ by Assumption 1. So the class $\{m_{t,y}:y\in\Theta\}$ is $P$-Donsker by Example 19.7 of van der Vaart (1998).

EC.3.2. Proof of Lemma EC.4
Start with the first inequality and WLOG assume $g(y_1)<g(y_2)$. By definition, we have
\[
|h(y_1,D)I\{D\le g(y_1)\}-h(y_2,D)I\{D\le g(y_2)\}|=|(h(y_1,D)-h(y_2,D))I\{D\le g(y_1)\}-h(y_2,D)I\{g(y_1)<D\le g(y_2)\}|. \tag{EC.3.2}
\]
Apply the inequality $|a-b|\le|a|+|b|$ to the RHS, use the assumption that $y\mapsto h(y,d)$ is $L_h$-Lipschitz continuous and bounded by $\bar h$, and then take expectation on both sides to see
\[
E|h(y_1,D)I\{D\le g(y_1)\}-h(y_2,D)I\{D\le g(y_2)\}|\le L_h|y_1-y_2|+\bar h\,EI\{g(y_1)<D\le g(y_2)\}.
\]
Therefore, the first inequality holds because the density of $D$ is bounded by $\bar f$ and $g$ is $L_g$-Lipschitz. Next consider the second inequality. Square both sides of (EC.3.2), take expectation on both sides, and then use the fact that $I\{g(y_1)<D\le g(y_2)\}I\{D\le g(y_1)\}=0$ to see that
\[
E|h(y_1,D)I\{D\le g(y_1)\}-h(y_2,D)I\{D\le g(y_2)\}|^2\le E|h(y_1,D)-h(y_2,D)|^2 I\{D\le g(y_1)\}+E\,h(y_2,D)^2 I\{g(y_1)<D\le g(y_2)\}\le L_h^2|y_1-y_2|^2+\bar h^2\,EI\{g(y_1)<D\le g(y_2)\},
\]
where the second inequality is by the assumption of $\bar h$-boundedness of $y\mapsto h(y,d)$ and the assumptions of this lemma. The expectation in the last line is bounded by $\bar f L_g|y_1-y_2|$ by the assumption on $D$; hence the lemma holds.
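The first inequality of Lemma EC.4 can be probed by Monte Carlo. The sketch below checks that the empirical ratio never exceeds the constant $L_h+\bar h\bar fL_g$; the specific $h$, $g$, and demand law are illustrative assumptions chosen so that the Lipschitz and boundedness hypotheses hold with easily read-off constants.

```python
import numpy as np

rng = np.random.default_rng(4)

# Check E|h(y1,D)I{D<=g(y1)} - h(y2,D)I{D<=g(y2)}| <= (L_h + hbar*fbar*L_g)|y1-y2|
# with illustrative choices: h(y,d) = sin(y)+1.5 (L_h=1, hbar=2.5),
# g(y) = y (L_g=1), and D ~ Uniform(0,1) (density bound fbar = 1).
D = rng.uniform(0.0, 1.0, size=200000)
h = lambda y, d: np.sin(y) + 1.5
g = lambda y: y
bound_const = 1.0 + 2.5 * 1.0 * 1.0

worst = 0.0
for _ in range(200):
    y1, y2 = rng.uniform(0.0, 1.0, size=2)
    if abs(y1 - y2) < 0.05:       # skip near-ties to keep Monte Carlo noise negligible
        continue
    lhs = np.mean(np.abs(h(y1, D) * (D <= g(y1)) - h(y2, D) * (D <= g(y2))))
    worst = max(worst, lhs / abs(y1 - y2))
```

The worst observed ratio sits strictly below `bound_const`, and the gap reflects that the proof bounds the truncation region only through the density bound $\bar f$.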
EC.3.3. Proof of Lemma EC.2
Proof of (i). The proof is by induction starting with $t=T$. The definition of $M'_T$ in (5.3) implies that $M'_T(y)=(b_T+h_T)F_T(y)-b_T$. We know $F_T$ is Lipschitz with constant $\bar f_T\triangleq\sup_y f_T(y)$ by Assumption 1. Thus, the lemma holds for $\tilde B_T=(b_T\vee h_T)$ and $\tilde L_T=(b_T+h_T)\bar f_T$. Next, suppose the lemma holds for periods $T,T-1,\cdots,t+1$ and consider period $t$. From (5.3),
\[
M'_t(y)=(b_t+h_t)F_t(y)-b_t+EM'_{t+1}(y-D_t)I(y-D_t\ge s_{t+1}).
\]
It is readily seen that $|M'_t(y)|\le b_t\vee h_t+\tilde B_{t+1}$ by the induction hypothesis and the fact that $F_t(y)\le 1$. Since $M'_{t+1}$ is bounded and $\tilde L_{t+1}$-Lipschitz and $F_t$ is $\bar f_t$-Lipschitz, we can use Lemma EC.4 to see that the second term of the above display is Lipschitz with constant $\tilde L_{t+1}+\tilde B_{t+1}\bar f_t$. Since $y\mapsto(b_t+h_t)F_t(y)-b_t$ is Lipschitz with constant $(b_t+h_t)\bar f_t$, the lemma holds for period $t$ with $\tilde B_t=b_t\vee h_t+\tilde B_{t+1}$ and $\tilde L_t=\tilde L_{t+1}+\big(\tilde B_{t+1}+b_t+h_t\big)\bar f_t$.

Proof of (ii). The proof is again based on Lemma EC.4. By the Cauchy–Schwarz inequality,
\[
E\big(m^r_{t,y_1}-m^r_{t,y_2}\big)^2\le 2(b_t+h_t)^2E\big(I(D_t\le y_1)-I(D_t\le y_2)\big)^2+2E\big(M'_{t+1}(y_1-D_t)I(y_1-D_t\ge s_{t+1})-M'_{t+1}(y_2-D_t)I(y_2-D_t\ge s_{t+1})\big)^2=2(b_t+h_t)^2EI(y_1\wedge y_2\le D_t\le y_1\vee y_2)+2E\big\{M'_{t+1}(y_1-D_t)I(y_1-D_t\ge s_{t+1})-M'_{t+1}(y_2-D_t)I(y_2-D_t\ge s_{t+1})\big\}^2. \tag{EC.3.3}
\]
The first term on the RHS above is bounded by $2(b_t+h_t)^2\bar f_t|y_1-y_2|$ because $F_t$ is $\bar f_t$-Lipschitz. The second term is bounded by $2\big(\tilde L_{t+1}^2|y_1-y_2|^2+\tilde B_{t+1}^2\bar f_t|y_1-y_2|\big)$ by Lemma EC.4. Thus,
\[
E\big(m^r_{t,y_1}-m^r_{t,y_2}\big)^2\le 2(b_t+h_t)^2\bar f_t|y_1-y_2|+2\tilde L_{t+1}^2|y_1-y_2|^2+2\tilde B_{t+1}^2\bar f_t|y_1-y_2|=2\big((b_t+h_t)^2+\tilde B_{t+1}^2\big)\bar f_t|y_1-y_2|+2\tilde L_{t+1}^2|y_1-y_2|^2. \tag{EC.3.4}
\]
The lemma holds for period $t$ by choosing $c_{t,1}=2\big((b_t+h_t)^2+\tilde B_{t+1}^2\big)\bar f_t$ and $c_{t,2}=2\tilde L_{t+1}^2$.

Proof of (iii). Since $y\mapsto M'_{t+1}(y)$ is $\tilde L_{t+1}$-Lipschitz, the class $\mathcal F_1=\{d\mapsto M'_{t+1}(y-d):y\in\Theta\}$ is $P$-Donsker by Example 19.7 of van der Vaart (1998). Since $M'_{t+1}$ is bounded by $\tilde B_{t+1}$ from Part (i), the class $\mathcal F_2=\{d\mapsto M'_{t+1}(y-d)I(y-d\ge s_{t+1}):y\in\Theta\}$ is also $P$-Donsker by Example 19.20 of van der Vaart (1998). Next, the class $\mathcal F_3=\{d\mapsto(b_t+h_t)I(d\le y):y\in\mathbb R\}$ is trivially $P$-Donsker. By Example 19.20 of van der Vaart (1998) again, $\mathcal M^r_t$, as a subset of $\mathcal F_2+\mathcal F_3$, is $P$-Donsker.

Proof of (iv). We know $\mathcal M^r_t$ is $P$-Donsker for all $t\in\mathcal T$. Thus $\big|M^r_{t,n}(y)-M'_t(y)\big|\to_P 0$, uniformly in $y$. It suffices to show $\big|\hat M^r_{t,n}(y)-M^r_{t,n}(y)\big|\to_P 0$, uniformly in $y$. When $t=T$, $\hat M^r_{T,n}(y)=M^r_{T,n}(y)$ and the statement holds trivially.
Next, suppose $\big|\hat M^r_{\tau,n}(y)-M^r_{\tau,n}(y)\big|\to_P 0$, uniformly in $y$, for $\tau=T,T-1,\cdots,t+1$; then it suffices to prove that
\[
\Big|\frac1n\sum_{i=1}^{n}\hat M^r_{t+1,n}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge s_{t+1})\Big|\to_P 0,
\]
uniformly in $y$. Since the result holds for $t+1$, we know $\hat M^r_{t+1,n}(y)$ converges to $M'_{t+1}(y)$ in probability, uniformly in $y$. Therefore,
\[
\Big|\frac1n\sum_{i=1}^{n}\hat M^r_{t+1,n}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})\Big|\le\sup_y\Big|\hat M^r_{t+1,n}(y)-M'_{t+1}(y)\Big|,
\]
which is $o_P(1)$. On the other hand, since $M'_{t+1}(y)$ is bounded by $\tilde B_{t+1}$, we have
\[
\Big|\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge s_{t+1})\Big|\le\frac{\tilde B_{t+1}}{n}\sum_{i=1}^{n}I\big(y-\hat s_{t+1}\vee s_{t+1}\le d_t^i\le y-\hat s_{t+1}\wedge s_{t+1}\big).
\]
The RHS of the above display can be bounded in the same way as (EC.2.1), and converges to zero in probability uniformly.
Combine all results above and use the triangle inequality to see
\[
\Big|\frac1n\sum_{i=1}^{n}\hat M^r_{t+1,n}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge s_{t+1})\Big|\le\Big|\frac1n\sum_{i=1}^{n}\hat M^r_{t+1,n}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})\Big|+\Big|\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge\hat s_{t+1})-\frac1n\sum_{i=1}^{n}M'_{t+1}(y-d_t^i)I(y-d_t^i\ge s_{t+1})\Big|\to_P 0,
\]
uniformly in $y$. Thus, $\big|\hat M^r_{t,n}(y)-M'_t(y)\big|\to_P 0$, uniformly in $y$, which concludes the lemma.

EC.3.4. Proof of Lemma EC.5
The proof is by induction. Recall the definition of $f_{t,\tau,y,x}$ in (EC.1.4). It is immediate that if (EC.1.5) holds for a pair $(t, \tau)$, then it also holds for any pair $(t', \tau')$ such that $\tau' - t' = \tau - t$. In this view, it suffices to do induction on $\tau - t$.

First consider $\tau - t = 1$. According to the assumptions of the lemma, $\{ f_{t,\tau,y,x}(d_t, d_\tau) : y, x \in \Theta \}$ is a uniformly bounded Euclidean class. On the other hand, the class of constants $\{ \mathbb{E} f_{t,\tau,y,x} : x, y \in \Theta \}$ is trivially a VC-subgraph class and thus Euclidean. According to Corollary 17 of Nolan and Pollard (1987), $\{ f_{t,\tau,y,x}(d_t, d_\tau) - \mathbb{E} f_{t,\tau,y,x} : x, y \in \Theta \}$, as a subset of the difference of two uniformly bounded Euclidean classes, is also Euclidean with a bounded envelope. Then we can apply Corollary 21 of Nolan and Pollard (1987) to see that the following two classes of functions are Euclidean:
\[
\{ \mathbb{E} f_{t,\tau,y,x}(D_t, d_\tau) - \mathbb{E} f_{t,\tau,y,x} : x, y \in \Theta \} \quad \text{and} \quad \{ \mathbb{E} f_{t,\tau,y,x}(d_t, D_\tau) - \mathbb{E} f_{t,\tau,y,x} : x, y \in \Theta \}.
\]
Consider the function class $\tilde{\mathcal F} \triangleq \{ (d_t, d_\tau) \mapsto \tilde f_{y,x}(d_t, d_\tau) : x, y \in \Theta \}$, where
\[
\tilde f_{y,x}(d_t, d_\tau) \triangleq f_{t,\tau,y,x}(d_t, d_\tau) - \mathbb{E} f_{t,\tau,y,x} - \big( \mathbb{E} f_{t,\tau,y,x}(d_t, D_\tau) - \mathbb{E} f_{t,\tau,y,x} \big) - \big( \mathbb{E} f_{t,\tau,y,x}(D_t, d_\tau) - \mathbb{E} f_{t,\tau,y,x} \big).
\]
Then $\tilde{\mathcal F}$, as a subset of the sum of three uniformly bounded Euclidean classes, is a Euclidean class by Corollary 17 of Nolan and Pollard (1987). By routine calculations, we can show $\mathbb{E} \tilde f_{y,x}(D_t, d_\tau) = \mathbb{E} \tilde f_{y,x}(d_t, D_\tau) = 0$ for all $d_t, d_\tau \in \Theta$. In other words, $\tilde{\mathcal F}$ is a degenerate Euclidean class. This allows us to use Theorem 2.5 of Neumeyer (2004) to conclude
\[
\frac{1}{n\sqrt n}\sum_{i_t, i_\tau} \big( f_{t,\tau,y,x}(d^{i_t}_t, d^{i_\tau}_\tau) - \mathbb{E} f_{t,\tau,y,x} \big)
= \frac{1}{\sqrt n}\sum_{i=1}^n \big( \mathbb{E} f_{t,\tau,y,x}(d^i_t, D_\tau) - \mathbb{E} f_{t,\tau,y,x} \big)
+ \frac{1}{\sqrt n}\sum_{i=1}^n \big( \mathbb{E} f_{t,\tau,y,x}(D_t, d^i_\tau) - \mathbb{E} f_{t,\tau,y,x} \big)
+ R_n(x, y),
\]
where $\sup_{x,y} |R_n(x, y)| = o_P(1)$. Thus, the lemma holds for $\tau - t = 1$.

Next, for a fixed $\tilde t$ such that $2 \le \tilde t < T - 1$, suppose the lemma holds for $\tau - t = 1, 2, \dots, \tilde t - 1$. Then for $\tau - t = \tilde t$, (EC.1.5) holds for $f_{t+1,\tau,y,x}$ by the induction hypothesis. Since $k_{t+1,\tau,y-d_t,x}(d_{t+1}, \dots, d_\tau) = k_{t,\tau,y,x}(d_t, \dots, d_\tau)$ by the way that $K_{t,\tau}$ is constructed, we have
\[
f_{t,\tau,y,x}(d_t, \dots, d_\tau) = f_{t+1,\tau,y-d_t,x}(d_{t+1}, \dots, d_\tau)\, I(y - d_t \ge s_{t+1}) \tag{EC.3.5}
\]
by checking the definition of $f_{t,\tau,y,x}$ in (EC.1.4). Define two multi-sample $U$-processes indexed by $x, y \in \Theta$ as
\[
U_{t,\tau}(x, y) \triangleq \frac{1}{n^{\tau - t + 1/2}} \sum_{i_t, \dots, i_\tau} f_{t,\tau,y,x}(d^{i_t}_t, \dots, d^{i_\tau}_\tau), \qquad
U_{t+1,\tau}(x, y) \triangleq \frac{1}{n^{\tau - t - 1/2}} \sum_{i_{t+1}, \dots, i_\tau} f_{t+1,\tau,y,x}(d^{i_{t+1}}_{t+1}, \dots, d^{i_\tau}_\tau).
\]
Because of (EC.3.5), these two processes are connected by the following relation:
\[
U_{t,\tau}(x, y) = \frac{1}{n}\sum_{i_t=1}^n U_{t+1,\tau}(x, y - d^{i_t}_t)\, I(y - d^{i_t}_t \ge s_{t+1}). \tag{EC.3.6}
\]
The summand is zero when $y - d^{i_t}_t < s_{t+1}$. So, we can focus on the summands with $y - d^{i_t}_t \ge s_{t+1}$, i.e., $s_{t+1} \le y - d^{i_t}_t \le y$, because the demand $d^{i_t}_t$ is non-negative. Because $s_{t+1}, y \in \Theta$ and $\Theta$ is a closed convex set by construction, we have the inclusion $y - d^{i_t}_t \in \Theta$. This allows us to use the induction hypothesis and write
\[
U_{t+1,\tau}(x, y - d^{i_t}_t) = \frac{1}{\sqrt n}\sum_{i=1}^n \sum_{\iota=t+1}^{\tau} \Big\{ \mathbb{E} f_{t+1,\tau,y-d^{i_t}_t,x}(D_{t+1}, \dots, D_{\iota-1}, d^i_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t+1,\tau,y-d^{i_t}_t,x}(D_{t+1}, \dots, D_\tau) \Big\}
+ \sqrt n\, \mathbb{E} f_{t+1,\tau,y-d^{i_t}_t,x}(D_{t+1}, \dots, D_\tau) + R_n(x, y - d^{i_t}_t),
\]
where $\sup_{x,y\in\Theta} |R_n(x, y)| = o_P(1)$. Plug the above display into (EC.3.6) and use (EC.3.5) to see
\[
U_{t,\tau}(x, y) = \frac{1}{n\sqrt n}\sum_{\iota=t+1}^{\tau} \sum_{i_t, i_\iota} \big( \mathbb{E} f_{t,\tau,y,x}(d^{i_t}_t, D_{t+1}, \dots, D_{\iota-1}, d^{i_\iota}_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x}(d^{i_t}_t, D_{t+1}, \dots, D_\tau) \big)
+ \frac{1}{\sqrt n}\sum_{i_t=1}^n \big( \mathbb{E} f_{t,\tau,y,x}(d^{i_t}_t, D_{t+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x} \big)
+ \sqrt n\, \mathbb{E} f_{t,\tau,y,x} + \tilde R_n(x, y), \tag{EC.3.7}
\]
where we add and subtract $\sqrt n\, \mathbb{E} f_{t,\tau,y,x}$ above, and $\sup_{x,y} |\tilde R_n(x, y)| = o_P(1)$ because
\[
\tilde R_n(x, y) = \frac{1}{n}\sum_{i_t=1}^n R_n(x, y - d^{i_t}_t)\, I(y - d^{i_t}_t \ge s_{t+1})
\le \sup_{u,v \in \Theta} |R_n(u, v)| \cdot \frac{1}{n}\sum_{i_t=1}^n I(y - d^{i_t}_t \ge s_{t+1})
\le \sup_{u,v \in \Theta} |R_n(u, v)|.
\]
Comparing (EC.3.7) with (EC.1.5), we need to further decompose the first term on the RHS of (EC.3.7). For a given $\iota$ satisfying $t+1 \le \iota \le \tau$, consider the function class $\mathcal G_\iota \triangleq \{ (d_t, d_\iota) \mapsto \tilde\psi_{x,y}(d_t, d_\iota) : x, y \in \Theta \}$, where
\[
\tilde\psi_{x,y}(d_t, d_\iota) \triangleq \mathbb{E} f_{t,\tau,y,x}(d_t, D_{t+1}, \dots, D_{\iota-1}, d_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x}(d_t, D_{t+1}, \dots, D_\tau).
\]
Since $\{ f_{t,\tau,y,x}(d^{i_t}_t, \dots, d^{i_\tau}_\tau) : x, y \in \Theta \}$ is Euclidean by assumption, Corollary 21 of Nolan and Pollard (1987) implies that both $\{ \mathbb{E} f_{t,\tau,y,x}(d_t, D_{t+1}, \dots, D_{\iota-1}, d_\iota, D_{\iota+1}, \dots, D_\tau) : x, y \in \Theta \}$ and $\{ \mathbb{E} f_{t,\tau,y,x}(d_t, D_{t+1}, \dots, D_\tau) : x, y \in \Theta \}$ are also Euclidean. Since $\mathcal G_\iota$ is the difference of two Euclidean classes, it is Euclidean by Corollary 17 of Nolan and Pollard (1987). Then Corollary 21 of Nolan and Pollard (1987) implies that $\{ \mathbb{E} \tilde\psi_{x,y}(D_t, d_\iota) : x, y \in \Theta \}$ is a Euclidean class. It is then readily seen that $\{ \tilde\psi_{x,y}(d_t, d_\iota) - \mathbb{E} \tilde\psi_{x,y}(D_t, d_\iota) : x, y \in \Theta \}$ is a Euclidean class from Corollary 17 of Nolan and Pollard (1987). Since $\mathbb{E} \tilde\psi_{x,y}(d_t, D_\iota) = \mathbb{E} \tilde\psi_{x,y}(D_t, D_\iota) = 0$ for all $d_t \in \Theta$, the above class is also degenerate.
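The projection at work here can be checked numerically on a toy example. The sketch below is purely illustrative and uses our own kernel, not the paper's: we take the two-sample kernel $f(x, y) = I(x + y \le 1)$ with $X, Y \sim \mathrm{Uniform}(0,1)$, for which the conditional means are available in closed form, and verify that the scaled $U$-process minus its two one-sample projections is negligible.

```python
import numpy as np

rng = np.random.default_rng(1)

def projection_remainder(n):
    """Projection of a two-sample U-statistic for the toy kernel
    f(x, y) = 1{x + y <= 1}, X, Y ~ Uniform(0, 1) (illustrative choices).
    Closed forms: E f = 1/2, E[f | X=x] = 1 - x, E[f | Y=y] = 1 - y."""
    x, y = rng.random(n), rng.random(n)
    Ef = 0.5
    # (1 / (n sqrt(n))) * sum over all pairs of (f(x_i, y_j) - Ef)
    full = (np.add.outer(x, y) <= 1.0).sum() / (n * np.sqrt(n)) - np.sqrt(n) * Ef
    # the two one-sample (Hajek-type) projections
    proj = ((1.0 - x) - Ef).sum() / np.sqrt(n) + ((1.0 - y) - Ef).sum() / np.sqrt(n)
    return full - proj

for n in (100, 400, 1600):
    print(n, round(abs(projection_remainder(n)), 4))  # shrinks roughly like n**-0.5
```

The leftover term is exactly the contribution of the degenerate part of the kernel, consistent with the degeneracy established above.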
This enables us to invoke Theorem 2.5 of Neumeyer (2004) to arrive at the following decomposition:
\[
\frac{1}{n\sqrt n}\sum_{i_t, i_\iota} \big( \mathbb{E} f_{t,\tau,y,x}(d^{i_t}_t, D_{t+1}, \dots, D_{\iota-1}, d^{i_\iota}_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x}(d^{i_t}_t, D_{t+1}, \dots, D_\tau) \big)
= \frac{1}{\sqrt n}\sum_{i=1}^n \big( \mathbb{E} f_{t,\tau,y,x}(D_t, \dots, D_{\iota-1}, d^i_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x} \big) + \check R_n(x, y),
\]
where $\sup_{x,y} |\check R_n(x, y)| = o_P(1)$. Combine the above display with (EC.3.7) to see
\[
U_{t,\tau}(x, y) = \frac{1}{\sqrt n}\sum_{i=1}^n \sum_{\iota=t+1}^{\tau} \big\{ \mathbb{E} f_{t,\tau,y,x}(D_t, \dots, D_{\iota-1}, d^i_\iota, D_{\iota+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x}(D_t, \dots, D_\tau) \big\}
+ \frac{1}{\sqrt n}\sum_{i=1}^n \big\{ \mathbb{E} f_{t,\tau,y,x}(d^i_t, D_{t+1}, \dots, D_\tau) - \mathbb{E} f_{t,\tau,y,x} \big\}
+ \sqrt n\, \mathbb{E} f_{t,\tau,y,x} + \tilde R_n(x, y) + (\tau - t)\check R_n(x, y),
\]
where both $\tilde R_n(x, y)$ and $(\tau - t)\check R_n(x, y)$ are $o_P(1)$, uniformly in $x, y \in \Theta$. Move $\sqrt n\, \mathbb{E} f_{t,\tau,y,x}$ to the LHS to conclude.

EC.3.5. Variance estimators in Theorem 6
We first show that $\hat{\mathbb{E}}\{ \hat\varphi_{S_1}(D_1, D_2) \}^2$ converges to $\mathbb{E}\{ \varphi_{S_1}(D_1, D_2) \}^2$ in probability, where
\[
\hat{\mathbb{E}}\big( \hat\varphi_{S_1}(D_1, D_2) \big)^2 \triangleq \frac{1}{n}\sum_{i=1}^n \frac{1}{(\hat M''(\hat S_1))^2} \bigg\{ \big( \hat m^r_{2,\hat S_1}(d^i_1) \big)^2 + \Big[ \hat f_1(\hat S_1 - \hat s_2) \times \big( C(\hat S_2, d^i_2) - C(\hat s_2, d^i_2) + K \big) - \frac{1}{n}\sum_{l} \hat g_{\hat S_1}(d^l_1, d^i_2) \Big]^2 \bigg\}, \tag{EC.3.8}
\]
\[
\hat M''(\hat S_1) = (b + h)\hat f_1(\hat S_1) + \hat M^r_{2,n}(\hat s_2)\, \hat f_1(\hat S_1 - \hat s_2) + \frac{1}{n}\sum_{i=1}^n (b + h)\hat f_2(\hat S_1 - d^i_1)\, I(\hat S_1 - d^i_1 \ge \hat s_2),
\]
\[
\hat g_{\hat S_1}(d_1, d_2) = (b + h)\big\{ I(\hat S_1 - d_1 - d_2 \ge 0) - \hat F_2(\hat S_1 - d_1) \big\}\, I(\hat S_1 - d_1 \ge \hat s_2),
\]
$\hat F_2(y) = \frac{1}{n}\sum_{i=1}^n I(d^i_2 \le y)$, $\hat m^r_{2,y}$ is the right derivative of $\hat m_{2,y}$ defined in (2.8), and $\hat f_1$ and $\hat f_2$ are some kernel estimators specified later. Here, $\hat M''(\hat S_1)$ and $\hat g_{\hat S_1}$ are estimators of $M''(S_1)$ and $g_{S_1}$ in (6.3). By Silverman (1978), we can choose kernel estimators $\hat f_1$ and $\hat f_2$ such that $\sup_y |\hat f_1(y) - f_1(y)| = o_P(1)$ and $\sup_y |\hat f_2(y) - f_2(y)| = o_P(1)$. By Lemma EC.2, we know $\hat M^r_{2,n}(\hat s_2) = M_2'(\hat s_2) + o_P(1)$. Combine these facts with the continuous mapping theorem to see $\hat f_1(\hat S_1 - \hat s_2) = f_1(S_1 - s_2) + o_P(1)$, $\hat f_1(\hat S_1) = f_1(S_1) + o_P(1)$, and $\hat M^r_{2,n}(\hat s_2) = M_2'(s_2) + o_P(1)$. So, we can rewrite $\hat M''(\hat S_1)$ as
\[
\hat M''(\hat S_1) = (b + h)f_1(S_1) + M_2'(s_2) f_1(S_1 - s_2) + \frac{1}{n}\sum_{i=1}^n (b + h)f_2(S_1 - d^i_1)\, I(\hat S_1 - d^i_1 \ge \hat s_2) + o_P(1).
\]
Now that $f_2$ is bounded, we can use similar arguments as in (EC.2.1) to replace the last term above with $\frac{1}{n}\sum_{i=1}^n (b + h)f_2(S_1 - d^i_1) I(S_1 - d^i_1 \ge s_2) + o_P(1)$, which converges to $\mathbb{E}(b + h)f_2(S_1 - D_1) I(S_1 - D_1 \ge s_2)$ in probability by the weak law of large numbers. As a result,
\[
\hat M''(\hat S_1) \overset{P}{\to} (b + h)f_1(S_1) + M_2'(s_2) f_1(S_1 - s_2) + \mathbb{E}(b + h)f_2(S_1 - D_1) I(S_1 - D_1 \ge s_2). \tag{EC.3.9}
\]
Some routine calculations based on (5.3) show that the RHS is $M''(S_1)$. Now look at the numerator of (EC.3.8).
We first use $\hat f_1(\hat S_1 - \hat s_2) \overset{P}{\to} f_1(S_1 - s_2)$ to see that
\[
\hat f_1(\hat S_1 - \hat s_2)\big( C(\hat S_2, d^i_2) - C(\hat s_2, d^i_2) + K \big) = f_1(S_1 - s_2)\big( C(\hat S_2, d^i_2) - C(\hat s_2, d^i_2) + K \big) + o_P(1). \tag{EC.3.10}
\]
In addition, we can argue as in Lemma EC.2 to see that $\{ d_1 \mapsto (b + h)\{ I(y_1 - d_1 - d_2 \ge 0) - F_2(y_1 - d_1) \} I(y_1 - d_1 \ge y_2) : d_2 \in \mathbb{R},\ y_1 \in \Theta,\ y_2 \in \Theta \}$ is $P$-Donsker and thus $P$-Glivenko-Cantelli. Subsequently, we have
\[
\frac{1}{n}\sum_{l} \hat g_{\hat S_1}(d^l_1, d_2)
= \frac{1}{n}\sum_{l} (b + h)\big\{ I(\hat S_1 - d^l_1 - d_2 \ge 0) - F_2(\hat S_1 - d^l_1) \big\} I(\hat S_1 - d^l_1 \ge \hat s_2) + o_P(1)
= \mathbb{E}(b + h)\big\{ I(\hat S_1 - D_1 - d_2 \ge 0) - F_2(\hat S_1 - D_1) \big\} I(\hat S_1 - D_1 \ge \hat s_2) + R_n(d_2)
= \mathbb{E}\, g_{\hat S_1}(D_1, d_2) + R_n(d_2), \tag{EC.3.11}
\]
where $\sup_{d_2} |R_n(d_2)| = o_P(1)$. The first line uses $\sup_y |\hat F_2(y) - F_2(y)| = o_P(1)$, and the second line uses the property of $P$-Glivenko-Cantelli classes. Combine (EC.3.9)-(EC.3.11) to see
\[
\hat{\mathbb{E}}\big( \hat\varphi_{S_1}(D_1, D_2) \big)^2 = \frac{1}{n}\sum_{i=1}^n \frac{1}{(M''(S_1))^2} \bigg\{ \big( \hat m^r_{2,\hat S_1}(d^i_1) \big)^2 + \Big[ f_1(S_1 - s_2)\big( C(\hat S_2, d^i_2) - C(\hat s_2, d^i_2) + K \big) - g^{(2)}_{\hat S_1}(d^i_2) \Big]^2 \bigg\} + o_P(1). \tag{EC.3.12}
\]
For the first term inside the curly brackets above, we can expand the square and argue as in Theorem 1 (see (EC.2.1)) to conclude
\[
\frac{1}{n}\sum_{i=1}^n \big( \hat m^r_{2,\hat S_1}(d^i_1) \big)^2 = \frac{1}{n}\sum_{i=1}^n \big( m^r_{2,\hat S_1}(d^i_1) \big)^2 + o_P(1). \tag{EC.3.13}
\]
By Lemma 5, Lemma EC.1, and Lemma EC.2, $\{ d_1 \mapsto m^r_{2,y}(d_1) : y \in \Theta \}$, $\{ d_2 \mapsto C(y, d_2) : y \in \Theta \}$, and $\{ d_2 \mapsto g^{(2)}_y(d_2) : y \in \Theta \}$ are all bounded and $P$-Donsker. Then $\{ d_2 \mapsto f_1(S_1 - s_2)( C(y_1, d_2) - C(y_2, d_2) + K ) - g^{(2)}_{y_3}(d_2) : y_1, y_2, y_3 \in \Theta \}$ is bounded and $P$-Donsker by Example 19.20 of van der Vaart (1998). Therefore, $\{ d_1 \mapsto ( m^r_{2,y}(d_1) )^2 : y \in \Theta \}$ and $\{ d_2 \mapsto [ f_1(S_1 - s_2)( C(y_1, d_2) - C(y_2, d_2) + K ) - g^{(2)}_{y_3}(d_2) ]^2 : y_1, y_2, y_3 \in \Theta \}$ are also $P$-Donsker by Example 19.20 of van der Vaart (1998). Then we write (EC.3.12) as
\[
\hat{\mathbb{E}}\big( \hat\varphi_{S_1}(D_1, D_2) \big)^2
= \frac{1}{(M''(S_1))^2}\Big( \mathbb{E}\big( m^r_{2,\hat S_1}(D_1) \big)^2 + \mathbb{E}\big[ f_1(S_1 - s_2)\big( C(\hat S_2, D_2) - C(\hat s_2, D_2) + K \big) - g^{(2)}_{\hat S_1}(D_2) \big]^2 \Big) + o_P(1)
\]
\[
= \frac{1}{(M''(S_1))^2}\Big( \mathbb{E}\big( m^r_{2,S_1}(D_1) \big)^2 + \mathbb{E}\big[ f_1(S_1 - s_2)( C(S_2, D_2) - C(s_2, D_2) + K ) - g^{(2)}_{S_1}(D_2) \big]^2 \Big) + o_P(1)
= \mathbb{E}\, \varphi_{S_1}(D_1, D_2)^2 + o_P(1).
\]
The first equality is by the property of $P$-Donsker classes, the second uses consistency of $\hat S_1$, $\hat S_2$, $\hat s_1$, and $\hat s_2$, and the last uses independence of $D_1$ and $D_2$. Similar to the analysis of $\hat{\mathbb{E}}( \hat\varphi_{S_1}(D_1, D_2) )^2$, we can show that $\hat{\mathbb{E}}( \hat\varphi_{s_1}(D_1, D_2) )^2$ converges to $\mathbb{E}( \varphi_{s_1}(D_1, D_2) )^2$ in probability, where
\[
\hat{\mathbb{E}}\big\{ \hat\varphi_{s_1}(D_1, D_2) \big\}^2 = \frac{1}{n}\sum_{i=1}^n \frac{1}{(\hat M^r_{1,n}(\hat s_1))^2} \bigg\{ \Big[ \big( \hat F_1(\hat s_1 - \hat s_2) - \hat F_1(\hat S_1 - \hat s_2) \big)\big( C(\hat S_2, d^i_2) - \hat M_{2,n}(\hat S_2) \big) + \frac{1}{n}\sum_{l} \hat{\tilde g}_{\hat S_1}(d^l_1, d^i_2) - \frac{1}{n}\sum_{l} \hat{\tilde g}_{\hat s_1}(d^l_1, d^i_2) \Big]^2 + \Big( \hat m_{1,\hat S_1}(d^i_1) - \hat m_{1,\hat s_1}(d^i_1) + K \Big)^2 \bigg\}. \tag{EC.3.14}
\]
Here, $\hat{\tilde g}_{y}(d_1, d_2) = \big\{ h(y - d_1 - d_2)^+ + b(d_1 + d_2 - y)^+ - \hat M_{2,n}(y - d_1) \big\} I(y - d_1 \ge \hat s_2)$ is an estimate of $\tilde g_{S_1}$ in (6.12), $\hat M_{2,n}$ is defined in (2.8), $\hat M^r_{1,n}$ is the right derivative of $\hat M_{1,n}(y) = \frac{1}{n}\sum_{i=1}^n \hat m_{1,y}(d^i_1)$ with $\hat m_{1,y}(d_1)$ defined in (2.8), and $\hat F_1(y) = \frac{1}{n}\sum_{i=1}^n I(d^i_1 \le y)$. As with the analysis for $\hat{\mathbb{E}}( \hat\varphi_{S_1}(D_1, D_2) )^2$, we can first show the convergence of the denominator on the RHS of the above display, and then the numerator. The details are similar to those for (EC.3.8), which we omit.

EC.3.6. Proof of Theorem 7
The optimal expected cost is $C^* = M_1(S_1) + K$ by (2.6), which is estimated by $\hat C^* = \hat M_{1,n}(\hat S_1) + K$. By adding and subtracting $M_{1,n}(\hat S_1)$ and $M_1(\hat S_1)$, $\sqrt n(\hat C^* - C^*)$ equals
\[
\sqrt n\big( \hat M_{1,n}(\hat S_1) - M_1(S_1) \big) = \sqrt n\big( \hat M_{1,n}(\hat S_1) - M_{1,n}(\hat S_1) \big) + \sqrt n\big( M_{1,n}(\hat S_1) - M_1(\hat S_1) \big) + \sqrt n\big( M_1(\hat S_1) - M_1(S_1) \big). \tag{EC.3.15}
\]
The last term is $o_P(1)$ by Taylor's expansion and the fact that $M_1'(S_1) = 0$. The second term is $\mathbb{G}_n m_{1,S_1}$, which is an i.i.d. sum, from the analysis in Lemma 3 (see (EC.2.4)). The first term has been analyzed in Theorem 5 and is also known to have an i.i.d. structure (see (EC.2.15)). This shows that $\hat M_{1,n}(\hat S_1) + K$ is asymptotically normal. Recall $\hat M_{1,n}$ in (2.8) and $M_1$ in (2.7). Explicitly write the optimal cost $C^* = M_1(S_1) + K$ and the estimator $\hat C^* = \hat M_{1,n}(\hat S_1) + K$ as follows:
\[
C^* = K + \mathbb{E}\, C(S_1, D_1) + \mathbb{E}\big[ ( C(S_2, D_2) + K )\, I(S_1 - D_1 < s_2) + C(S_1 - D_1, D_2)\, I(S_1 - D_1 \ge s_2) \big],
\]
\[
\hat C^* = K + \frac{1}{n}\sum_{i=1}^n C(\hat S_1, d^i_1) + \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n \Big[ \big( C(\hat S_2, d^j_2) + K \big) I(\hat S_1 - d^i_1 < \hat s_2) + C(\hat S_1 - d^i_1, d^j_2)\, I(\hat S_1 - d^i_1 \ge \hat s_2) \Big]. \tag{EC.3.16}
\]
From the definition of $m_{1,y}$ in (4.1) and the formulation in (EC.2.15), as well as the analysis above, we can rewrite $\sqrt n(\hat C^* - C^*) = \mathbb{G}_n\big( \rho^{(1)}_{S_1,S_2,s_2} + \rho^{(2)}_{S_1,S_2,s_2} \big)$ from (EC.3.15) with
\[
\rho^{(1)}_{S_1,S_2,s_2}(d_1) = ( M_2(S_2) + K ) I(S_1 - d_1 < s_2) + M_2(S_1 - d_1) I(S_1 - d_1 \ge s_2)
- \mathbb{E}\big( ( M_2(S_2) + K ) I(S_1 - D_1 < s_2) + M_2(S_1 - D_1) I(S_1 - D_1 \ge s_2) \big)
+ C(S_1, d_1) - \mathbb{E}\, C(S_1, D_1), \tag{EC.3.17}
\]
\[
\rho^{(2)}_{S_1,S_2,s_2}(d_2) = \{ 1 - F_1(S_1 - s_2) \}( C(S_2, d_2) + K ) + \mathbb{E}\, C(S_1 - D_1, d_2) I(S_1 - D_1 \ge s_2)
- \mathbb{E}\big( ( M_2(S_2) + K ) I(S_1 - D_1 < s_2) + M_2(S_1 - D_1) I(S_1 - D_1 \ge s_2) \big). \tag{EC.3.18}
\]
The above two functions capture the respective influences of $D_1$ and $D_2$ on the asymptotic distribution of $\hat C^*$.
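For a concrete sense of the double empirical sum in (EC.3.16), the sketch below evaluates it on simulated data. All numerical inputs, including the exponential demands, the cost parameters, and the policy values passed in place of the data-driven estimates $(\hat s_2, \hat S_1, \hat S_2)$, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def C(y, d, h=1.0, b=4.0):
    """One-period newsvendor cost C(y, d) = h(y - d)^+ + b(d - y)^+."""
    return h * np.maximum(y - d, 0.0) + b * np.maximum(d - y, 0.0)

def cost_estimate(d1, d2, S1, S2, s2, K=2.0):
    """Plug-in estimate of the optimal two-period cost, mirroring the
    double empirical sum in (EC.3.16)."""
    n = len(d1)
    term1 = K + C(S1, d1).mean()
    reorder = (S1 - d1 < s2)                              # sample paths that trigger a reorder
    inner = np.where(reorder[:, None],
                     C(S2, d2)[None, :] + K,              # pay K and order up to S2
                     C((S1 - d1)[:, None], d2[None, :]))  # carry the leftover stock
    return term1 + inner.sum() / n**2

d1 = rng.exponential(1.0, size=500)
d2 = rng.exponential(1.0, size=500)
print(round(cost_estimate(d1, d2, S1=2.0, S2=1.6, s2=0.8), 3))
```

The vectorized `np.where` reproduces the indicator split over all $n^2$ index pairs $(i, j)$ at once.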
By the independence of $D_1$ and $D_2$, $\mathrm{var}\big( \rho^{(1)}_{S_1,S_2,s_2} + \rho^{(2)}_{S_1,S_2,s_2} \big)$ equals $\mathbb{E}\big( \rho^{(1)}_{S_1,S_2,s_2} \big)^2 + \mathbb{E}\big( \rho^{(2)}_{S_1,S_2,s_2} \big)^2$, and can be estimated by
\[
\frac{1}{n}\sum_{i=1}^n \big[ \hat\rho^{(1)}(d^i_1)^2 + \hat\rho^{(2)}(d^i_2)^2 \big], \tag{EC.3.19}
\]
where
\[
\hat\rho^{(1)}(d_1) = ( \hat M_{2,n}(\hat S_2) + K ) I(\hat S_1 - d_1 < \hat s_2) + \hat M_{2,n}(\hat S_1 - d_1) I(\hat S_1 - d_1 \ge \hat s_2)
- \frac{1}{n}\sum_{l=1}^n \big( ( \hat M_{2,n}(\hat S_2) + K ) I(\hat S_1 - d^l_1 < \hat s_2) + \hat M_{2,n}(\hat S_1 - d^l_1) I(\hat S_1 - d^l_1 \ge \hat s_2) \big)
+ C(\hat S_1, d_1) - \frac{1}{n}\sum_{l=1}^n C(\hat S_1, d^l_1),
\]
\[
\hat\rho^{(2)}(d_2) = \{ 1 - \hat F_1(\hat S_1 - \hat s_2) \}\big( C(\hat S_2, d_2) + K \big) + \frac{1}{n}\sum_{l=1}^n C(\hat S_1 - d^l_1, d_2) I(\hat S_1 - d^l_1 \ge \hat s_2)
- \frac{1}{n}\sum_{l=1}^n \big( ( \hat M_{2,n}(\hat S_2) + K ) I(\hat S_1 - d^l_1 < \hat s_2) + \hat M_{2,n}(\hat S_1 - d^l_1) I(\hat S_1 - d^l_1 \ge \hat s_2) \big).
\]
Here, $\hat M_{2,n}$ is defined in (2.8), $\hat M^r_{2,n}$ is the right derivative of $\hat M_{2,n}$, $\hat g_{\hat S_1}(d_1, d_2)$ and $\hat M''(\hat S_1)$ are defined the same as in the proof of Theorem 6 (see (EC.3.8)), $\hat m^r_{2,y}(d_1)$ is defined in (5.2), and $\hat F_1(y) = \frac{1}{n}\sum_{i=1}^n I(d^i_1 \le y)$. The above $\hat\rho^{(1)}$ and $\hat\rho^{(2)}$ are used to approximate $\rho^{(1)}_{S_1,S_2,s_2}$ and $\rho^{(2)}_{S_1,S_2,s_2}$, respectively. To establish consistency of the variance estimator given in (EC.3.19), we first consider the second term $\frac{1}{n}\sum_{i=1}^n \hat\rho^{(2)}(d^i_2)^2$. Since $\{ d_1 \mapsto I(d_1 \le y) : y \in \mathbb{R} \}$ is $P$-Donsker, $\hat F_1(\hat S_1 - \hat s_2) = F_1(\hat S_1 - \hat s_2) + o_P(1)$, and the continuous mapping theorem further implies $F_1(\hat S_1 - \hat s_2) = F_1(S_1 - s_2) + o_P(1)$. On the other hand, the function class $\{ d_1 \mapsto C(y_1 - d_1, d_2) I(y_1 - d_1 \ge y_2) : y_1 \in \Theta,\ y_2 \in \Theta,\ d_2 \in \Theta \}$ is $P$-Donsker by Lemma EC.1 and Example 19.20 of van der Vaart (1998). This allows us to simplify the second term of $\hat\rho^{(2)}(d_2)$ as
\[
\frac{1}{n}\sum_{l=1}^n C(\hat S_1 - d^l_1, d_2) I(\hat S_1 - d^l_1 \ge \hat s_2) = \mathbb{E}\, C(\hat S_1 - D_1, d_2) I(\hat S_1 - D_1 \ge \hat s_2) + R_n(d_2), \tag{EC.3.20}
\]
where $\sup_{d_2} |R_n(d_2)| = o_P(1)$. For the third term of $\hat\rho^{(2)}$, use $\sup_{y \in \Theta} |\hat M_{2,n}(y) - M_2(y)| \overset{P}{\to} 0$ to see
\[
\frac{1}{n}\sum_{l=1}^n \big( ( \hat M_{2,n}(\hat S_2) + K ) I(\hat S_1 - d^l_1 < \hat s_2) + \hat M_{2,n}(\hat S_1 - d^l_1) I(\hat S_1 - d^l_1 \ge \hat s_2) \big)
= \frac{1}{n}\sum_{l=1}^n \big( ( M_2(\hat S_2) + K ) I(\hat S_1 - d^l_1 < \hat s_2) + M_2(\hat S_1 - d^l_1) I(\hat S_1 - d^l_1 \ge \hat s_2) \big) + o_P(1). \tag{EC.3.21}
\]
Since $M_2$ is Lipschitz by Lemma EC.1, the class $\{ d_1 \mapsto ( M_2(y_2) + K ) I(y_1 - d_1 < y_3) + M_2(y_1 - d_1) I(y_1 - d_1 \ge y_3) : y_1, y_2, y_3 \in \Theta \}$ is $P$-Donsker from Examples 19.7 and 19.20 of van der Vaart (1998).
Subsequently,
\[
\frac{1}{n}\sum_{l=1}^n \big( ( M_2(\hat S_2) + K ) I(\hat S_1 - d^l_1 < \hat s_2) + M_2(\hat S_1 - d^l_1) I(\hat S_1 - d^l_1 \ge \hat s_2) \big)
= \mathbb{E}\big( ( M_2(\hat S_2) + K ) I(\hat S_1 - D_1 < \hat s_2) + M_2(\hat S_1 - D_1) I(\hat S_1 - D_1 \ge \hat s_2) \big) + o_P(1)
= \mathbb{E}\big( ( M_2(S_2) + K ) I(S_1 - D_1 < s_2) + M_2(S_1 - D_1) I(S_1 - D_1 \ge s_2) \big) + o_P(1), \tag{EC.3.22}
\]
where the last step is by the continuous mapping theorem. Combining (EC.3.20)-(EC.3.22) shows
\[
\hat\rho^{(2)}(d_2) = \{ 1 - F_1(S_1 - s_2) \}\big( C(\hat S_2, d_2) + K \big) + \mathbb{E}\, C(\hat S_1 - D_1, d_2) I(\hat S_1 - D_1 \ge \hat s_2)
- \mathbb{E}\big( ( M_2(S_2) + K ) I(S_1 - D_1 < s_2) + M_2(S_1 - D_1) I(S_1 - D_1 \ge s_2) \big) + R_n(d_2), \tag{EC.3.23}
\]
where $\sup_{d_2} |R_n(d_2)| = o_P(1)$. The function classes $\{ d_2 \mapsto C(y_2, d_2) : y_2 \in \Theta \}$ and $\{ d_2 \mapsto \mathbb{E}\, C(y_1 - D_1, d_2) I(y_1 - D_1 \ge y_3) : y_1, y_3 \in \Theta \}$ are $P$-Donsker by Lemma EC.1 and Lemma EC.4, respectively. As a result, $\{ d_2 \mapsto \rho^{(2)}_{y_1,y_2,y_3}(d_2) : y_1, y_2, y_3 \in \Theta \}$ is also $P$-Donsker by Example 19.20 of van der Vaart (1998), where $\rho^{(2)}_{y_1,y_2,y_3}$ is defined in (EC.3.18). Since $\rho^{(2)}_{y_1,y_2,y_3}(d_2)$ is uniformly bounded, $\{ d_2 \mapsto ( \rho^{(2)}_{y_1,y_2,y_3}(d_2) )^2 : y_1, y_2, y_3 \in \Theta \}$ is also $P$-Donsker by Example 19.20 of van der Vaart (1998). Then (EC.3.23) implies
\[
\frac{1}{n}\sum_{i=1}^n \big( \hat\rho^{(2)}(d^i_2) \big)^2 = \frac{1}{n}\sum_{i=1}^n \big( \rho^{(2)}_{\hat S_1,\hat S_2,\hat s_2}(d^i_2) \big)^2 + o_P(1)
= \mathbb{E}\big( \rho^{(2)}_{\hat S_1,\hat S_2,\hat s_2}(D_2) \big)^2 + o_P(1)
= \mathbb{E}\big( \rho^{(2)}_{S_1,S_2,s_2}(D_2) \big)^2 + o_P(1),
\]
where the last equality is by the continuous mapping theorem. Repeat the above argument to show $\frac{1}{n}\sum_{i=1}^n ( \hat\rho^{(1)}(d^i_1) )^2 = \mathbb{E}\big( \rho^{(1)}_{S_1,S_2,s_2}(D_1) \big)^2 + o_P(1)$. Combine the results to see that $\frac{1}{n}\sum_{i=1}^n [ \hat\rho^{(1)}(d^i_1)^2 + \hat\rho^{(2)}(d^i_2)^2 ]$ converges to $\mathrm{var}\big( \rho^{(1)}_{S_1,S_2,s_2} + \rho^{(2)}_{S_1,S_2,s_2} \big)$ in probability.

EC.3.7. Proof of Corollary 1
By Theorem 9, $\sqrt n(\hat S_1 - S_1) = \mathbb{G}_n \varphi_{S_1} + o_P(1)$ and $\sqrt n(\hat S_2 - S_2) = \mathbb{G}_n \varphi_{S_2} + o_P(1)$. Combine these two results to see $\sqrt n\big( \hat S_1 - \hat S_2 - (S_1 - S_2) \big) = \mathbb{G}_n( \varphi_{S_1} - \varphi_{S_2} ) + o_P(1)$. From Theorem 9, we have
\[
\varphi_{S_2}(d_1, d_2) = -\frac{1}{f_2(S_2)}\Big( I(d_2 \le S_2) - \frac{b}{b+h} \Big), \qquad
\varphi_{S_1}(d_1, d_2) = -\frac{1}{M''(S_1)}\Big( m^r_{2,S_1}(d_1) + g^{(2)}_{S_1}(d_2) \Big),
\]
where
\[
m^r_{2,y}(d_1) = (b + h) I(d_1 \le y) - b + M_2'(y - d_1) I(y - d_1 \ge s_2), \qquad
g^{(2)}_{S_1}(d_2) = \mathbb{E}(b + h)\big\{ I(S_1 - D_1 - d_2 \ge 0) - F_2(S_1 - D_1) \big\} I(S_1 - D_1 \ge s_2). \tag{EC.3.24}
\]
Then the asymptotic variance of $\sqrt n(\hat S_1 - \hat S_2)$ can be computed as
\[
\mathbb{E}\big( \varphi_{S_1} - \varphi_{S_2} \big)^2 = \mathbb{E}\bigg( \frac{m^r_{2,S_1}(D_1)}{M''(S_1)} \bigg)^2 + \mathbb{E}\bigg( -\frac{g^{(2)}_{S_1}(D_2)}{M''(S_1)} + \frac{I(D_2 \le S_2) - \frac{b}{b+h}}{f_2(S_2)} \bigg)^2.
\]
The remaining task is to show that $\hat{\mathbb{E}}( \hat\varphi_{S_1} - \hat\varphi_{S_2} )^2$ converges to $\mathbb{E}( \varphi_{S_1} - \varphi_{S_2} )^2$ in probability, where
\[
\hat{\mathbb{E}}( \hat\varphi_{S_1} - \hat\varphi_{S_2} )^2 = \frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{\hat M''(\hat S_1)} \bigg)^2 + \frac{1}{n}\sum_{i=1}^n \bigg( -\frac{\hat g^{(2)}_{\hat S_1}(d^i_2)}{\hat M''(\hat S_1)} + \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{\hat f_2(\hat S_2)} \bigg)^2, \tag{EC.3.25}
\]
$\hat m^r_{2,y}(d_1) = (b + h) I(d_1 \le y) - b + \hat M^r_{2,n}(y - d_1) I(y - d_1 \ge \hat s_2)$, $\hat M''(\hat S_1)$ is defined as in (EC.3.8), and
\[
\hat g^{(2)}_{\hat S_1}(d_2) = \frac{1}{n}\sum_{l=1}^n (b + h)\big\{ I(\hat S_1 - d^l_1 - d_2 \ge 0) - \hat F_2(\hat S_1 - d^l_1) \big\} I(\hat S_1 - d^l_1 \ge \hat s_2), \quad \text{with } \hat F_2(y) = \frac{1}{n}\sum_{i=1}^n I(d^i_2 \le y).
\]
Here, $\hat g^{(2)}_{\hat S_1}$ is an estimator of $g^{(2)}_{S_1}$ in (EC.3.24). To show the consistency of this variance estimator, we first consider the second sum of (EC.3.25). By (EC.3.9), $\hat M''(\hat S_1)$ converges to $M''(S_1)$. On the other hand, we can choose kernel estimators $\hat f_1$ and $\hat f_2$ such that $\sup_y |\hat f_1(y) - f_1(y)| = o_P(1)$ and $\sup_y |\hat f_2(y) - f_2(y)| = o_P(1)$. Use this condition and the continuous mapping theorem to see that $\hat f_2(\hat S_2) \overset{P}{\to} f_2(S_2)$. Since $\hat F_2 \overset{P}{\to} F_2$ uniformly, argue as in (EC.3.11) to see
\[
\hat g^{(2)}_{\hat S_1}(d_2) = \frac{1}{n}\sum_{l=1}^n (b + h)\big\{ I(\hat S_1 - d^l_1 - d_2 \ge 0) - F_2(\hat S_1 - d^l_1) \big\} I(\hat S_1 - d^l_1 \ge \hat s_2) + R_n(d_2)
= \mathbb{E}\, g_{\hat S_1}(D_1, d_2) + R_n(d_2) = g^{(2)}_{\hat S_1}(d_2) + R_n(d_2),
\]
where $\sup_{d_2} |R_n(d_2)| = o_P(1)$ and $g^{(2)}_{\hat S_1}$ is defined as in (EC.3.24). Use the above result, the fact that $\hat M''(\hat S_1) \overset{P}{\to} M''(S_1)$, and the fact that $\hat f_2(\hat S_2) \overset{P}{\to} f_2(S_2)$ to see
\[
\{ \hat f_2(\hat S_2) \}^{-1}\Big\{ I(d_2 \le \hat S_2) - \frac{b}{b+h} \Big\} - \frac{\hat g^{(2)}_{\hat S_1}(d_2)}{\hat M''(\hat S_1)}
= \{ f_2(S_2) \}^{-1}\Big\{ I(d_2 \le \hat S_2) - \frac{b}{b+h} \Big\} - \frac{g^{(2)}_{\hat S_1}(d_2)}{M''(S_1)} + R_n(d_2),
\]
where $\sup_{d_2} |R_n(d_2)| = o_P(1)$. By the equality $a^2 - b^2 = (a + b)(a - b)$ and the fact that $\{ f_2(S_2) \}^{-1}\{ I(d_2 \le \hat S_2) - \frac{b}{b+h} \} - \{ M''(S_1) \}^{-1} g^{(2)}_{\hat S_1}(d_2)$ is uniformly bounded, we have
\[
\frac{1}{n}\sum_{i=1}^n \bigg( \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{\hat f_2(\hat S_2)} - \frac{\hat g^{(2)}_{\hat S_1}(d^i_2)}{\hat M''(\hat S_1)} \bigg)^2
= \frac{1}{n}\sum_{i=1}^n \bigg( \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{f_2(S_2)} - \frac{g^{(2)}_{\hat S_1}(d^i_2)}{M''(S_1)} \bigg)^2 + o_P(1). \tag{EC.3.26}
\]
The function class $\big\{ d_2 \mapsto -\frac{g^{(2)}_{y_1}(d_2)}{M''(S_1)} + \frac{I(d_2 \le y_2) - \frac{b}{b+h}}{f_2(S_2)} : y_1, y_2 \in \Theta \big\}$ is $P$-Donsker by Lemma 5 and Example 19.20 of van der Vaart (1998).
Since it is also bounded, we know $\big\{ d_2 \mapsto \big( -\frac{g^{(2)}_{y_1}(d_2)}{M''(S_1)} + \frac{I(d_2 \le y_2) - \frac{b}{b+h}}{f_2(S_2)} \big)^2 : y_1, y_2 \in \Theta \big\}$ is also $P$-Donsker by Example 19.20 of van der Vaart (1998). Then we write (EC.3.26) as
\[
\frac{1}{n}\sum_{i=1}^n \bigg[ -\frac{\hat g^{(2)}_{\hat S_1}(d^i_2)}{\hat M''(\hat S_1)} + \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{\hat f_2(\hat S_2)} \bigg]^2
= \frac{1}{n}\sum_{i=1}^n \bigg( -\frac{g^{(2)}_{\hat S_1}(d^i_2)}{M''(S_1)} + \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{f_2(S_2)} \bigg)^2 + o_P(1)
= \mathbb{E}\bigg( -\frac{g^{(2)}_{\hat S_1}(D_2)}{M''(S_1)} + \frac{I(D_2 \le \hat S_2) - \frac{b}{b+h}}{f_2(S_2)} \bigg)^2 + o_P(1)
= \mathbb{E}\bigg( -\frac{g^{(2)}_{S_1}(D_2)}{M''(S_1)} + \frac{I(D_2 \le S_2) - \frac{b}{b+h}}{f_2(S_2)} \bigg)^2 + o_P(1), \tag{EC.3.27}
\]
where the second equality is by the property of $P$-Donsker classes and the last equality is by the continuous mapping theorem. Next, consider the first term on the RHS of (EC.3.25) and observe that
\[
\frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{\hat M''(\hat S_1)} \bigg)^2 = \frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{M''(S_1)} \bigg)^2 + o_P(1),
\]
because $\hat M''(\hat S_1) \overset{P}{\to} M''(S_1)$ and $\hat m^r_{2,y}$ is uniformly bounded. Similar to (EC.3.13), the first term on the RHS above can be simplified as
\[
\frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{M''(S_1)} \bigg)^2 = \frac{1}{n}\sum_{i=1}^n \bigg( \frac{m^r_{2,\hat S_1}(d^i_1)}{M''(S_1)} \bigg)^2 + o_P(1).
\]
Because the function class $\{ d_1 \mapsto m^r_{2,y}(d_1) : y \in \Theta \}$ is a uniformly bounded $P$-Donsker class by Lemma EC.2, $\{ d_1 \mapsto ( m^r_{2,y}(d_1) )^2 : y \in \Theta \}$ is also $P$-Donsker by Example 19.20 of van der Vaart (1998). Use this fact, and combine the above two displays to see
\[
\frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{\hat M''(\hat S_1)} \bigg)^2 = \mathbb{E}\bigg( \frac{m^r_{2,\hat S_1}(D_1)}{M''(S_1)} \bigg)^2 + o_P(1) = \mathbb{E}\bigg( \frac{m^r_{2,S_1}(D_1)}{M''(S_1)} \bigg)^2 + o_P(1). \tag{EC.3.28}
\]
Here, the second equality is by the property of $P$-Donsker classes and the last equality is by the continuous mapping theorem. Finally, combine (EC.3.27) and (EC.3.28) to see
\[
\hat{\mathbb{E}}( \hat\varphi_{S_1} - \hat\varphi_{S_2} )^2
= \frac{1}{n}\sum_{i=1}^n \bigg( \frac{\hat m^r_{2,\hat S_1}(d^i_1)}{\hat M''(\hat S_1)} \bigg)^2 + \frac{1}{n}\sum_{i=1}^n \bigg( -\frac{\hat g^{(2)}_{\hat S_1}(d^i_2)}{\hat M''(\hat S_1)} + \frac{I(d^i_2 \le \hat S_2) - \frac{b}{b+h}}{\hat f_2(\hat S_2)} \bigg)^2
= \mathbb{E}\bigg( \frac{m^r_{2,S_1}(D_1)}{M''(S_1)} \bigg)^2 + \mathbb{E}\bigg( -\frac{g^{(2)}_{S_1}(D_2)}{M''(S_1)} + \frac{I(D_2 \le S_2) - \frac{b}{b+h}}{f_2(S_2)} \bigg)^2 + o_P(1)
= \mathbb{E}( \varphi_{S_1} - \varphi_{S_2} )^2 + o_P(1),
\]
which completes the proof.
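As a minimal illustration of how such influence-function variances translate into inference, consider the last-period parameter alone: $\hat S_2$ is the empirical $b/(b+h)$-quantile, and $\varphi_{S_2}$ gives asymptotic variance $q(1-q)/f_2(S_2)^2$ with $q = b/(b+h)$. The sketch below builds the resulting Wald interval; the Gaussian kernel with Silverman's rule-of-thumb bandwidth and the exponential demand are our own assumptions, not the paper's prescription.

```python
import numpy as np

rng = np.random.default_rng(3)

def wald_ci_S2(d, b=4.0, h=1.0, z=1.96):
    """Wald interval for S2 = F^{-1}(b/(b+h)) based on the influence
    function phi_{S2}: asymptotic variance q(1-q) / f(S2)^2, q = b/(b+h).
    The density at S2 is estimated with a Gaussian kernel and
    Silverman's rule-of-thumb bandwidth (illustrative choice)."""
    n, q = len(d), b / (b + h)
    S2_hat = np.quantile(d, q)
    bw = 1.06 * d.std(ddof=1) * n ** (-0.2)   # Silverman bandwidth
    f_hat = np.exp(-0.5 * ((S2_hat - d) / bw) ** 2).sum() / (n * bw * np.sqrt(2 * np.pi))
    se = np.sqrt(q * (1 - q)) / (f_hat * np.sqrt(n))
    return S2_hat - z * se, S2_hat + z * se

# coverage check against the true 0.8-quantile of Exp(1): S2 = -log(0.2)
truth = -np.log(0.2)
hits = sum(lo <= truth <= hi
           for lo, hi in (wald_ci_S2(rng.exponential(1.0, 500)) for _ in range(200)))
print(hits / 200)  # empirical coverage, typically near the nominal 0.95 level
```

The same recipe, with the plug-in variance estimators above in place of the quantile variance, yields intervals and tests for $S_1$, $s_1$, $C^*$, and $S_1 - S_2$.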