Robust Arbitrage Conditions for Financial Markets

Abstract

This paper investigates arbitrage properties of financial markets under distributional uncertainty using Wasserstein distance as the ambiguity measure. The weak and strong forms of the classical arbitrage conditions are considered. A relaxation is introduced for which we coin the term statistical arbitrage. The simpler dual formulations of the robust arbitrage conditions are derived. A number of interesting questions arise in this context. One question is: can we compute a critical Wasserstein radius beyond which an arbitrage opportunity exists? What is the shape of the curve mapping the degree of ambiguity to statistical arbitrage levels? Other questions arise regarding the structure of best (worst) case distributions and optimal portfolios. Towards answering these questions, some theory is developed and computational experiments are conducted for specific problem instances. Finally some open questions and suggestions for future research are discussed.


Derek Singh, Shuzhong Zhang

Department of Industrial and Systems Engineering, University of [email protected], [email protected]


Keywords— arbitrage, statistical arbitrage, Farkas lemma, robust optimization, Wasserstein distance, Lagrangian duality

Financial arbitrage with respect to securities pricing is a fundamental concept regarding the behavior of financial markets, developed by Ross in the 1970s. Two of his seminal papers are Return, Risk, and Arbitrage (Ross et al., 1973) and The Arbitrage Theory of Capital Asset Pricing (Ross, 1976). In the author's own words, the arbitrage model or arbitrage pricing theory (APT) was developed as an alternative approach to the (mean variance) Capital Asset Pricing Model (CAPM) (Sharpe, 1964), which was itself an extension of the foundational work on Modern Portfolio Theory by Harry Markowitz (Markowitz, 1952). Ross argued that APT imposes fewer restrictions on the capital markets than CAPM does, such as CAPM's requirement that the market be in equilibrium and its consideration of (only) a single market risk factor as measured by variance of asset returns. Recall that CAPM uses the security market line to relate the expected return on an asset to its beta, or sensitivity to systematic (market) risk. APT, on the other hand, is a multi-factor cross sectional model that explains the expected return on an asset in linear terms of betas to multiple market risk factors that capture systematic risk (Ross et al., 1973), (Ross, 1976).

The motivating idea behind APT is the no-arbitrage principle as characterized by the no-arbitrage conditions. This principle asserts that in a securities market it should not be possible to construct a zero cost portfolio that guarantees per scenario either a riskless profit or no chance of losses, across all possible market scenarios. If this were the case, one would be able to make money from nothing, so to speak. Ross formulates the no-arbitrage conditions and, via duality theory of linear programming, shows the equivalent existence of a state price vector to recover market prices (Ross et al., 1973). Existing results in the literature (Delbaen and Schachermayer, 2006) have shown the equivalence between the single period and multi period no-arbitrage properties (on a finite probability space). To simplify the analysis, we focus on the discrete single period setting.

As a further refinement, the notions of weak and strong arbitrage were developed.
A portfolio $w \in \mathbb{R}^n$ of $n$ market securities is designated a weak arbitrage opportunity if $w \cdot S_0 \le 0$, $\Pr(w \cdot S_1 \ge 0) = 1$, and $\Pr(w \cdot S_1 > 0) > 0$, for time 0 asset price vector $S_0$ and time 1 asset price vector $S_1$. Similarly, a portfolio $w \in \mathbb{R}^n$ is designated a strong arbitrage opportunity if $w \cdot S_0 < 0$ and $\Pr(w \cdot S_1 \ge 0) = 1$. In a discrete setting with $s$ market states, given security price vector $p \in \mathbb{R}^n$ and payoff matrix $X \in \mathbb{R}^{n \times s}$, a weak arbitrage opportunity is a portfolio $w \in \mathbb{R}^n$ that satisfies $X^\top w \gneq 0$ and $p^\top w \le 0$. Similarly, a strong arbitrage opportunity is a portfolio $w \in \mathbb{R}^n$ that satisfies $X^\top w \ge 0$ and $p^\top w < 0$. Note there are cases of weak arbitrage portfolios which are not strong arbitrage portfolios (cf. e.g. LeRoy and Werner, 2014).

In a discrete setting, the well known Farkas Lemma can be used to characterize the property of (weak) strong arbitrage. The Farkas Lemma characterization says that security price vectors $p$ exclude (weak) strong arbitrage iff, given payoff matrix $X$ (across all market scenarios), there exists a (strictly) positive solution $q$ to $p = Xq$. The normalized state price vectors $q^*_s = q_s / \sum_s q_s$ become the set of discrete risk neutral probabilities that defines the measure $Q$ (cf. e.g. LeRoy and Werner, 2014).

The fundamental theorem of asset pricing (also: of arbitrage, of finance) equates the non-existence of arbitrage opportunities in a financial market to the existence of a risk neutral (or martingale) probability measure $Q$ which can be used to compute the fair market value of all assets. A financial market is said to be complete if such a measure $Q$ is unique (cf. e.g. Föllmer and Schied, 2011). The unique measure $Q$ is frequently used in mathematical finance, and the pricing of derivative securities in particular, in both discrete time (Shreve, 2005) and continuous time settings (Shreve, 2004).

In the context of distributional uncertainty, a natural question arises as to how to characterize the notion of arbitrage. One would presumably seek a balance of generality and practicality in developing a framework to study the arbitrage properties. Some structure is needed to develop intuition and understanding. On the other hand, too much structure could be restrictive and limit useful degrees of freedom. The approach taken in this line of research is to start from the fundamental (weak and strong) no-arbitrage conditions and investigate how the market model transitions from one of no-arbitrage to arbitrage or vice versa. Distributional uncertainty is characterized via the Wasserstein metric for a couple of reasons.
The Wasserstein metric is a (reasonably) well understood metric and a natural, intuitive way to compare two probability distributions using ideas of transport cost. It is also a flexible approach that encompasses parametric and non-parametric distributions of either discrete or continuous form. Furthermore, recent duality results and structural results on the worst case distributions could help us to understand and/or quantify the market model transitions as well as measure (in a relative sense) the degree of arbitrage or no-arbitrage inherent to a given market model.

Logical reasoning dictates that it should be possible to distort a no-arbitrage measure into an arbitrage admissible measure. For a simple discrete example, consider a one-period binomial tree of stock prices where $0 < S_d < 1 + r < S_u$, $p_u + p_d = 1$, and $p_u > 0 \implies p_d > 0$. If we distort the $Q$ measure into a $P$ measure such that $p_d = 0$, it is clear that a zero cost portfolio that is long the stock and short a riskless bond will make a profit with probability 1. So then, how “far” is this distorted measure $P$ from the original no-arbitrage measure $Q$? Can we safeguard ourselves within a ball of (only) arbitrage-free probability measures $Q'$ of distance at most $\delta$ from the reference measure $Q$? What is the structure of the worst case distributions and optimal portfolios within this ball? Is there a critical radius $\delta^*$ for this ball of arbitrage-free measures beyond which an arbitrage admissible measure is sure to exist? Alternatively, suppose the reference measure $Q$ admitted arbitrage. What is the nearest arbitrage-free measure to this measure? Is that minimal distance, call it $\delta^*_g$, computable? These questions are the motivation for the line of research conducted in this paper. As mentioned above, this research uses the Wasserstein distance metric (cf. e.g. Villani, 2008). To the best of our knowledge, this paper is the first to investigate these notions under the Wasserstein metric and develop a mixture of theoretical and computational answers to these questions.

The contributions of this paper are as follows. Primal problem formulations for the classical and statistical arbitrage conditions (under distributional uncertainty using Wasserstein ambiguity) are developed. Using recent duality results (Gao and Kleywegt, 2016), (Blanchet and Murthy, 2019), simpler dual formulations that only involve the reference arbitrage-free probability measure are constructed and solved. The max-min and max-max dual problems are formulated as nonlinear programming problems (NLPs). The structure of the best (worst) case distributions is analyzed. A formal proof for the NP hardness of the dual no-arbitrage problem is also given.
Using this theoretical machinery, the critical radii $\delta^*$, the best (worst) case distributions, and/or optimal portfolios are computed for a few specific problem instances involving real world financial market data. The complementary problem to compute the minimal distance $\delta^*_g$ to an arbitrage-free measure for a reference measure that admits arbitrage is formulated and solved. We make use of the fundamental theorem of asset pricing to do this (cf. e.g. LeRoy and Werner, 2014; Föllmer and Schied, 2011).

An outline of this paper is as follows. Section 1 gives an overview of the financial concepts of arbitrage and statistical arbitrage as well as a literature review. Section 2 develops the main theoretical results to characterize arbitrage under distributional uncertainty using Wasserstein distance. Section 3 extends this machinery to cover the notion of statistical arbitrage. Section 4 presents applications of the theory developed in Sections 2 and 3. Section 5 gives formal proofs for the NP hardness of the no-arbitrage problem. Section 6 is a computational study of the arbitrage properties for a few specific problem instances and computes numerical solutions. Section 7 discusses conclusions and suggestions for further research.

Statistical arbitrage denotes a class of data driven quantitative trading and algorithmic investment strategies, for a set of securities, to exploit deviations in relative market prices from their “true” distributions. Classical notions of statistical arbitrage opportunities involve estimation and use of statistical time series models (such as cointegration or the Kalman filter) to describe structural properties of asset prices such as mean reversion, volatility, etc. and help identify temporal deviations in market prices that present trading and/or investment opportunities before the market “reverts” to its equilibrium behavior (Focardi et al., 2016).
One particular sub-class of such strategies that is prevalent in both the literature and industry practice is known as pairs trading. The canonical example here is the Coke vs. Pepsi trade, where one identifies a price dislocation, simultaneously shorts the over-priced asset and buys the under-priced asset, waits for the relative prices to restore to equilibrium, and then closes out the position, thus realizing a profit for the arbitrageur (Krauss, 2017).

Practitioners, such as investment banks and hedge funds, employ a wide array of professionals to work on multiple aspects of this, such as trading systems design and technology support, data collection, model development, trade execution, risk management, reporting, business development, and so on. The actual practice of statistical arbitrage typically involves a mixture of art and science. The science component is reflected through the estimation and use of statistical time series models and incorporation of emerging trends in the academic literature and technology (for the practical aspects of trade execution and risk management). The art component is reflected through incorporation of investment professionals' knowledge, experience, and beliefs about financial markets' current state and future outlook (Lazzarino et al., 2018).

Classical notions of statistical arbitrage “already” have an intrinsic notion of variability, hence their name. The motivation for the line of research in this paper is to extend this notion to incorporate distributional uncertainty within the framework of Wasserstein distance and the corresponding duality results. In this sense, the objectives are analogous, with the topic of focus shifted from classical arbitrage to statistical arbitrage. The first steps are to define notions of statistical arbitrage and robust statistical arbitrage and characterize their meaning. A survey of the literature reveals that no universal definition of statistical arbitrage currently exists (Lazzarino et al., 2018).
With that in hand, the next steps are to quantify the best case ($\alpha_{bc}$) and worst case ($\alpha_{wc}$) levels of statistical arbitrage as a function of the degree of distributional uncertainty, as represented by the radius $\delta$ of the Wasserstein ball. A related, complementary problem is how to find the nearest probability measure (to the original, reference measure) that guards against statistical arbitrage of level $\alpha$ close to 1.

In conducting the literature review for this research, not many references were found that have investigated the topic of arbitrage under distributional uncertainty. From Section 1.1 above, one can see that considerable research has been done in academic circles regarding the classical notions of arbitrage in financial markets. Indeed, several academic papers and financial textbooks have been written that cover these topics from their origin in the 1970s until today. It was surprising to us, at least, to find only a few papers that address and/or extend the classical notions of arbitrage under the presence of some form of distributional uncertainty. This subsection gives an overview of what we found in the academic literature.

An earlier paper by Jeyakumar and Li (2011) took a Farkas Lemma approach to describe linear systems subject to data uncertainty in the form of bounded uncertainty sets. The authors develop a notion of a robust Farkas Lemma in terms of the closure of a convex cone they call the robust characteristic cone. As an application of the lemma, they characterize robust solutions of conic linear programs with data contained in closed convex uncertainty sets. Recently Dinh et al. (2017) applied the robust Farkas Lemma approach to characterize weakly minimal elements of multi-objective optimization problems with uncertain constraints. Note that weakly minimal elements correspond to the notion of optimal solution in the scalar (singleton vector) case.
The authors remark that their results are consistent with existing literature in the scalar case.

One seminal paper of note by Ostrovski used the total variation (TV) metric to characterize a radius $\delta_{TV}$ such that all probability measures $Q'$ within this distance from a weak arbitrage-free reference measure $Q$ are also weak arbitrage-free. The author remarks that $\delta_{TV}$ can be interpreted as the minimal probability of success that a zero cost initial portfolio $w \in \mathbb{R}^n$ achieves positive value $w \cdot S_1$ at time 1. The additional constraint on the selected portfolio $w$ is that it must have a strictly positive probability of profit under the reference measure $P$. This allows $\delta_{TV} > 0$. The relationship of $\delta_{TV}$ to the minimal probability of success is established via proof by contradiction (Ostrovski, 2013). The bound appears to be tight although this result is not proven in the paper.

The author remarks that the probability measures $Q$ and $Q'$ could have different support and/or generate different probability spaces. Furthermore, Ostrovski describes the no-arbitrage conditions and computes the critical radius $\delta_{TV}$ for a one-period binomial and trinomial tree respectively. The conditions for the one-period binomial tree are given in Section 1.1 above. The corresponding radius $\delta_{TV}$ is $\min(p_u, p_d)$. For the one-period trinomial tree, different configurations are possible. Let $q_d, q_m, q_u$ denote the one-period transition probabilities to the down, middle, and up nodes respectively. For the case $S_d < S_m < 1 + r < S_u$ the trinomial tree would allow arbitrage iff $q_d = q_m = 0$ or $q_u = 0$. In the first case, the TV distance between the binomial and trinomial trees would be $\max(1 - p_u, p_d) = \max(p_d, p_d) = p_d$. In the second case it would be $\max(p_u, q_m, |p_d - q_d|) \ge p_u$. Thus the trinomial model would be arbitrage-free if the TV distance to the binomial model were less than $\min(p_u, p_d)$. The other cases $S_d < S_m = 1 + r < S_u$ and $S_d < 1 + r < S_m < S_u$ can be handled similarly (Ostrovski, 2013). While these results are tractable, it was not clear (to us) how to apply them to develop a dual formulation to study the market model transitions from no-arbitrage to arbitrage or vice versa. Furthermore, total variation distance has been described as a strong notion of distance in the academic literature. Given our motivation to avoid (strong) restrictions in our characterization of robust no-arbitrage markets, it would seem that a different notion of distance between probability measures might be more appropriate.

A recent paper by Bartl et al. (2017) explicitly incorporates a no-arbitrage constraint directly into the worst case European call option pricing problem under Wasserstein ambiguity. We consider this problem from a different perspective in this paper, namely we restrict the Wasserstein ball of probability measures to implicitly consider only those measures which are arbitrage-free without the need to enforce an explicit constraint. In Section 2, the theoretical machinery to compute a critical radius $\delta^*_{w(s)}$ is developed to pursue this approach. Simpler worst case option pricing formulas (that omit the explicit no-arbitrage constraint) are derived as well.

Finally, another recent paper by the same author (Bartl et al., 2019) investigates the robust exponential utility maximization problem in a discrete time setting. The worst case expected utility is maximized under a family of probabilistic models of endowment that satisfy no-arbitrage conditions by assumption.
The authors show that an optimal trading strategy exists and they provide a dual representation for the primal optimization problem. Furthermore, the optimal value is shown to converge to the robust superhedging price as the risk aversion parameter increases.

This section lays out the foundations for our framework to investigate the arbitrage properties under distributional uncertainty. Recall the approach taken here is to start from the classical no-arbitrage conditions and introduce a notion of distributional uncertainty via the Wasserstein distance metric. As such, we include definitions for these terms as well as commentary on some important results:
(i) definitions for no-arbitrage and statistical arbitrage conditions;
(ii) Lagrangian duality to formulate the dual problem for robust arbitrage in financial markets;
(iii) existence and structure of worst case distributions;
(iv) computation of Wasserstein distance between distributions.

The set of admissible portfolio weights for the weak no-arbitrage conditions is

$$\Gamma_w(S_0) := \{ w \in \mathbb{R}^n : w \cdot S_0 = 0,\ w \neq 0 \}. \tag{WW}$$

The set of admissible portfolio weights for the strong no-arbitrage conditions is

$$\Gamma_s(S_0) := \{ w \in \mathbb{R}^n : w \cdot S_0 < 0 \}. \tag{SW}$$

The no-arbitrage condition to be evaluated under probability measure $Q$ in both cases is $\Pr(w \cdot S_1 \ge 0) = E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] < 1$. The portfolio weights $w$ satisfy the positive homogeneity property (of degree zero) since $\Pr(w \cdot S_1 \ge 0) = \Pr(\tilde{w} \cdot S_1 \ge 0)$ for $\tilde{w} = cw$ and $c > 0$. It is the proportions of the holdings in the assets that distinguish $w$ vectors, not their absolute sizes.

Weak arbitrage requires two conditions to hold: $\Pr(w \cdot S_1 \ge 0) = 1$ and $\Pr(w \cdot S_1 > 0) > 0$. The second condition is not easily incorporated into the duality framework of this paper and hence it is omitted. Consequently the critical radius $\delta^*_w$ that is developed in Section 2 may not be tight. Strong arbitrage requires just one condition, hence the bound $\delta^*_s$ will be tight.

For a given measure $Q$, no weak arbitrage means that $\sup_{w \in \Gamma_w} E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] < 1$. Similarly, for a given measure $Q$, no strong arbitrage means that $\sup_{w \in \Gamma_s} E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] < 1$.

The empirical measure, $Q_N$, is defined as $Q_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{s_{(1,i)}}(dz)$. To simplify the notation, the leading subscript on $s_{(1,i)}$ is suppressed and going forward we refer to the realization of time 1 asset price vector $s_{(1,i)}$ as just $s_i$. In the context of this work, the uncertainty set for probability measures is $U_\delta(Q_N) = \{ Q : D_c(Q, Q_N) \le \delta \}$ where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018). The definition for $D_c$ is

$$D_c(Q, Q') = \inf \{ E_\pi[c(X, Y)] : \pi \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n),\ \pi_X = Q,\ \pi_Y = Q' \}$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_X$ and $\pi_Y$ denote the distributions of $X$ and $Y$. Here $X$ denotes asset prices $S_X \in \mathbb{R}^n$ and $Y$ denotes asset prices $S_Y \in \mathbb{R}^n$ respectively. This work uses the cost function $c$ where $c(u, v) = \|u - v\|_2$.

For clarity we cite the following result from the literature (Delbaen and Schachermayer, 2006). Let $S = (S_t)_{t=0}^{T}$ be a discrete price process (with unit increments and $T \in \mathbb{N}$) on a finite probability space $(\Omega, \mathcal{F}, P)$. Then the following are equivalent:
(i) $S$ satisfies the no-arbitrage property;
(ii) For each $0 \le t < T$, we have that the one-period market $(S_t, S_{t+1})$ with respect to the filtration $(\mathcal{F}_t, \mathcal{F}_{t+1})$ satisfies the no-arbitrage property.

Further detail on the equivalence of single and multi period no-arbitrage can be found in e.g. LeRoy and Werner (2014). As our focus in this paper is on the discrete single period setting, the above relationship suffices. One direction for further research would be to consider the robust no-arbitrage properties in a multi period continuous time setting for a suitable class of admissible trading strategies. A more general version of the fundamental theorem of asset pricing applies there. See Delbaen and Schachermayer (2006) for additional detail on this topic.
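To make the single period quantities of this subsection concrete, the following minimal Python sketch evaluates $E_{Q_N}[\mathbf{1}_{\{w \cdot s_i \ge 0\}}]$ for one fixed candidate portfolio under an empirical measure, and checks membership in the admissible sets. It does not attempt the supremum over $w$. The function names, the two-asset bond/stock data, and the tolerance are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_gamma_weak(w: Vector, s0: Vector, tol: float = 1e-12) -> bool:
    """Membership in the weak admissible set: zero initial cost and w != 0."""
    return abs(dot(w, s0)) <= tol and any(abs(x) > tol for x in w)


def in_gamma_strong(w: Vector, s0: Vector) -> bool:
    """Membership in the strong admissible set: strictly negative initial cost."""
    return dot(w, s0) < 0.0


def prob_no_loss(w: Vector, scenarios: Sequence[Vector]) -> float:
    """Empirical probability that w . s_i >= 0 (mass 1/N on each scenario)."""
    return sum(1 for s in scenarios if dot(w, s) >= 0.0) / len(scenarios)


# Hypothetical data: asset 0 is a bond (1 -> 1.05), asset 1 is a stock
# (1 -> 0.9 or 1.2). The portfolio is short the bond and long the stock.
S0 = (1.0, 1.0)
W = (-1.0, 1.0)
REFERENCE = [(1.05, 0.9), (1.05, 1.2), (1.05, 1.2), (1.05, 0.9)]
DISTORTED = [(1.05, 1.2)] * 4  # all mass moved to the up state

if __name__ == "__main__":
    print(in_gamma_weak(W, S0))        # zero cost portfolio
    print(prob_no_loss(W, REFERENCE))  # below 1: this w is not an arbitrage
    print(prob_no_loss(W, DISTORTED))  # equal to 1: this w is an arbitrage
```

The distorted scenario set mirrors the binomial example from the introduction, in which all probability mass on the down state is removed.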
1.4.3 Weak and Strong Statistical Arbitrage (SA) Conditions

To characterize the situation where a profitable trading opportunity is highly likely yet not necessarily certain, we introduce a notion of statistical arbitrage. Recall that no universal definition of statistical arbitrage currently exists (Lazzarino et al., 2018). Towards that end, we propose using a relaxation of the classical arbitrage conditions to define a notion of statistical arbitrage. In particular, let us write the best case (bc) statistical arbitrage (of level $\alpha_{bc} \in (0, 1)$) condition under probability measure $Q$ as $\Pr(w \cdot S_1 \ge 0) = E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] \le \alpha_{bc}$. The set of admissible portfolio weights for the weak (strong) condition is $w \in \Gamma_{w(s)}$ as before (see Section 1.4.1). Intuitively, the best case statistical arbitrage condition says that it should not be possible to construct a zero (or negative) cost portfolio that returns either a profit or no chance of losses with probability $\alpha_{bc}$ close to 1. In the limit $\alpha_{bc} \to 1$, this approaches the classical arbitrage condition. The worst case (wc) statistical arbitrage condition (of level $\alpha_{wc} \in (0, 1)$) is $\Pr(w \cdot S_1 \ge 0) = E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] \ge \alpha_{wc}$. Probability $\alpha_{wc}$ close to 0 describes a no-win situation.

In Section 2 we formulate the primal stochastic optimization problem for distributionally robust arbitrage-free markets. As in our earlier work (Singh and Zhang, 2019), a key step in the approach is to use recent Lagrangian duality results to formulate the equivalent dual problem. The dual problem is much more tractable than the primal problem since it only involves the reference probability measure as opposed to a Wasserstein ball of probability measures (of some finite radius). This allows us to solve a maximin optimization problem under the original empirical measure defined by the selected data set.
A brief restatement of this duality result follows next.

For real valued upper semicontinuous objective function $f \in L^1$ and non-negative lower semicontinuous cost function $c$ such that $\{ (u, v) : c(u, v) < \infty \}$ is Borel measurable and non-empty, it holds that (Blanchet et al., 2016)

$$\sup_{Q \in U_\delta(Q_N)} E_Q[f(X)] = \inf_{\lambda \ge 0} \Big[ \lambda \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\lambda(x_i) \Big]$$

where $\Psi_\lambda(x_i) := \sup_{u \in \mathrm{dom}(f)} [ f(u) - \lambda c(u, x_i) ]$.

The primal problem (LHS above) is concerned with the worst case expected loss for some objective function $f$ with respect to a Wasserstein ball of probability measures of finite radius $\delta$. The Wasserstein ball is used to reflect some (real world) uncertainty about the true underlying distribution for random variable (or vector) $X$. Note that the primal problem is an infinite dimensional stochastic optimization problem and thus difficult to solve directly. The simplicity and tractability of the dual problem (RHS above) make it quite attractive as an analytical and/or computational tool in our toolkit.

Further details, including proofs and concrete examples, can be found in the papers by Blanchet and Murthy (2019), Gao and Kleywegt (2016), and Esfahani and Kuhn (2018). These authors independently derived these results around the same time, although Blanchet and Murthy (2019) did so in a more general setting. The duality result has been applied by the above authors and others in several papers on topics in data driven distributionally robust stochastic optimization such as robust machine learning, portfolio selection, and risk management. For these types of robust optimization problems, the incorporation of distributional uncertainty can be viewed as adding a penalty term (similar to penalized regression) to the optimal solution (Blanchet et al., 2018). This gives us a nice intuitive way to think about the cost of robustness.
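As an illustration of how the dual side of this result can be evaluated, the sketch below treats a deliberately simple one-dimensional instance: $f(u) = \mathbf{1}_{\{u \ge 0\}}$ with cost $c(u, x) = |u - x|$. In that case $\Psi_\lambda$ has a closed form and the dual objective is convex and piecewise linear in $\lambda$, so its infimum is attained at $\lambda = 0$ or at one of the kinks $\lambda = 1/|x_i|$. The instance, the sample data, and all function names are assumptions made for illustration only.

```python
from typing import Sequence


def psi(lam: float, x: float) -> float:
    """Psi_lambda(x) = sup_u [1{u >= 0} - lam * |u - x|] in one dimension."""
    if x >= 0.0:
        return 1.0  # staying at x already earns f = 1 at zero transport cost
    # Either stay at x (value 0) or move to the nearest point with u >= 0, u = 0.
    return max(0.0, 1.0 - lam * abs(x))


def dual_objective(lam: float, delta: float, xs: Sequence[float]) -> float:
    """lam * delta + (1/N) * sum_i Psi_lambda(x_i)."""
    return lam * delta + sum(psi(lam, x) for x in xs) / len(xs)


def dual_value(delta: float, xs: Sequence[float]) -> float:
    """Minimise the dual objective over lam >= 0 by checking lam = 0 and the kinks."""
    candidates = [0.0] + [1.0 / abs(x) for x in xs if x < 0.0]
    return min(dual_objective(lam, delta, xs) for lam in candidates)


# Hypothetical sample: two points already in {x >= 0}, two points outside it.
SAMPLE = [-2.0, -1.0, 1.0, 3.0]

if __name__ == "__main__":
    for radius in (0.0, 0.25, 0.5, 0.75):
        print(radius, dual_value(radius, SAMPLE))
```

For this sample the dual value starts at 0.5 for $\delta = 0$ and reaches 1 at $\delta = 0.75$, which is the total cost of transporting both negative points to zero.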
Simply put, the set of worst case (wc) distributions (when non-empty) can be defined as $WC(f, \delta) := \{ Q^* : E_{Q^*}[f(X)] = \sup_{Q \in U_\delta(Q_N)} E_Q[f(X)] \}$. Another recent set of results from the literature describes the existence and structure of the worst case distribution(s) when they exist (Blanchet and Murthy, 2019), (Gao and Kleywegt, 2016), (Esfahani and Kuhn, 2018). The boundedness conditions for existence are tied to the growth rate $\kappa := \limsup_{d(X, X_0) \to \infty} \frac{f(X) - f(X_0)}{d(X, X_0)}$ for fixed $X_0$ and the value of the dual minimizer $\lambda^*$. For empirical reference distributions, supported on $N$ points, such that $WC(f, \delta)$ is non-empty, there exists a worst case distribution that is another empirical distribution supported on at most $N + 1$ points. For up to $N$ of these points, they can be identified as solving $x^*_i \in \arg\min_{\tilde{x} \in \mathrm{dom}(f)} [ \lambda^* c(\tilde{x}, x_i) - f(\tilde{x}) ]$. At most one point has its probability mass split into two pieces (according to budget constraint $\delta$) that solve $x^*_i, x^{**}_i \in \arg\min_{\tilde{x} \in \mathrm{dom}(f)} [ \lambda^* c(\tilde{x}, x_i) - f(\tilde{x}) ]$. Details can be found in Gao and Kleywegt (2016). For our problem setting, the growth rate conditions are satisfied and hence we proceed to formulate and then apply a greedy algorithm (see Section 2.2) to compute the worst case distribution for a concrete example in Section 5. A similar example from the literature, which uses a greedy algorithm to compute the minimal (worst case) membership to a given set $C$, is covered in (Gao and Kleywegt, 2016). Note that other worst case distributions can be constructed with different support sets and/or probability mass functions (PMFs). It can be insightful to examine how the reference distribution can be perturbed for a given objective $f$ as $\delta$ varies. See Section 2.2 for specific commentary on the structure and construction of the worst case distribution(s) for the robust NA problem.
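The structure described above (at most $N + 1$ support points, with at most one point splitting its mass) can be seen in a small one-dimensional sketch. It assumes $f(u) = \mathbf{1}_{\{u \ge 0\}}$ and cost $|u - x|$, and moves the cheapest points into $\{u \ge 0\}$ first until the budget $\delta$ is spent. This is an illustrative greedy construction written for this toy setting, not the algorithm of Section 2.2, and the names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[float, float]  # (location, probability mass)


def greedy_worst_case(xs: Sequence[float], delta: float) -> List[Atom]:
    """Shift mass of negative points to 0, nearest first, within budget delta."""
    mass = 1.0 / len(xs)
    budget = delta
    atoms: List[Atom] = [(x, mass) for x in xs if x >= 0.0]  # these stay put
    for x in sorted((x for x in xs if x < 0.0), key=abs):
        full_cost = mass * abs(x)  # cost of moving the whole atom to 0
        if budget >= full_cost:
            atoms.append((0.0, mass))
            budget -= full_cost
        else:
            moved = budget / abs(x)  # partial move: this atom is split in two
            if moved > 0.0:
                atoms.append((0.0, moved))
            atoms.append((x, mass - moved))
            budget = 0.0
    return atoms


def prob_nonnegative(atoms: Sequence[Atom]) -> float:
    """Probability of {x >= 0} under the discrete distribution given by atoms."""
    return sum(m for x, m in atoms if x >= 0.0)


# Hypothetical sample with N = 4 points.
SAMPLE = [-2.0, -1.0, 1.0, 3.0]

if __name__ == "__main__":
    worst = greedy_worst_case(SAMPLE, 0.5)
    print(worst)                    # five atoms: the point -2.0 is split
    print(prob_nonnegative(worst))  # 0.875
```

With $\delta = 0.5$, the point $-1$ is moved entirely and half of the mass at $-2$ is moved, so exactly one point is split and the support has $N + 1 = 5$ atoms.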
This section introduces some standard and recent results on computing Wasserstein distance between distributions. The recent results are focused on discrete distributions since our problems of interest are data driven. The standard results (below) are taken from the online document by Wasserman (2017). Wasserstein distance has simple expressions for univariate distributions. The Wasserstein distance of order $p$ is defined over the set of joint distributions $\mathcal{P}$ with marginals $Q$ and $Q'$ as

$$W_p(Q, Q') = \Big( \inf_{\pi \in \mathcal{P}(X, Y)} \int \|x - y\|^p \, d\pi(x, y) \Big)^{1/p}.$$

Note that in this work we consider Wasserstein distance of order $p = 1$. When $d = 1$ there is the closed form

$$W_p(Q, Q') = \Big( \int_0^1 | F^{-1}(z) - G^{-1}(z) |^p \, dz \Big)^{1/p}$$

where $F$ and $G$ denote the distribution functions of $Q$ and $Q'$. For empirical distributions with $N$ points, there is the formula using order statistics on $(X, Y)$

$$W_p(Q, Q') = \Big( \sum_{i=1}^{N} \| X_{(i)} - Y_{(i)} \|^p \Big)^{1/p}.$$

Additional closed forms are known for: (i) normal distributions, (ii) mappings that relate Wasserstein distance to multiresolution $L_1$ distance. See Wasserman (2017) for details. This concludes the brief survey of standard (closed form) results.

For discrete distributions, at least a couple of methods have been recently developed to compute approximate and/or (in the limit) exact Wasserstein distance. The commentary on these methods is taken from Xie et al. (2018). For distributions with finite support, and cost matrix $C$, one can compute $W(Q, Q') := \min_\pi \langle C, \pi \rangle$ with probability simplex constraints using linear programming (LP) methods of $O(N^3)$ complexity. An entropy regularized version of this, using regularizer $h(\pi) := \sum \pi_{i,j} \log \pi_{i,j}$, gives rise to the Sinkhorn distance

$$W_\varepsilon(Q, Q') := \min_\pi \langle C, \pi \rangle + \varepsilon h(\pi)$$

which can be solved using iterative Bregman projections via the Sinkhorn algorithm. However, the authors comment that certain problems (such as generative model learning and barycenter computation) experience performance degradation for a moderately sized $\varepsilon$, but opting for a small size can be computationally expensive. To address these shortcomings, they develop their own approach called inexact proximal point method for optimal transport (IPOT). The proximal point iteration takes the form

$$\pi^{(t+1)} = \arg\min_\pi \langle C, \pi \rangle + \beta^{(t)} D_h(\pi, \pi^{(t)})$$

where $\beta$ denotes a parameter of the method and $D_h$ denotes the Bregman divergence based on the entropy function.
Substitution for Bregman divergence gives the form

$$\pi^{(t+1)} = \arg\min_\pi \langle C - \beta^{(t)} \log \pi^{(t)}, \pi \rangle + \beta^{(t)} h(\pi).$$

It turns out that this iteration can also be solved via the Sinkhorn algorithm. However, the authors propose an inexact method that improves efficiency while maintaining convergence. See Xie et al. (2018) for details.
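For the univariate empirical case mentioned earlier in this section, the order statistics formula is simple enough to sketch directly. The Python function below sorts both samples and pairs the order statistics. As an assumption on our part, it weights each pair by $1/N$, which corresponds to empirical measures that place mass $1/N$ on each point; dropping that factor gives the unnormalised sum. The function name and data are illustrative.

```python
from typing import Sequence


def empirical_wasserstein_1d(xs: Sequence[float], ys: Sequence[float],
                             p: float = 1.0) -> float:
    """Order-p Wasserstein distance between two 1-D samples of equal size N.

    Each sample is treated as an empirical measure with mass 1/N per point,
    so sorted values are matched pairwise (assumption: the 1/N weighting).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


if __name__ == "__main__":
    # Hypothetical samples; a pure shift by 2 gives distance 2 for any order p.
    print(empirical_wasserstein_1d([0.0, 1.0, 3.0], [2.0, 3.0, 5.0]))
    print(empirical_wasserstein_1d([0.0, 1.0, 3.0], [5.0, 4.0, 2.0], p=2.0))
```

Sorting dominates the cost, so the computation is $O(N \log N)$, in contrast with the LP and Sinkhorn approaches needed for general discrete distributions.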

This section develops the theory for robust arbitrage in financial markets. In Section 2.1, the primal problem is formulated using classical notions of arbitrage as discussed in Section 1.4.1. The dual problem is formulated using the Lagrangian duality result from Section 1.4.4. Note that the dual problem is a maximin stochastic optimization problem. The inner optimization problem (evaluating $\Psi_{\lambda, w}$) can be solved analytically using the Projection Theorem (Calafiore and El Ghaoui, 2014). The middle optimization problem (evaluating the dual objective function over $\inf_{\lambda \ge 0}$) can be solved via execution of a simple linear search algorithm over a finite set of points. The outer optimization problem (evaluating over $\sup_{w \in \Gamma_{w(s)}}$) can be formulated as an NLP. Finally, the middle and outer problems can be solved jointly via a maximin NLP approach.

Section 2.2 gives details on the worst case distributions and Sections 2.3 and 2.4 show how to incorporate portfolio restrictions (such as short sales) in a straightforward manner. Section 2.5 introduces the complementary problem of how to find the nearest arbitrage-free measure to the arbitrage admissible reference measure. This machinery gives us a practical approach to explore applications of our framework for robust arbitrage.

2.1 Robust Weak and Strong No-Arbitrage (NA) Conditions

The robust weak no-arbitrage conditions can be expressed as

$$\sup_{w \in \Gamma_w} \sup_{Q \in U_\delta(Q_N)} E_Q[\mathbf{1}_{\{w \cdot S_1 \ge 0\}}] < 1$$

where $\Gamma_w$ is defined in (WW). Note the indicator function $\mathbf{1}_{\{w \cdot S_1 \ge 0\}}$ on closed set $\{ w \cdot S_1 \ge 0 \}$ is upper semicontinuous, hence we can apply the duality theorem (see Section 1.4.4) to obtain the dual formulation

$$\sup_{w \in \Gamma_w} \inf_{\lambda \ge 0} \Big[ \lambda \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\lambda, w}(s_i) \Big] < 1$$

where $\Psi_{\lambda, w}$ is defined, in terms of cost function $c$, as $\Psi_{\lambda, w}(s_i) = \sup_{\tilde{s} \in \mathbb{R}^n} [ \mathbf{1}_{\{w \cdot \tilde{s} \ge 0\}} - \lambda c(\tilde{s}, s_i) ]$.
Similarly, for the robust strong no-arbitrage conditions

$$\sup_{w \in \Gamma_s} \; \sup_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] < 1,$$

where $\Gamma_s$ is defined in SW, the dual formulation is

$$\sup_{w \in \Gamma_s} \; \inf_{\lambda \ge 0} \left[ \lambda \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\lambda,w}(s_i) \right] < 1. \tag{SD}$$

The objective here is to evaluate $\Psi_{\lambda,w}$ in closed form. There are two cases to consider.

Case 1. $\mathbb{1}_{\{w \cdot s_i \ge 0\}} = 1 \implies \Psi_{\lambda,w}(s_i) = 1 - \lambda \cdot 0 = 1$, which is optimal.

Case 2. $\mathbb{1}_{\{w \cdot s_i \ge 0\}} = 0 \implies \Psi_{\lambda,w}(s_i) = [1 - \lambda c(s_i^*, s_i)]^+$, where $s_i^* = \arg\min_{\tilde{s} : w \cdot \tilde{s} \ge 0} \|\tilde{s} - s_i\|$ is optimal. By the Projection Theorem (Calafiore and El Ghaoui, 2014), $\|s_i^* - s_i\| = \frac{|w^\top s_i|}{\|w\|} \implies \Psi_{\lambda,w}(s_i) = [1 - \lambda c_i]^+$ for $c_i = \frac{|w^\top s_i|}{\|w\|} \in \mathbb{R}_+$.

Proposition 1. $\frac{1}{N} \sum_{i=1}^{N} \Psi_{\lambda,w}(s_i) = K_1(w) + K_2(\lambda, w)$, where $K_1(w) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\{w \cdot s_i \ge 0\}}$ and $K_2(\lambda, w) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\{w \cdot s_i < 0\}} [1 - \lambda c_i]^+$ for $c_i = \frac{|w^\top s_i|}{\|w\|} \in \mathbb{R}_+$.

Proof. This follows by a straightforward application of the two cases above.
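The closed form in Proposition 1 can be illustrated with a minimal sketch. It assumes the Euclidean norm for $\|w\|$ and a small made-up scenario set; the function names `hyperplane_distances` and `mean_psi` are hypothetical and not part of the paper.

```python
import math


def hyperplane_distances(w, scenarios):
    """Return c_i = |w . s_i| / ||w||_2, the distance from s_i to {s : w . s = 0}.

    Assumption of this sketch: the cost c is the Euclidean distance.
    """
    norm_w = math.sqrt(sum(x * x for x in w))
    return [abs(sum(a * b for a, b in zip(w, s))) / norm_w for s in scenarios]


def mean_psi(lam, w, scenarios):
    """Return (K1, K2) so that K1 + K2 = (1/N) * sum_i Psi_{lam,w}(s_i)."""
    n = len(scenarios)
    distances = hyperplane_distances(w, scenarios)
    k1 = 0.0  # Case 1: scenarios already in {w . s >= 0} contribute 1 each
    k2 = 0.0  # Case 2: scenarios with w . s < 0 contribute [1 - lam * c_i]^+
    for s, c_i in zip(scenarios, distances):
        payoff = sum(a * b for a, b in zip(w, s))
        if payoff >= 0:
            k1 += 1.0
        else:
            k2 += max(0.0, 1.0 - lam * c_i)
    return k1 / n, k2 / n


# Illustrative data (hypothetical): payoffs w . s are 7, -3, -20.
w = (3.0, 4.0)
scenarios = [(1.0, 1.0), (-1.0, 0.0), (0.0, -5.0)]
k1, k2 = mean_psi(0.5, w, scenarios)
```

With these inputs the distances are 1.4, 0.6 and 4.0, so only the second scenario is cheap enough to contribute to $K_2$ at $\lambda = 0.5$.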

Remark 1.

In this subsubsection, the dependency of $\lambda^*$ on $(w, \delta)$ is suppressed to ease the notation.

Now the objective is to evaluate $\inf_{\lambda \ge 0} H(\lambda) := \left[ \lambda \delta + K_1(w) + K_2(\lambda, w) \right]$. Since $H(\lambda)$ is a convex function of $\lambda$, the first order optimality condition suffices to determine $\lambda^* = \arg\min_{\lambda \ge 0} H(\lambda)$. Note that $H(\lambda)$ may have kinks, so we look for $\lambda^*$ such that $0 \in \partial H(\lambda^*)$. Following the approach in our earlier work (Singh and Zhang, 2019), we arrive at the following result.

Proposition 2.

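Let $\lambda^* = \sup_{\lambda \ge 0} \left\{ \lambda : \delta - \frac{1}{N} \left[ \sum_{i \in J^+(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \right] \le 0 \right\} = \inf_{\lambda \ge 0} \left\{ \lambda : \delta - \frac{1}{N} \left[ \sum_{i \in J(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \right] \ge 0 \right\}$, where $J^+(\lambda) = \{ i \in \{1, \ldots, N\} : 1 - c_i \lambda > 0 \}$ and $J(\lambda) = \{ i \in \{1, \ldots, N\} : 1 - c_i \lambda \ge 0 \}$. In the degenerate case, where $\sup_{\lambda \ge 0}$ is taken over an empty set, select $\lambda^* = 0 \implies H(\lambda^*) = 1$.

Proof sketch. This result follows from writing down the first order conditions for left and right derivatives for convex objective function $H(\lambda)$. For each additional index $i \in J^+$ ($J$) such that at least one indicator function is true, we pick up an additional $c_i$ term in the left (right) derivative. Search on $\lambda$ (from the left or the right) until we find $\lambda^*$ such that $0 \in \partial H(\lambda^*)$.

Proof. The first order optimality condition says

$$\delta - \frac{1}{N} \sum_{i \in J^+(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \;\le\; 0 \;\le\; \delta - \frac{1}{N} \sum_{i \in J(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i.$$

Note the LHS is an increasing function in $\lambda$. Hence one can write $\lambda^* = \sup_{\lambda \ge 0} \left\{ \lambda : \delta - \frac{1}{N} \sum_{i \in J^+(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \le 0 \right\}$. Similarly the RHS is also an increasing function in $\lambda$. Equivalently, one can write $\lambda^* = \inf_{\lambda \ge 0} \left\{ \lambda : \delta - \frac{1}{N} \sum_{i \in J(\lambda)} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \ge 0 \right\}$.

Proposition 3.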

Equivalently, $\lambda^*$ can be computed via a linear search over $\{c_i\}$ as in Algorithm 1 (listed below).

Proof. The break points for $J$ ($J^+$) are $\{ 1/c_i : i \in \{1, \ldots, N\} \}$. Observe that the only possible candidates for $\lambda^*$, as given in Proposition 2.2, are $\{ 1/c_i : i \in \{1, \ldots, N\} \}$ or 0. One can sort and relabel the $c_i$ to be in increasing order. Note that $(1 - c_j \lambda) > 0 \implies (1 - c_i \lambda) > 0 \;\; \forall c_i \le c_j$. Thus $m \in J$ ($J^+$) $\implies \{1, \ldots, m\} \subseteq J$ ($J^+$). Search backwards to find the smallest index $k^* \in \{1, \ldots, N\}$ such that $\sum_{i=1}^{k^*} \mathbb{1}_{\{w \cdot s_i < 0\}} c_i \ge N \delta$. If no such index $k^*$ is found, return $\lambda^* = 0$; otherwise return $\lambda^* = 1/c_{k^*}$.

Algorithm 1:

Linear Search over $\{c_i\}$ to compute $\lambda^*$

    Input: {c_i}, w, {s_i}, N, δ
    Output: λ* = sup_{λ ≥ 0} { λ : δ − (1/N) [ Σ_{i ∈ J+(λ)} 1{w·s_i < 0} c_i ] ≤ 0 }
               = inf_{λ ≥ 0} { λ : δ − (1/N) [ Σ_{i ∈ J(λ)} 1{w·s_i < 0} c_i ] ≥ 0 }
    Set Q* = Q_N
    Sort {c_i} increasing
    Compute {V_k} where V_k := Σ_{i=1}^{k} 1{w·s_i < 0} c_i
    k = N
    if V_k < N δ then
        return λ* := 0
    else
        while k ≥ 1 and V_k ≥ N δ do
            k = k − 1
        k* = k + 1
        return λ* := 1 / c_{k*}

The weak no-arbitrage conditions can now be expressed as

$$v_w(\delta) := \sup_{w \in \Gamma_w} \left\{ \lambda^*(w, \delta)\, \delta + K_1(w) + K_2(\lambda^*(w, \delta), w) \right\} < 1. \tag{WD2}$$

Similarly, for the strong no-arbitrage conditions

$$v_s(\delta) := \sup_{w \in \Gamma_s} \left\{ \lambda^*(w, \delta)\, \delta + K_1(w) + K_2(\lambda^*(w, \delta), w) \right\} < 1. \tag{SD2}$$

The authors are not aware of any pairing of mixed integer nonlinear program (MINLP) formulation and solver that can return the (global) optimal values $v_{w(s)}(\delta)$ for arbitrary problem instances. Our attempts at such an MINLP formulation, to be solved using Neos / Baron MINLP solvers (Tawarmalani and Sahinidis, 2005) and/or Neos / Knitro solvers (Byrd et al., 2006), were successful on small but not large problem instances. Difficulties were encountered in finding feasible solutions and/or returning optimal solutions. Given these findings, our original solution strategy was revised to focus on solving an equivalent NLP maximin problem formulation to local optimality using the Matlab fminimax solver and the identity $\max_x \min_k F_k(x) = -\min_x \max_k (-F_k(x))$. The equivalent formulation is constructed from the observation that $\lambda^* \in \{ 1/c_k : k \in \{1, \ldots, N\} \} \cup \{ \lambda_0 := 0 \}$. Developing a global solution strategy would be an interesting area for further research.

Theorem 1. $v_w(\delta)$ is approximated by the (global) solution to nonlinear program (NLP) $N_{WNA}$ (listed below). The constraints on variables below, with index $i$, apply for $i \in \{1, \ldots, N\}$, although this is suppressed.
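Also recall that weight vectors $w$ satisfy homogeneity, hence the use of "big M" to express $w \in \Gamma_{w(s)}$ is appropriate.

$$v_w(\delta) = \begin{cases} \underset{w \in \mathbb{R}^n}{\text{maximize}} & \min\limits_{\lambda_k : k \in \{0, 1, \ldots, N\}} F_k(w) := \lambda_k \delta + \dfrac{1}{N} \left[ \sum\limits_{i=1}^{N} \mathbb{1}_{\{w \cdot s_i \ge 0\}} + \sum\limits_{i=1}^{N} z_i^+ \mathbb{1}_{\{w \cdot s_i < 0\}} \right] \\[2ex] \text{subject to} & c_i = \dfrac{|w^\top s_i|}{\|w\|}, \;\; \lambda_k = \dfrac{1}{c_k} \;\; \forall k \in \{1, \ldots, N\}, \;\; \lambda_0 = 0, \\[1ex] & |w_i| \le M, \;\; w \cdot S_0 = 0, \;\; \sum\limits_{j=1}^{n} |w_j| \ge \varepsilon, \;\; z_i = [1 - \lambda_k c_i] \end{cases} \tag{1}$$

Proof.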

The NLP formulation follows from equation WD2 and the fact that $\lambda^* \in \{ 1/c_k : k \in \{1, \ldots, N\} \} \cup \{ \lambda_0 := 0 \}$.

Corollary 1. $v_s(\delta)$ is approximated by the solution to NLP $N_{SNA}$ (described next). $N_{SNA}$ is very similar to $N_{WNA}$. One just needs to omit the $\sum_{j=1}^{n} |w_j| \ge \varepsilon$ constraint and replace the initial cost constraint $w \cdot S_0 = 0$ with $-M \le w \cdot S_0 \le -\varepsilon$, or equivalently with $w \cdot S_0 = \kappa < 0$ ($\kappa$ arbitrary), using the homogeneity property of $w$.

Proof. There is a slight variation on the constraints to express $w \in \Gamma_s$. No other changes are needed.

Theorem 2.

The critical radius $\delta^*_{w(s)}$ can be expressed as $\inf \{ \delta \ge 0 : v_{w(s)}(\delta) = 1 \}$. Furthermore, $\delta^*_{w(s)}$ can be explicitly computed via binary search. Let $\delta_{w(s)} < \delta^*_{w(s)}$. For $Q_{w(s)} \in U_{\delta_{w(s)}}(Q_N)$, it follows that $Q_{w(s)}$ is weak (strong) arbitrage-free. For $Q_{w(s)} \notin U_{\delta^*_{w(s)}}(Q_N)$, it follows that $Q_{w(s)}$ may admit weak (strong) arbitrage.

Proof. This characterization of the critical radius $\delta^*_{w(s)}$ follows from the condition WD2 (SD2) as well as the definition of weak (strong) no-arbitrage. The asymptotic properties of $v_{w(s)}$ are such that $v_{w(s)}(0) \le 1$ and $\lim_{\delta \to \infty} v_{w(s)}(\delta) = 1$. Furthermore, since $v_{w(s)}(\delta)$ is a non-decreasing function of $\delta$, it follows that $\delta^*_{w(s)}$ can be computed via binary search.

One can view the critical radius $\delta^*_{w(s)}$ as a relative measure of the degree of weak (strong) arbitrage in the reference measure $Q_N$. Those $Q_N$ which are "close" to allowing arbitrage will have a relatively smaller value of $\delta^*_{w(s)}$.

This subsection expands on the commentary in Section 1.4.5 and works through the details for how this notion applies to the robust no-arbitrage problem. First recall from Section 1.4.5 the definition of the set of worst case distributions as $WC(f, \delta) := \{ Q^* : E_{Q^*}[f(X)] = \sup_{Q \in U_\delta(Q_N)} E_Q[f(X)] \}$ and $x_i^* \in \arg\min_{\tilde{x} \in \mathrm{dom}(f)} [\lambda^* c(\tilde{x}, x_i) - f(\tilde{x})]$. For the NA problem, $c_i$ represents $c(s_i^*, s_i)$ and the objective function is $f(S) := \mathbb{1}_{\{w \cdot S \ge 0\}}$, hence growth rate $\kappa = 0 \implies WC$ non-empty (growth rate condition satisfied). From an arbitrageur's perspective, $Q^*$ represents a best case distribution, hence let us relabel the set $WC$ as $BC$. We use the notation $BC(w, \delta)$ to emphasize the parametrization on $w$. In Section 6 the greedy algorithm (to be described below) is used to compute a best case distribution $Q^*_w \in BC(w, \delta^*)$. Please note that although this $Q^*_w$ satisfies $E_{Q^*}[f(S)] = 1$, it does not necessarily allow arbitrage. Intuitively, an arbitrage distribution would use up budget $\delta \ge \delta^*$ to allow arbitrage whereas the greedy worst case distribution may not do so. An arbitrage distribution must satisfy

$$\sup_{w \in \Gamma_{w(s)}} \; \sup_{Q \in U_{\delta^*}(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] = 1,$$

whereas a (greedy) worst case distribution with budget $\delta \ge \delta^*$ only needs to satisfy the condition that the inner sup evaluates to 1. However, selecting portfolio weights $w^*$ that satisfy the outer sup condition above, one can recover $Q^*_{w^*}$ that allows arbitrage.
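For a fixed portfolio $w$, the dual value $H(\lambda^*)$ and the primal value of a greedy mass-moving scheme can be compared numerically, and the radius at which the value reaches 1 can be located by bisection. The sketch below is illustrative only. It assumes the kinks of $[1 - \lambda c_i]^+$ sit at $\lambda = 1/c_i$, so the linear search returns $1/c_{k^*}$, and it assumes the fractional move uses the remaining budget divided by $c_k$. All function names are hypothetical. Because $w$ is held fixed, the bisection result is only an upper bound on the critical radius, which also optimizes over $w$.

```python
def lambda_star(c_neg, n_total, delta):
    """Linear search for the dual minimizer.

    c_neg holds c_i > 0 for scenarios with w . s_i < 0; n_total is N.
    Returns 1/c_k for the smallest k (in increasing c order) whose partial
    sum reaches N * delta, or 0 if the budget exceeds the total cost.
    """
    budget = n_total * delta
    running = 0.0
    for c_k in sorted(c_neg):
        running += c_k
        if running >= budget:
            return 1.0 / c_k
    return 0.0


def dual_value(lam, c_neg, n_nonneg, n_total, delta):
    """H(lam) = lam * delta + K1 + K2(lam) for a fixed portfolio."""
    k2 = sum(max(0.0, 1.0 - lam * c) for c in c_neg)
    return lam * delta + (n_nonneg + k2) / n_total


def greedy_primal(c_neg, n_nonneg, n_total, delta):
    """Move the cheapest scenarios onto {w . s >= 0} until budget N * delta is spent."""
    budget = n_total * delta
    moved = 0.0
    for c_k in sorted(c_neg):
        if c_k <= budget:
            moved += 1.0
            budget -= c_k
        else:
            moved += budget / c_k  # split the mass of scenario k
            break
    return (n_nonneg + moved) / n_total


def critical_radius_fixed_w(c_neg, n_nonneg, n_total, iterations=60):
    """Bisection for the smallest delta with greedy value equal to 1 (fixed w)."""
    lo, hi = 0.0, sum(c_neg) / n_total + 1.0  # hi is always large enough
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if greedy_primal(c_neg, n_nonneg, n_total, mid) >= 1.0 - 1e-12:
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative data (hypothetical): N = 3, one scenario with w . s >= 0.
c_neg = [0.6, 4.0]
lam = lambda_star(c_neg, 3, 0.5)
gap = dual_value(lam, c_neg, 1, 3, 0.5) - greedy_primal(c_neg, 1, 3, 0.5)
```

In this example the primal and dual values coincide, and for fixed $w$ the bisection recovers $\frac{1}{N} \sum_{i : w \cdot s_i < 0} c_i$, the cost of moving every losing scenario onto the hyperplane.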
Algorithm 2: Greedy Algorithm to compute $Q^*_w \in BC(w, \delta)$ for NA

    Input: f, w, {s_i}, {c_i}, N, δ
    Output: Q*_w : E_{Q*_w}[f(X)] = sup_{Q ∈ U_δ(Q_N)} E_Q[f(X)]
    Define Q*_w := {Q*_v, Q*_p} where Q*_v denotes the support and Q*_p denotes probabilities
    Set Q*_w = Q_N so that those scenarios { i ∈ {1, ..., N} : 1{w·s_i ≥ 0} } do not move
    Sort {c_i} increasing
    Set V := {V_k} where V_k := Σ_{i=1}^{k} 1{w·s_i < 0} c_i
    k = 1
    while k ≤ N and V_k ≤ N δ do
        if 1{w·s_k < 0} and (1 − λ* c_k) ≥ 0 then
            Q*_v(k) = s_k − sgn(w·s_k) c_k w / ‖w‖
        k = k + 1
    if k ≤ N and V_k > N δ and 1{w·s_k < 0} then
        p = (N δ − V_{k−1}) / V_k
        Q*_p(N + 1) = p / N
        Q*_v(N + 1) = s_k − sgn(w·s_k) c_k w / ‖w‖
        Q*_p(k) = (1 − p) / N

This subsection discusses refinements to the no-arbitrage conditions (see Section 2.1) to characterize portfolio restrictions such as short sales restrictions, min and max position constraints, and cardinality constraints (Cornuejols and Tütüncü, 2018). For efficiency of presentation, we refer the reader to the $N_{WNA}$ and $N_{SNA}$ NLP problems discussed in Section 2.1.3 and do not restate those formulations here. An advantage of the computational machinery developed in this paper is that such portfolio restrictions can be readily incorporated into the existing framework. Note that these additional constraints may cause the restricted NLP problem to violate the homogeneity property of $w$, so one should exercise caution in formulating the new problem correctly. For example, for restricted $N_{SNA}$ one should use the $-M \le w \cdot S_0 \le -\varepsilon$ constraint instead of $w \cdot S_0 = \kappa < 0$ ($\kappa$ arbitrary). Table 1 below describes the various portfolio restrictions (discussed here) and associated constraints. Others are possible as well. Note that the index set is $j \in \{1, \ldots, n\}$, which is suppressed for brevity.

Table 1: Portfolio Restrictions

| Restriction | MINLP Constraint | No Restriction |
|---|---|---|
| Short Sales | $w_j \ge ss_j$ where $ss_j \in \mathbb{R}_-$ is short sales limit | $ss_j = -M$ |
| Min Positions | $\lvert w_j \rvert \ge \underline{w}$ where $\underline{w} \in \mathbb{R}_+$ denotes min position | $\underline{w} = 0$ |
| Max Positions | $\lvert w_j \rvert \le \overline{w}$ where $\overline{w} \in \mathbb{R}_+$ denotes max position | $\overline{w} = M$ |
| Cardinality | $\sum_{j=1}^{n} \mathbb{1}_{\{\lvert w_j \rvert \ge \varepsilon\}} \le m$ where $m \in \{1, \ldots, n\}$ is cardinality constraint | $m = n$ |
| Allocations | $\lvert \sum_{j \in A_k} w_j S_j \rvert \le A_k$ where $A_k \in \mathbb{R}_+$ is asset class $k$ allocation constraint | $A_k = Mn$ |

This subsection gives a brief summary (using the author's notation) of the work by Oleaga (2012) to formulate equivalent (weak) no-arbitrage conditions, in terms of existence of risk neutral probability measures, under no short sales. A similar exercise could be conducted for strong no-arbitrage conditions although the author focuses on the weak conditions. From the previous subsection, no short sales conditions can be directly imposed by setting $ss_j = 0$ for $j \in J$ for some index set $J \subseteq \{1, \ldots, n\}$. Oleaga begins his paper with a remark that the Fundamental Theorem of Finance establishes the equivalence between the no-arbitrage conditions and the existence of a risk neutral probability measure (see Section 1.1 of this paper for details) under the assumption that short selling of risky securities is allowed. He remarks that when short sales are not allowed, the academic literature is scarce regarding equivalent conditions on probability measures. As motivation for his main result (which implies that existence of a risk neutral measure is not guaranteed under no short sales) the author develops two examples: one using a simple one-period binomial model with one risky asset, and another involving wagers in a stylized market where the assets are Arrow-Debreu securities. Using standard techniques in linear algebra, convex analysis, and the separating hyperplane theorem the author proves his main result, which is stated below for convenience.

Theorem. (Arbitrage Theorem for No Short Sales). The market model $M$ with $m$ scenarios for $n$ assets $X_j : j \in \{1, \ldots, n\}$ has no-arbitrage opportunities iff there exists a probability measure $\pi$ such that the initial prices $x_j$ are greater than or equal to the discounted value of the expected future prices under $\pi$. Written in symbols we have:

$$x_j \ge \frac{1}{1 + r} \sum_{i=1}^{m} \pi_i X_{ij} \quad \text{where } j \in \{1, \ldots, n\}.$$

Moreover, for those assets $X_j : j \in \{1, \ldots, n\}$ where short selling is allowed, equality is achieved in the above relation. In particular, the bank account or cash bond (used to execute the borrowing to purchase the portfolio at time 0) is treated as a special asset $X_0$ excluded from the above relation. It would hold with equality if included.

In an independent work, LeRoy and Werner (2014) develop essentially the same results for both weak and strong no-arbitrage conditions. They show that for the weak conditions, the probability measure $\pi$ is such that $\pi > 0$, whereas for the strong conditions $\pi \ge 0$.

Recall that the motivating question here is how to find the nearest arbitrage-free measure to the arbitrage admissible reference measure.
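The inequality in the Arbitrage Theorem for No Short Sales stated above can be checked numerically for a candidate measure. The following is a minimal sketch under stated assumptions: the one-period binomial numbers are invented for illustration, the function name `satisfies_no_short_sales_condition` is hypothetical, and the check only verifies a given $\pi$ rather than searching for one.

```python
def satisfies_no_short_sales_condition(prices, payoffs, pi, r,
                                       short_allowed=None, tol=1e-9):
    """Check x_j >= (1 / (1 + r)) * sum_i pi_i * X_ij for every asset j.

    prices[j] is x_j, payoffs[i][j] is X_ij, pi[i] is the scenario probability.
    For assets flagged in short_allowed the relation must hold with equality.
    The measure must be strictly positive (weak no-arbitrage version).
    """
    if abs(sum(pi) - 1.0) > tol or any(p <= 0.0 for p in pi):
        return False
    if short_allowed is None:
        short_allowed = [False] * len(prices)
    for j, x_j in enumerate(prices):
        discounted = sum(p * row[j] for p, row in zip(pi, payoffs)) / (1.0 + r)
        if short_allowed[j]:
            if abs(x_j - discounted) > tol:
                return False
        elif x_j < discounted - tol:
            return False
    return True


# Hypothetical binomial market: one risky asset priced at 100, rate 5%.
pi = [0.5, 0.5]
fair = satisfies_no_short_sales_condition([100.0], [[120.0], [90.0]], pi, 0.05,
                                          short_allowed=[True])
dominated = satisfies_no_short_sales_condition([100.0], [[100.0], [90.0]], pi, 0.05)
dominated_with_shorts = satisfies_no_short_sales_condition(
    [100.0], [[100.0], [90.0]], pi, 0.05, short_allowed=[True])
```

The third call illustrates the point made in the text: an asset that underperforms the bank account in every scenario satisfies the inequality when short sales are prohibited, yet no measure gives equality, so no risk neutral measure exists for it.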

This subsection looks at the problem of computing the minimal distance $\delta^*_g$ to an arbitrage-free measure for a reference measure $Q_N$ that admits arbitrage. In a discrete setting, the nearest (strong) no-arbitrage problem can be formulated as

$$\delta^*_{ns} = \min_{\tilde{X}} \|X - \tilde{X}\|_F \quad \text{such that } \exists\, q \ge 0 : p = \tilde{X} q \tag{NSP}$$

where $\|X\|_F$ denotes the Frobenius norm of matrix $X$. A penalty relaxation can be formulated as

$$\delta^*_{nsr}(\beta) = \min_{\tilde{X},\, q \ge 0} \|X - \tilde{X}\|_F + \beta \|p - \tilde{X} q\|_F \tag{NSPR}$$

A tight lower bound $\delta^*_{nst} \le \delta^*_{ns}$ to the relaxation problem NSPR is given by

$$\delta^*_{nst} = \sup_{\beta \ge 0} \delta^*_{nsr}(\beta) \tag{NSPRT}$$

For a complete market with non-redundant securities, note that $X$ (and hence $\tilde{X}$) is a full rank, invertible square $n \times n$ matrix.

This subsection mimics the approach of the previous subsection; however, we make use of the equivalent probability measure condition discussed in Section 2.4 (Oleaga, 2012), (LeRoy and Werner, 2014). In a discrete setting, the nearest (weak) no-arbitrage problem, under no short sales, can be formulated as

$$\delta^*_{nns} = \min_{\tilde{X}} \|X - \tilde{X}\|_F \quad \text{such that } \exists \text{ probability measure } q > 0 : p \ge \frac{1}{1 + r} \tilde{X} q. \tag{NNWP}$$

A penalty relaxation can be specified as

$$\delta^*_{nnsr}(\beta) = \min_{\tilde{X},\, q > 0} \|X - \tilde{X}\|_F + \beta \left\| \left( \tilde{X} q - (1 + r) p \right)^+ \right\|_F \tag{NNWPR}$$

A tight lower bound $\delta^*_{nnst} \le \delta^*_{nns}$ to the relaxation problem NNWPR is given by

$$\delta^*_{nnst} = \sup_{\beta \ge 0} \delta^*_{nnsr}(\beta) \tag{NNWPRT}$$

Recall the bank account or cash bond (used to borrow) is excluded from the above relation. For a complete market with non-redundant securities, note that $X$ (and hence $\tilde{X}$) is a full rank, invertible square $n \times n$ matrix.

For completeness, we comment on an alternate formulation of the robust NA conditions (from Section 2.1) that exchanges the order of sup operators. Such conditions can be expressed as

$$\sup_{Q \in U_\delta(Q_N)} \; \sup_{w \in \Gamma_s} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] < 1,$$

where $\Gamma_s$ is defined in SW.
The intuitive meaning of this formulation is that the market player first chooses a favorable distribution $Q \in U_\delta(Q_N)$ and then the portfolio manager chooses an optimal $w \in \Gamma_s$. It is clear that

$$\sup_{Q \in U_\delta(Q_N)} \; \sup_{w \in \Gamma_s} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] = \sup_{w \in \Gamma_s} \; \sup_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right].$$

This section develops the theory for robust statistical arbitrage in financial markets. We follow the same approach as in Section 2 for robust arbitrage. For simplicity, and to ease the notation, let us focus on the strong conditions. The weak conditions can be handled similarly, replacing $w \in \Gamma_s$ with $w \in \Gamma_w$, as in Section 2. In Section 3.1, the primal problem for the SA best case conditions is formulated using notions of statistical arbitrage as discussed in Section 1.4.3. The dual problem is formulated using the Lagrangian duality result from Section 1.4.4. The dual problem is a maximin stochastic optimization problem. Section 3.2 touches on the best case SA distribution. In Section 3.3, the primal problem for the SA worst case conditions is formulated. The dual problem for this is maximax. Both dual problems can be solved as in Section 2. Section 3.4 touches on the worst case SA distribution. Section 3.5 addresses portfolio restrictions. Section 3.6 covers the nearest SA problem. Section 3.7 discusses alternate robust SA conditions. Altogether, this machinery gives us a practical approach to explore applications of our framework in Sections 4 and 6.

The robust (strong) statistical arbitrage best case conditions (of level $\alpha_{bc} \in (0, 1)$) can be expressed as

$$\sup_{w \in \Gamma_s} \; \sup_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \le \alpha_{bc}, \tag{SSAP}$$

where $\Gamma_s$ is defined in SW.
As before, the indicator function $\mathbb{1}_{\{w \cdot S \ge 0\}}$ on closed set $\{w \cdot S \ge 0\}$ is upper semicontinuous, hence we can apply the duality theorem (see Section 1.4.4) to obtain the dual formulation

$$\sup_{w \in \Gamma_s} \; \inf_{\lambda \ge 0} \left[ \lambda \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\lambda,w}(s_i) \right] \le \alpha_{bc} \tag{SSAD}$$

where $\Psi_{\lambda,w}$ is defined, in terms of cost function $c$, as $\Psi_{\lambda,w}(s_i) = \sup_{\tilde{s} \in \mathbb{R}^n} \left[ \mathbb{1}_{\{w \cdot \tilde{s} \ge 0\}} - \lambda c(\tilde{s}, s_i) \right]$.

The goal here is the same as for the robust no-arbitrage conditions in Section 2.1.1, namely to evaluate $\Psi_{\lambda,w}$ in closed form. As such the solution is also the same, therefore one can invoke Proposition 2.1 to compute $\frac{1}{N} \sum_{i=1}^{N} \Psi_{\lambda,w}(s_i)$.

As before, in Section 2.1.2, the objective is to evaluate $\inf_{\lambda \ge 0} H(\lambda) := \left[ \lambda \delta + K_1(w) + K_2(\lambda, w) \right]$. As such the solution is also the same, therefore one can invoke Propositions 2.2, 2.3 and Algorithm 1 to compute $\lambda^*$ and $H(\lambda^*)$.

3.1.3 Outer Optimization Problem

As before, in Section 2.1.3, the objective is to evaluate $v_s(\delta) := \sup_{w \in \Gamma_s} \left\{ \lambda^*(w, \delta)\, \delta + K_1(w) + K_2(\lambda^*(w, \delta), w) \right\}$. As such the solution is also the same, therefore one can invoke Theorem 2.1 and Corollary 2.1.1 to evaluate the above expression(s). The analog to Theorem 2.2 is given below.

Theorem 3.

The critical radius $\delta^{bc}_\alpha$ can be expressed as $\inf \{ \delta \ge 0 : v_s(\delta) \ge \alpha_{bc} \}$. Furthermore, $\delta^{bc}_\alpha$ can be explicitly computed via binary search. Let $\delta < \delta^{bc}_\alpha$. For $Q \in U_\delta(Q_N)$, it follows that $Q$ is (strong) statistical arbitrage free, for level $\alpha > v_s(\delta^{bc}_\alpha)$. For $Q \notin U_{\delta^{bc}_\alpha}(Q_N)$, it follows that $Q$ may admit (strong) statistical arbitrage for level $\alpha > v_s(\delta^{bc}_\alpha)$.

Proof. This characterization of the critical radius $\delta^{bc}_\alpha$ follows from the condition SSAD as well as the definition of (strong) statistical arbitrage. The asymptotic properties of $v_s$ are such that $v_s(0) \le 1$ and $\lim_{\delta \to \infty} v_s(\delta) = 1$. Furthermore, since $v_s(\delta)$ is a non-decreasing function of $\delta$, it follows that $\delta^{bc}_\alpha$ can be computed via binary search.

One can view critical radius $\delta^{bc}_\alpha$ as a relative measure of the degree of (strong) statistical arbitrage in reference measure $Q_N$. Those $Q_N$ which are "close" to admitting statistical arbitrage of level $\alpha_{bc}$ will have a relatively smaller value of $\delta^{bc}_\alpha$.

The characterization of best case distributions for NA problems carries over into the SA context. In particular, one is interested in best case distributions $Q^\alpha_w \in BC(w, \delta_\alpha)$ such that $E_{Q^\alpha_w}[\mathbb{1}_{\{w \cdot S \ge 0\}}] = \sup_{Q \in U_{\delta_\alpha}(Q_N)} E_Q[\mathbb{1}_{\{w \cdot S \ge 0\}}]$. As before, by selecting portfolio weights $w^\alpha$ that satisfy the outer sup condition

$$\sup_{w \in \Gamma_s} \; \sup_{Q \in U_{\delta_\alpha}(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \ge \alpha_{bc},$$

one can recover $Q^\alpha_{w^\alpha}$ that admits statistical arbitrage of level $\alpha_{bc}$. See Section 6.2 for a concrete example.

The robust (strong) statistical arbitrage worst case conditions (of level $\alpha_{wc} \in (0, 1)$) can be expressed as

$$\sup_{w \in \Gamma_s} \; \inf_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \ge \alpha_{wc}, \tag{SSAP$_{wc}$}$$

where $\Gamma_s$ is defined in SW. Relaxing the objective function from $\mathbb{1}_{\{w \cdot S \ge 0\}}$ to $\mathbb{1}_{\{w \cdot S > 0\}}$ and using the relations $\mathbb{1}_{\{w \cdot S > 0\}} = 1 - \mathbb{1}_{\{w \cdot S \le 0\}}$ and $\inf(S) = -\sup(-S)$ for bounded set $S$, we have the equivalent condition:

$$\sup_{w \in \Gamma_s} \; - \left\{ \sup_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \le 0\}} - 1\right] \right\} \ge \alpha_{wc}. \tag{SSAP2$_{wc}$}$$

As before, the indicator function $\mathbb{1}_{\{w \cdot S \le 0\}}$ on closed set $\{w \cdot S \le 0\}$ is upper semicontinuous, hence we can apply the duality theorem (see Section 1.4.4) to obtain the dual formulation

$$\sup_{w \in \Gamma_s} \; - \left\{ \inf_{\lambda \ge 0} \left[ \lambda \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi^{wc}_{\lambda,w}(s_i) \right] \right\} \ge \alpha_{wc} \tag{SSAD$_{wc}$}$$

where $\Psi^{wc}_{\lambda,w}$ is defined, in terms of cost function $c$, as $\Psi^{wc}_{\lambda,w}(s_i) = \sup_{\tilde{s} \in \mathbb{R}^n} \left[ \mathbb{1}_{\{w \cdot \tilde{s} \le 0\}} - \lambda c(\tilde{s}, s_i) - 1 \right]$.

The goal here is the same as for the robust no-arbitrage conditions in Section 2.1.1, namely to evaluate $\Psi^{wc}_{\lambda,w}$ in closed form. There are two cases to consider.

Case 1. $\mathbb{1}_{\{w \cdot s_i \le 0\}} = 1 \implies \Psi^{wc}_{\lambda,w}(s_i) = 1 - \lambda \cdot 0 - 1 = 0$, which is optimal.

Case 2. $\mathbb{1}_{\{w \cdot s_i \le 0\}} = 0 \implies \Psi^{wc}_{\lambda,w}(s_i) = [1 - \lambda c(s_i^*, s_i)]^+ - 1$, where $s_i^* = \arg\min_{\tilde{s} : w \cdot \tilde{s} \le 0} \|\tilde{s} - s_i\|$ is optimal. By the Projection Theorem (Calafiore and El Ghaoui, 2014), $\|s_i^* - s_i\| = \frac{|w^\top s_i|}{\|w\|} \implies \Psi^{wc}_{\lambda,w}(s_i) = [1 - \lambda c_i]^+ - 1$ for $c_i = \frac{|w^\top s_i|}{\|w\|} \in \mathbb{R}_+$.

Proposition 4. $\frac{1}{N} \sum_{i=1}^{N} \Psi^{wc}_{\lambda,w}(s_i) = K^{wc}_1(w) + K^{wc}_2(\lambda, w) = K^{wc}_2(\lambda, w)$, where $K^{wc}_1(w) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\{w \cdot s_i \le 0\}} \cdot 0$ and $K^{wc}_2(\lambda, w) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\{w \cdot s_i > 0\}} \left( [1 - \lambda c_i]^+ - 1 \right)$ for $c_i = \frac{|w^\top s_i|}{\|w\|} \in \mathbb{R}_+$.

Proof. This follows by a straightforward application of the two cases above.
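For a fixed portfolio, the worst case probability of a strictly positive payoff, $-\inf_{\lambda \ge 0} [\lambda \delta + K^{wc}_2(\lambda, w)]$, can be evaluated by enumerating a finite candidate set for $\lambda$. The sketch below is illustrative: it assumes the Euclidean norm, it assumes the kinks of $[1 - \lambda c_i]^+$ at $\lambda = 1/c_i$ are the only candidates besides 0 (which holds for a convex piecewise linear function whose final slope is $\delta \ge 0$), and the function name `worst_case_probability` is hypothetical.

```python
import math


def worst_case_probability(w, scenarios, delta):
    """Return -min_lambda [lambda * delta + K2_wc(lambda, w)] for a fixed w.

    This equals the smallest probability of {w . S > 0} over the Wasserstein
    ball of radius delta around the empirical measure of the scenarios.
    """
    n = len(scenarios)
    norm_w = math.sqrt(sum(x * x for x in w))
    # Distances c_i for the scenarios with strictly positive payoff.
    c_pos = []
    for s in scenarios:
        payoff = sum(a * b for a, b in zip(w, s))
        if payoff > 0:
            c_pos.append(payoff / norm_w)

    def h_wc(lam):
        return lam * delta + sum(max(0.0, 1.0 - lam * c) - 1.0 for c in c_pos) / n

    # The convex piecewise linear objective attains its minimum at 0 or a kink.
    candidates = [0.0] + [1.0 / c for c in c_pos]
    return -min(h_wc(lam) for lam in candidates)


# Illustrative data (hypothetical): payoffs w . s are 7, -3, -20.
w = (3.0, 4.0)
scenarios = [(1.0, 1.0), (-1.0, 0.0), (0.0, -5.0)]
empirical = worst_case_probability(w, scenarios, 0.0)
robust = worst_case_probability(w, scenarios, 0.2)
```

At $\delta = 0$ the value is the empirical frequency of a positive payoff, and it shrinks as the adversary's budget grows, consistent with $v^{wc}_s$ being non-increasing in $\delta$.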

As before, in Section 2.1.2, the objective is to evaluate $\inf_{\lambda \ge 0} H^{wc}(\lambda) := \left[ \lambda \delta + K^{wc}_2(\lambda, w) \right]$. As such the solution is also the same, with one exception: replace $\mathbb{1}_{\{w \cdot s_i < 0\}}$ with $\mathbb{1}_{\{w \cdot s_i > 0\}}$ in those results. Therefore one can apply Propositions 2.2, 2.3 and Algorithm 1 (with the above replacement of indicator functions) to compute $\lambda^*$ and $H^{wc}(\lambda^*)$.

As before, in Section 2.1.3, the objective is to evaluate $v^{wc}_s(\delta) := \sup_{w \in \Gamma_s} - \left\{ \lambda^*(w, \delta)\, \delta + K^{wc}_2(\lambda^*(w, \delta), w) \right\}$. As such the solution is similar, with the following adjustments: replace $F_k(w)$ with $-F^{wc}_k(w)$ where

$$-F^{wc}_k(w) := \lambda_k \delta + \frac{1}{N} \left[ \sum_{i=1}^{N} (z_i^+ - 1)\, \mathbb{1}_{\{w \cdot s_i > 0\}} \right]$$

and place a minus sign in front of the min term in the maximin expression for $v_{w(s)}(\delta)$. Therefore one can apply Theorem 2.1 and Corollary 2.1.1 (with the above adjustments) to evaluate $v^{wc}_s(\delta)$. The revised formulation is shown below.

Theorem 4. $v^{wc}_s(\delta)$ is approximated by the (global) solution to nonlinear program (NLP) $N_{SSA}$ (listed below). The constraints on variables below, with index $i$, apply for $i \in \{1, \ldots, N\}$, although this is suppressed.

$$v^{wc}_s(\delta) = \begin{cases} \underset{w \in \mathbb{R}^n}{\text{maximize}} & \max\limits_{\lambda_k : k \in \{0, 1, \ldots, N\}} F^{wc}_k(w) = -\lambda_k \delta + \dfrac{1}{N} \left[ \sum\limits_{i=1}^{N} (1 - z_i^+)\, \mathbb{1}_{\{w \cdot s_i > 0\}} \right] \\[2ex] \text{subject to} & c_i = \dfrac{|w^\top s_i|}{\|w\|}, \;\; \lambda_k = \dfrac{1}{c_k} \;\; \forall k \in \{1, \ldots, N\}, \;\; \lambda_0 = 0, \\[1ex] & |w_i| \le M, \;\; w \cdot S_0 \le -\varepsilon, \;\; z_i = [1 - \lambda_k c_i] \end{cases} \tag{2}$$

Proof.

The NLP formulation follows from the definition of $v^{wc}_s$ and the fact that $\lambda^* \in \{ 1/c_k : k \in \{1, \ldots, N\} \} \cup \{ \lambda_0 := 0 \}$.

The analog to Theorem 2.2 is given below.

Theorem 5.

The critical radius $\delta^{wc}_\alpha$ can be expressed as $\inf \{ \delta \ge 0 : v^{wc}_s(\delta) \le \alpha_{wc} \}$. Furthermore, $\delta^{wc}_\alpha$ can be explicitly computed via binary search. Let $\delta < \delta^{wc}_\alpha$. For $Q \in U_{\delta^{wc}_\alpha}(Q_N)$, it follows that $Q$ admits (strong) statistical arbitrage, for level $\alpha \ge v^{wc}_s(\delta^{wc}_\alpha)$. For $Q \notin U_{\delta^{wc}_\alpha}(Q_N)$, it follows that $Q$ may not admit (strong) statistical arbitrage for level $\alpha < v^{wc}_s(\delta^{wc}_\alpha)$.

Proof. This characterization of the critical radius $\delta^{wc}_\alpha$ follows from the condition (SSAD$_{wc}$) as well as the definition of (strong) statistical arbitrage. The asymptotic properties of $v^{wc}_s$ are such that $v^{wc}_s(0) > 0$ and $\lim_{\delta \to \infty} v^{wc}_s(\delta) = 0$. Furthermore, since $v^{wc}_s(\delta)$ is a non-increasing function of $\delta$, it follows that $\delta^{wc}_\alpha$ can be computed via binary search.

One can view critical radius $\delta^{wc}_\alpha$ as a relative measure of the degree of (strong) statistical arbitrage in reference measure $Q_N$. Those $Q_N$ which are "close" to not admitting statistical arbitrage of level $\alpha_{wc}$ will have a relatively smaller value of $\delta^{wc}_\alpha$.

3.4 Worst Case Distribution for SA Problem

The characterization of worst case distributions for NA problems carries over into the SA context. In particular, one is interested in worst case distributions $Q^\alpha_w \in WC(w, \delta_\alpha)$ such that $E_{Q^\alpha}[\mathbb{1}_{\{w \cdot S \ge 0\}}] = \inf_{Q \in U_{\delta_\alpha}(Q_N)} E_Q[\mathbb{1}_{\{w \cdot S \ge 0\}}]$. By selecting portfolio weights $w$ with their associated worst case distributions, it follows that

$$\sup_{w \in \Gamma_s} E_{Q^\alpha_w}\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \le \alpha_{wc}.$$

Applying the greedy algorithm to $\mathbb{1}_{\{w \cdot S < 0\}} = 1 - \mathbb{1}_{\{w \cdot S \ge 0\}}$, one can recover $Q^\alpha_w$ that is the most punitive for $w$ and admits statistical arbitrage of level at most $\alpha_{wc}$ for a given $w \in \Gamma_s$. See Section 6.2 for a concrete example.

The portfolio restrictions for NA problems apply in the SA context as well. We refer the reader to Section 2.3 and do not duplicate the material here.

The Farkas Lemma characterization of classical weak (strong) no arbitrage via the existence (and uniqueness for complete markets) of risk neutral measures does not yield any new relationships in the context of statistical arbitrage under no short sales. As such, we do not establish any new results in this subsection. Note that the theorem given in Section 2.4 still holds for probability measures $Q^\alpha$ for $\alpha \in (0, 1)$; in words, it holds for market models that admit statistical arbitrage but not classical arbitrage.

As above, the Farkas Lemma characterization does not yield any new relationships for the nearest no-arbitrage problem in the context of statistical arbitrage. However, the nuances of how one uses the existing results in Section 2.5 (vs. Section 2.4) are different.
In particular, one can apply those results for probability measures $Q^\alpha$ for $\alpha = 1$; in words, it holds for market models that admit classical arbitrage.

The concept of exchanging the order of the sup and inf operators for the robust NA conditions (see Section 2.6) can be extended to cover SA. As before, exchanging the order of the operators gives the robust SA best case conditions

$$\sup_{Q \in U_\delta(Q_N)} \; \sup_{w \in \Gamma_s} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \le \alpha_{bc}. \tag{RSSAP$_{bc}$}$$

Similarly, an alternate formulation of the robust SA worst case conditions is

$$\inf_{Q \in U_\delta(Q_N)} \; \sup_{w \in \Gamma_s} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \ge \alpha_{wc}. \tag{RSSAP$_{wc}$}$$

The intuitive meaning of these formulations is that the market adversary first chooses a punitive distribution $Q \in U_\delta(Q_N)$ and then the portfolio manager chooses an optimal $w \in \Gamma_s$. Although one can invoke the min-max inequality to establish the relation

$$\inf_{Q \in U_\delta(Q_N)} \; \sup_{w \in \Gamma_s} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right] \;\ge\; \sup_{w \in \Gamma_s} \; \inf_{Q \in U_\delta(Q_N)} E_Q\left[\mathbb{1}_{\{w \cdot S \ge 0\}}\right],$$

finding a method to compute the LHS of RSSAP$_{bc}$ or RSSAP$_{wc}$ is not really achievable (to our knowledge) since the inner problem is NP Hard (see Section 5 for a proof) and the outer problem is infinite dimensional.
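The min-max inequality above can be seen on a toy instance with finitely many portfolios and distributions. This sketch is purely illustrative: the two point-mass distributions and two portfolios are invented, the finite sets stand in for $U_\delta(Q_N)$ and $\Gamma_s$, and the function names are hypothetical.

```python
def expected_indicator(w, distribution):
    """E_Q[1{w . S >= 0}] for a discrete Q given as (probability, scenario) pairs."""
    return sum(p for p, s in distribution
               if sum(a * b for a, b in zip(w, s)) >= 0)


def inf_sup_and_sup_inf(portfolios, distributions):
    """Return (inf_Q sup_w, sup_w inf_Q) of E_Q[1{w . S >= 0}] over finite sets."""
    table = [[expected_indicator(w, q) for w in portfolios] for q in distributions]
    inf_sup = min(max(row) for row in table)
    sup_inf = max(min(row[j] for row in table) for j in range(len(portfolios)))
    return inf_sup, sup_inf


# Hypothetical instance: each portfolio wins under exactly one distribution.
portfolios = [(1.0, -1.0), (-1.0, 1.0)]
distributions = [[(1.0, (2.0, 1.0))], [(1.0, (1.0, 2.0))]]
inf_sup, sup_inf = inf_sup_and_sup_inf(portfolios, distributions)
```

Here the manager who moves second always finds a winning portfolio, while any portfolio fixed in advance is defeated by some distribution, so the gap between the two sides is as large as possible.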

Section 4 presents applications of the theory developed in Sections 2 and 3 to robust option pricing and robust portfolio selection. In the latter we consider two examples: the classical Markowitz problem and a more modern view of risk using CVaR (as opposed to variance) as the measure of risk.

4.1 Robust Option Pricing

This subsection is a refinement (simplification) of the result for robust pricing of European options given in Bartl et al. (2017). For clarity, we adopt the notation and problem setup of Example 2.14 (Robust Call) (Bartl et al., 2017). The approach taken there is to add an additional constraint on the probability measure $\mu$ to reside within Wasserstein radius $\delta$ of the reference (arbitrage-free) measure $\mu_0$. For this example, let us assume $\mu_0$ is arbitrage-free, distance function $d_c$ is the second order Wasserstein distance with associated quadratic cost function $c(x, y) = (x - y)^2 / 2$, $M(\mathbb{R})$ denotes the set of probability measures on $\mathbb{R}$, and the penalty function is $\varphi := \infty \mathbb{1}_{(\delta, \infty]}$ with associated convex conjugate $\varphi^*(\lambda) = \lambda \delta$. The authors show that the robust call option with maturity $T$, strike $k$, on a single asset, satisfies the relation:

$$\mathrm{CALL}_{robust}(k) = \sup_{\{\mu \in M(\mathbb{R}) \,:\, \int_{\mathbb{R}} S \, d\mu = s_0\}} \mathrm{CALL}(k) - \varphi(d_c(\mu_0, \mu)) = \inf_{\beta \in \mathbb{R}} \inf_{\lambda > 0} \left\{ \lambda \delta + \mathrm{CALL}\!\left(k - \frac{2\beta + 1}{2\lambda}\right) + \frac{\beta^2}{2\lambda} \right\} \tag{3}$$

where $\beta$ denotes the Lagrange multiplier for the arbitrage-free probability measure constraint $\{\mu \in M(\mathbb{R}) : \int_{\mathbb{R}} S \, d\mu = s_0\}$, and $\lambda$ denotes the Lagrange multiplier for the Wasserstein distance constraint $d_c(\mu_0, \mu) \le \delta$. Here $\mathrm{CALL}(\tilde{k})$ denotes the non-robust call option price for strike $\tilde{k}$. Now let us assume that we have calculated the critical radius $\delta^*_{w(s)}$ for this problem (assume the reference measure $\mu_0$ is empirical) and we have chosen $\delta_\alpha < \min(\delta^*_w, \delta^*_s)$. Here $\delta_\alpha$ denotes the radius of a Wasserstein ball of probability measures that allow statistical arbitrage (up to some level $\alpha < 1$) but not classical arbitrage. It follows from Theorem 2.2 that the arbitrage-free probability measure constraint is not needed, hence one can simply set $\beta := 0$ to obtain

$$\mathrm{CALL}_{robust}(k) = \inf_{\lambda > 0} G(\lambda) := \left\{ \lambda \delta_\alpha + \mathrm{CALL}\!\left(k - \frac{1}{2\lambda}\right) \right\}. \tag{4}$$

Note that in formula 4 above, $G(\lambda)$ is convex in $\lambda$. Once again, following the approach in our earlier work (Singh and Zhang, 2019), we can simplify further to arrive at the following result.

Proposition 5.

Let $\lambda^* = \sup_{\lambda \ge 0} \left\{ \lambda : \delta_\alpha - \frac{1}{N} \left[ \sum_{i \in J^+(\lambda)} \frac{1}{2\lambda^2} \right] \le 0 \right\} = \inf_{\lambda \ge 0} \left\{ \lambda : \delta_\alpha - \frac{1}{N} \left[ \sum_{i \in J(\lambda)} \frac{1}{2\lambda^2} \right] \ge 0 \right\}$, where $J^+(\lambda) = \left\{ i \in \{1, \ldots, N\} : \left[ \frac{1}{2\lambda} + s_i - k \right] > 0 \right\}$ and $J(\lambda) = \left\{ i \in \{1, \ldots, N\} : \left[ \frac{1}{2\lambda} + s_i - k \right] \ge 0 \right\}$.

Proof sketch. This result follows from writing down the first order conditions for left and right derivatives for convex objective function $G(\lambda)$. Inspection of the left and right derivatives for $G(\lambda)$ reveals that they will cross zero (as $\lambda$ sweeps from 0 to $\infty$) and hence the sup and inf operators will apply over non-empty sets. For each index $i \in J^+$ ($J$) we pick up another $1/(2\lambda^2)$ term in the left (right) derivative. Search on $\lambda$ (from the left or the right) until we find $\lambda^*$ such that $0 \in \partial G(\lambda^*)$.

Proof.

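The first order optimality condition says

$$\delta_\alpha - \frac{1}{N} \sum_{i \in J^+(\lambda)} \frac{1}{2\lambda^2} \;\le\; 0 \;\le\; \delta_\alpha - \frac{1}{N} \sum_{i \in J(\lambda)} \frac{1}{2\lambda^2}.$$

Note the LHS is an increasing function in $\lambda$. Hence one can write $\lambda^* = \sup_{\lambda \ge 0} \left\{ \lambda : \delta_\alpha - \frac{1}{N} \sum_{i \in J^+(\lambda)} \frac{1}{2\lambda^2} \le 0 \right\}$. Similarly the RHS is also an increasing function in $\lambda$. Equivalently, one can write $\lambda^* = \inf_{\lambda \ge 0} \left\{ \lambda : \delta_\alpha - \frac{1}{N} \sum_{i \in J(\lambda)} \frac{1}{2\lambda^2} \ge 0 \right\}$.

Corollary 5.1.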

$\mathrm{CALL}_{\mathrm{robust}}(k) = G(\lambda^*) := \big[\lambda^* \delta_\alpha + \mathrm{CALL}(k - 1/(4\lambda^*))\big]$, where $\lambda^*$ is given by Proposition 4.1 above.

Proof. This follows by direct substitution of $\lambda^*$ from Proposition 4.1 into formula (4) above.

4.2 Robust Portfolio Selection

This subsection is a refinement of the result(s) for robust Markowitz (mean variance) portfolio selection given in Blanchet et al. (2018). For clarity, we adopt the notation and problem setup of that paper. The convex primal problem is a distributionally robust Markowitz problem given by

$$\min_{\phi \in \mathcal{F}_{\delta,\bar r}(N)} \; \max_{P \in \mathcal{U}_\delta(P_N)} \{\phi^\top \mathrm{Var}_P(R)\,\phi\} \tag{5}$$

where $\phi \in \mathbb{R}^d$ denotes the portfolio weight vector, $R \in \mathbb{R}^d$ denotes the random (gross) asset returns, $P_N$ denotes the empirical measure, $\mathcal{U}_\delta(P_N)$ denotes the uncertainty set for probability measures, with associated cost function $c(u,v) = \|v - u\|_q^2$ for $q \ge 1$, $\mathrm{Var}_P(R)$ denotes the covariance matrix of returns under $P$, and $\mathcal{F}_{\delta,\bar r}(N) = \{\phi : \phi^\top \mathbf{1} = 1, \; \min_{P \in \mathcal{U}_\delta(P_N)} E_P(\phi^\top R) \ge \bar r\}$ denotes the feasible region for portfolios. Using Lagrangian duality techniques (similar to this paper) the authors show that this primal problem is equivalent to the convex dual problem

$$\min_{\phi \in \mathcal{F}_{\delta,\bar r}(N)} \Big(\sqrt{\phi^\top \mathrm{Var}_{P_N}(R)\,\phi} + \sqrt{\delta}\,\|\phi\|_p\Big)^2 \tag{6}$$

in terms of optimal value and solution(s), with $1/p + 1/q = 1$. Following the approach in the previous subsection, let us assume the reference measure $P_N$ is arbitrage-free and we have chosen $\delta_\alpha < \min(\delta^*_w, \delta^*_s)$. Again $\delta_\alpha$ denotes the radius of a Wasserstein ball of probability measures that allow statistical arbitrage (up to some level $\alpha < 1$) but not classical arbitrage. It follows from Theorem 2.2 that the arbitrage-free probability measure constraint is not needed, hence the arbitrage-free primal problem

$$\min_{\phi \in \mathcal{F}_{\delta_\alpha,\bar r}(N)} \; \max_{P \in \tilde{\mathcal{U}}_{\delta_\alpha}(P_N)} \{\phi^\top \mathrm{Var}_P(R)\,\phi\} \tag{7}$$

where $\tilde{\mathcal{U}}_{\delta_\alpha}(P_N) = \mathcal{U}_{\delta_\alpha}(P_N) \cap \{P : \sup_{\phi \in \{\Gamma_w \cup \Gamma_s\}} E_P[\mathbb{1}\{\phi \cdot S_T \ge 0\}] < 1\}$ is equivalent to the primal and dual problems above. In this setting $R = R(S_0, S_T)$ is the random vector of asset returns calculated based on initial asset prices $S_0 \in \mathbb{R}^d$ and terminal asset prices $S_T \in \mathbb{R}^d$.

This subsection is a refinement of the result(s) for robust mean risk portfolio optimization given in Esfahani and Kuhn (2018). For clarity, we adopt the notation and problem setup of that paper. Let $\xi \in \mathbb{R}^m$ denote a random vector of (gross) asset returns and $x \in X$ denote a vector of portfolio percentage weights ranging over the probability simplex $X = \{x \in \mathbb{R}^m_+ : \sum_{i=1}^m x_i = 1\}$. Thus we consider a "long only" portfolio. However, the reader is advised that today's market includes securities such as exchange traded funds (ETFs) that behave like short positions, hence the long portfolio setting is not as restrictive as it might seem at first glance. The portfolio return is given by $\langle x, \xi \rangle$. A single stage stochastic program which minimizes a weighted sum of the mean and CVaR of portfolio loss at confidence level $\bar\alpha \in (0, 1]$, given the investor's risk aversion $\rho \in \mathbb{R}_+$ and distribution $P$, is given by

$$J^* = \inf_{x \in X} E_P\big[-\langle x, \xi \rangle + \rho\, \mathrm{CVaR}_{\bar\alpha}(-\langle x, \xi \rangle)\big].$$
(8)

Substituting the formal definition of CVaR into the above, they show that

$$J^* = \inf_{x \in X} E_P[-\langle x, \xi \rangle] + \rho \inf_{\tau \in \mathbb{R}} E_P\big[\tau + (1/\bar\alpha)\max(-\langle x, \xi \rangle - \tau, 0)\big] \tag{9}$$

$$\phantom{J^*} = \inf_{x \in X,\, \tau \in \mathbb{R}} E_P\big[\max_{k \le K} a_k \langle x, \xi \rangle + b_k \tau\big] \tag{10}$$

where $K = 2$, $a_1 = -1$, $a_2 = -1 - (\rho/\bar\alpha)$, $b_1 = \rho$, $b_2 = \rho(1 - (1/\bar\alpha))$.

For Wasserstein ambiguity set $B_\varepsilon(\hat P_N)$ of radius $\varepsilon$ about reference measure $\hat P_N$, the authors formulate the distributionally robust primal problem

$$\hat J_N(\varepsilon) := \inf_{x \in X} \sup_{Q \in B_\varepsilon(\hat P_N)} E_Q\big[-\langle x, \xi \rangle + \rho\, \mathrm{CVaR}_{\bar\alpha}(-\langle x, \xi \rangle)\big] \tag{11}$$

Applying techniques of Lagrangian duality, Esfahani and Kuhn formulate the equivalent dual problem

$$\hat J_N(\varepsilon) = \begin{cases} \inf\limits_{x, \tau, \lambda, s_i, \gamma_{ik}} & \lambda \varepsilon + \frac{1}{N} \sum_{i=1}^N s_i \\ \text{such that} & x \in X, \\ & b_k \tau + a_k \langle x, \hat\xi_i \rangle + \langle \gamma_{ik}, d - C \hat\xi_i \rangle \le s_i, \\ & \|C^\top \gamma_{ik} - a_k x\|_* \le \lambda, \\ & \gamma_{ik} \ge 0, \end{cases} \qquad \forall i \le N, \; \forall k \le K. \tag{12}$$

Following the approach in the previous subsection, let us assume the reference measure $\hat P_N$ is arbitrage-free and we have chosen $\varepsilon_\alpha < \min(\varepsilon^*_w, \varepsilon^*_s)$. As before $\varepsilon_\alpha$ denotes the radius of a Wasserstein ball of probability measures that allow statistical arbitrage (up to some level $\alpha < 1$) but not classical arbitrage. It follows from Theorem 2.2 that the arbitrage-free probability measure constraint is not needed, hence the arbitrage-free primal problem

$$\inf_{x \in X} \sup_{Q \in \tilde B_{\varepsilon_\alpha}(\hat P_N)} E_Q\big[-\langle x, \xi \rangle + \rho\, \mathrm{CVaR}_{\bar\alpha}(-\langle x, \xi \rangle)\big] \tag{13}$$

where $\tilde B_{\varepsilon_\alpha}(\hat P_N) = B_{\varepsilon_\alpha}(\hat P_N) \cap \{Q : \sup_{\phi(x) \in \{\Gamma_w \cup \Gamma_s\}} E_Q[\mathbb{1}\{\phi(x) \cdot \tilde S_T \ge 0\}] < 1\}$ is equivalent to the primal and dual problems above. In this setting, $\xi = \xi(S_0, S_T)$ is the random vector of asset returns calculated based on initial asset prices $S_0 \in \mathbb{R}^m$ and terminal asset prices $S_T \in \mathbb{R}^m$. Also, $\tilde S_0 = (S_0, B_0)$ appends the initial cash bond (borrowing) $B_0$ used to purchase the portfolio (at zero or negative cost) and $\tilde S_T = (S_T, B_T)$ appends the bond repayment (principal plus interest) at the end of the investment period. Finally, $\phi(x) \in \mathbb{R}^{m+1}$ for $x \in X$ denotes the portfolio weight vector corresponding to the portfolio purchase and cash loan. By construction, the first $m$ components of $\phi$ are non-negative whereas the last component has a negative sign.

This section gives formal proofs for the complexity of the No-Arbitrage Problem. We establish that the weak and strong no-arbitrage problems are NP-hard. The approach taken here is a reduction from the known NP-complete closed (open) hemisphere decision problem (Johnson and Preparata, 1978). The optimization problem, using the notation of this paper, is stated below (Avis et al., 2005).

1. closed hemisphere: Find $w \in \mathbb{R}^n$ such that $\mathrm{card}(\{i : s_i \in S;\; w \cdot s_i \ge 0\})$ is maximized.
2. open hemisphere: Find $w \in \mathbb{R}^n$ such that $\mathrm{card}(\{i : s_i \in S;\; w \cdot s_i > 0\})$ is maximized.

To complete the problem statement, note that the set $S$ above denotes a finite subset of $\mathbb{Q}^n$ containing $N$ points. It follows that the mixed hemisphere problem (where $c_i \ge 0 \;\forall i$) is also NP-complete.

3. mixed hemisphere:

$$\sup_{w \in \mathbb{R}^n} \Big[\sum_{i=1}^N \mathbb{1}\{w \cdot s_i \ge 0\} + \sum_{i=1}^N c_i\, \mathbb{1}\{w \cdot s_i < 0\}\Big].$$
(M1)

One can write a simplified version of the weak and strong no-arbitrage optimization problems as follows (see Section 2.1.3). To construct these simplified versions, we have fixed $\lambda^*$ to a constant, relabeled $[1 - \lambda c_i]^+$ as $c_i$, and omitted the initial cost constraint $w \cdot S_0 = \kappa$. Recall $\kappa$ is zero in the weak case, but strictly less than zero (for arbitrary $\kappa$) in the strong case.

$$\sup_{w \in \mathbb{R}^n} F(w) := \Big[\sum_{i=1}^N \mathbb{1}\{w \cdot s_i \ge 0\} + \sum_{i=1}^N c_i\, \mathbb{1}\{w \cdot s_i < 0\}\Big]. \tag{WD3, SD3}$$

However, there is some work to be done to incorporate the initial cost constraint back to formulate the no-arbitrage problems. First, think of the unconstrained initial cost as the union of three possibilities: (i) $w \cdot S_0 < 0$, (ii) $w \cdot S_0 = 0$, (iii) $w \cdot S_0 > 0$. Some thought suggests that the following proposition holds.

Proposition 6.

Assuming P ≠ NP, there can be no polynomial time algorithm to solve the simplified no-arbitrage problem under initial cost constraint $w \cdot S_0 \le \kappa$ for $\kappa \in \mathbb{R}$.

Proof. Proceed by contradiction. Suppose there is a polynomial time algorithm A that can solve the following problem:

$$\sup_{w \in \mathbb{R}^n,\; w \cdot S_0 \le \kappa} F(w). \tag{M2}$$

Exploiting symmetry, one can then also use algorithm A to solve this problem:

$$\sup_{w \in \mathbb{R}^n,\; w \cdot S_0 \ge \kappa} F(w). \tag{M3}$$

Returning the better of the two answers gives a polynomial time algorithm for the mixed hemisphere problem, which contradicts P ≠ NP. Hence there is no polynomial time algorithm A to solve either M2 or M3.

Corollary 6.1. Assuming P ≠ NP, there can be no polynomial time algorithms to solve both the weak and strong no-arbitrage problems.

Proof. This follows directly from the definitions of the weak and strong no-arbitrage conditions (see Section 1.3.1).
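To make the objective in (WD3, SD3) and the "return the better answer" step of the proof concrete, here is a minimal Python sketch. It evaluates F(w) on a finite scenario set and combines the optima of the two half-space problems (M2) and (M3) by taking their maximum. The function names, the toy scenarios, the weights c_i, and the brute-force grid search are all assumptions made for illustration; the grid search merely stands in for the hypothetical algorithm A and is neither polynomial time nor taken from the paper.

```python
from itertools import product

def F(w, scenarios, c):
    """Mixed hemisphere objective: add 1 for each scenario with w.s_i >= 0
    and the weight c_i for each scenario with w.s_i < 0."""
    total = 0.0
    for s, ci in zip(scenarios, c):
        dot = sum(wj * sj for wj, sj in zip(w, s))
        total += 1.0 if dot >= 0 else ci
    return total

def best_on_grid(scenarios, c, S0, kappa, side, grid):
    """Illustrative stand-in for algorithm A: maximize F over grid points w
    with w.S0 <= kappa (side == "le") or w.S0 >= kappa (side == "ge")."""
    best = None
    for w in product(grid, repeat=len(S0)):
        cost = sum(wj * sj for wj, sj in zip(w, S0))
        feasible = cost <= kappa if side == "le" else cost >= kappa
        if feasible:
            val = F(w, scenarios, c)
            if best is None or val > best[0]:
                best = (val, w)
    return best

# Toy data (hypothetical): three scenarios in R^2 and weights c_i >= 0.
scenarios = [(1.0, -1.0), (-1.0, 2.0), (0.5, 0.5)]
c = [0.2, 0.0, 0.5]
S0 = (1.0, 1.0)
grid = [-1.0, -0.5, 0.5, 1.0]  # zero excluded to avoid the trivial w = 0

m2 = best_on_grid(scenarios, c, S0, 0.0, "le", grid)  # analogue of (M2)
m3 = best_on_grid(scenarios, c, S0, 0.0, "ge", grid)  # analogue of (M3)
mixed = max(m2[0], m3[0])  # optimum over the whole grid, constraint dropped
```

Because F depends on w only through the signs of the products w · s_i, multiplying w by any positive constant leaves F(w) unchanged.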

Corollary 6.2.

Assuming P ≠ NP, there can be no polynomial time algorithms to solve either the weak or strong no-arbitrage problems.

Proof. Recall that weight vectors $w$ satisfy the homogeneity property. Hence the optimal solution to the strong no-arbitrage problem is invariant to the actual choice of $\kappa$ up to the sign. In other words, we have the following relation:

$$\sup_{w \in \mathbb{R}^n,\; w \cdot S_0 < 0} F(w) = \sup_{w \in \mathbb{R}^n,\; w \cdot S_0 = \kappa < 0 \;(\kappa \text{ arbitrary})} F(w). \tag{M4}$$

Furthermore, the RHS formulation above is equivalent in form to the weak no-arbitrage problem.

This computational study uses the Matlab fminimax and fmincon solvers to work out a couple of concrete examples to find the critical radii at the cusp of (statistical) arbitrage, assuming short sales are allowed. Best (worst) case distributions and optimal portfolios are computed as well. Suitable values (for the problem instances below) for M range from 100 to 10,000 and for ε from 0.001 to 0.0001. Other choices may be suitable. Recall that Matlab fminimax solves to local optimality using a sequential quadratic programming (SQP) method with modifications (Fletcher, 2010). Similarly, fmincon solves to local optimality using gradient based techniques. Our algorithm incorporates a few additional features to improve the robustness of the approach. These are listed next.

1. multi search: multiple search paths (that evolve candidate solutions) are used, similar to a genetic algorithm.
2. hot start: the optimal portfolio from the previous run $\delta_{prev}$ becomes the initial portfolio for the next run $\delta_{next}$.
3. function smoothing: the indicator function can be relaxed using a sigmoid with appropriate scale factor.

As mentioned in Section 2, developing an approach to solve for global optimality would be a topic for further research. Meanwhile, for this section, the computed values for $v_{w(s)}$ and corresponding critical values for $\delta^*_{w(s)}$ represent local optimality (upper bounds for globally optimal $\delta^*_{w(s)}$).
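The function smoothing feature can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: the function names and the parameter `scale` are hypothetical, and the sigmoid is written through tanh only to avoid overflow for large arguments.

```python
import math

def indicator(x):
    """Exact indicator 1{x >= 0}."""
    return 1.0 if x >= 0 else 0.0

def smooth_indicator(x, scale):
    """Sigmoid relaxation of 1{x >= 0}; a larger scale gives a sharper step.
    Uses the identity sigmoid(z) = 0.5 * (1 + tanh(z / 2))."""
    return 0.5 * (1.0 + math.tanh(0.5 * scale * x))

def smoothed_probability(w, scenarios, scale):
    """Smooth surrogate for the probability of {w . S >= 0} under a
    uniform discrete distribution on the given scenarios."""
    values = [
        smooth_indicator(sum(a * b for a, b in zip(w, s)), scale)
        for s in scenarios
    ]
    return sum(values) / len(values)
```

Note that the relaxation returns 0.5 at x = 0, whereas the exact indicator returns 1 there, so the scale factor controls how closely the surrogate tracks the original objective away from the boundary.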
This comment also applies to the statistical arbitrage calculations for $\delta^{bc}_\alpha$ and $\delta^{wc}_\alpha$.

For the first example, consider the simple setting of a one-period binomial tree asset pricing model. There is a riskless bond priced at par at time zero that earns a deterministic risk free rate of return $r$ at time 1. In addition there is a risky asset (stock) with initial price $s_0$ and time 1 price $s_u = u s_0$ that occurs with probability $p = 1/2$ and $s_d = d s_0$ that occurs with probability $q = 1 - p = 1/2$. The (weak) no-arbitrage conditions can be stated as: $0 < d < 1 + r < u$ (Shreve, 2005). Let us mock up an example to satisfy these conditions. Consider the problem setting below. Here $0 < d = 0.966\ldots < 1 + r \in \{0.995, 1.005\} < u = 1.033\ldots$, thus the conditions are satisfied. Intuitively the investor could either make or lose money depending on what happens.

Solving $\mathrm{NLP}^N_{WNA}$ (see Theorem 2.1) for various values of $\delta$ gives the results in Table 2 (including the optimal portfolios). The critical radius $\delta^*_w$ from Theorem 2.2 is at most […]. Solving the corresponding strong no-arbitrage problem for various values of $\delta$ gives the results in Table 3. The critical radius $\delta^*_s$ is at most […].

Table 2: $v_w(\delta) < 1$: Weak No-Arbitrage Condition
δ          […]
v_w        […]
w_stock    […]
w_bond     -3.9   2.1   2.1   2.1   2.1   2.1   1.5

Figure 1: One-Period Binomial Tree. Time 0: Stock = $300, Bond = $100. Up state (probability p): Stock = $310, Bond = $100.5. Down state (probability q = 1 − p): Stock = $290, Bond = $99.5.

Table 3: $v_s(\delta) < 1$: Strong No-Arbitrage Condition
δ          […]
v_s        […]
w_stock    […]
w_bond     -3.9   -565   -565   -565   899   899   247

Figure 2: Arbitrage Probabilities for One-Period Binomial Asset Pricing (y-axis: Probability; series: Strong, Weak).
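The condition 0 < d < 1 + r < u can be checked directly from the prices in Figure 1. The following Python sketch does so; the function name is hypothetical, and the last example, with time 0 prices of 300 and 100 and a stock worth 304 in both states, is an assumed illustration of a violated condition rather than data quoted from the paper.

```python
def satisfies_binomial_no_arbitrage(s0, s_up, s_down, b0, b1):
    """Check the one-period binomial condition 0 < d < 1 + r < u,
    where u = s_up / s0, d = s_down / s0 and 1 + r = b1 / b0."""
    u = s_up / s0
    d = s_down / s0
    gross_rate = b1 / b0
    return 0 < d < gross_rate < u

# Prices from Figure 1, using either terminal bond value shown there.
ok_high = satisfies_binomial_no_arbitrage(300.0, 310.0, 290.0, 100.0, 100.5)
ok_low = satisfies_binomial_no_arbitrage(300.0, 310.0, 290.0, 100.0, 99.5)

# Assumed violating setup: the stock grows faster than the bond in every state.
violated = satisfies_binomial_no_arbitrage(300.0, 304.0, 304.0, 100.0, 101.0)
```

In the violating setup d = u exceeds 1 + r, so going long the stock and borrowing via the bond makes money in every state.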

A typical example of a pairs trade would be to trade a linear combination of cointegrated tickers. The idea is to exploit temporary divergence from the long run relationship in the belief that convergence to the long run mean will result in a profitable trading strategy (Wojcik, 2005). The following annual data set of month end closing prices is taken from the Yahoo Finance website. A plot of this market data is shown in Figure 3 below.

[Figure 3: market data plot; x-axis: Month, y-axis: Closing Prices; series: Amazon, Google]

Solving $\mathrm{NLP}^N_{SNA}$ for various values of $\delta$ gives the results in Table 6. A plot of these values is shown in Figure 4 below. The entire 12 point data set is used as the support for the time 1 distribution. The arithmetic average is used for the time 0 prices. The data tuples of closing prices are assigned to the (uniform) discrete distribution for time 1.

Table 6: $v_s(\delta)$: SA Best Case
δ          […]
v_s        […]
w_google   […]
w_amazon   -6.9   -67.5   -67.5   -67.5   -67.5   -67.5   -67.5   -67.5

A plot of the best case (bc) distribution is shown in Figure 5 below. Recall the robust (strong) no-arbitrage conditions are

$$\sup_{w \in \Gamma_{w(s)}} \; \sup_{Q \in \mathcal{U}_\delta(Q_N)} E_Q[\mathbb{1}\{w \cdot S \ge 0\}] < 1.$$

The best case distribution has the property that the inner sup evaluates to 1 for $\delta \ge \delta^* = $ […]. For $w^* = \{[…], -[…]\}$ from Table 6, corresponding to $\delta = \delta^*$, the outer sup also evaluates to 1. Using the greedy algorithm discussed in Section 2.2 one recovers an arbitrage distribution. From the plot in Figure 5 it is clear that Google dominates Amazon, which allows for the profit making opportunity.

[Figure 4: plot of the Table 6 values; y-axis: Probability; series: Best Case]

Also from Table 6 one can see that for $\alpha = 0.99$ the critical radius is $\delta^{bc}_\alpha = 31$. It turns out that point 3 is the most expensive to move towards the arbitrage conditions, as Amazon dominates Google here (instead of the other way around). Moving 95% of its mass towards the new values (and using the arbitrage admissible distribution for the remaining points) recovers the statistical arbitrage distribution for $\alpha = 0.99$. See Table 7 for the detailed probability mass function (PMF) for $\alpha = 0.99$. Recall that one point mass from the reference distribution can be split into two pieces according to the budget constraint $\delta$. In this case, this happens for point 3. 95% of its mass is moved towards the new values in point 13.

Figure 5: U.S. Tech Pair Best Case Distribution (x-axis: Month, y-axis: Closing Prices; series: Amazon, Google)

Table 7: Best Case PMF for $\alpha = 0.99$

Switching to the worst case SA conditions, solving $\mathrm{NLP}^N_{SSA}$ for various values of $\delta$ gives the results in Table 8. A plot of these values is shown in Figure 6 below. The problem setup is the same as for the best case SA conditions above. A plot of the worst case (wc) distribution $Q^{\alpha}_{w_\alpha}$ for $\alpha = $ […] is shown in Figure 7, with $w_\alpha = \{[…], -[…]\}$. These results were calculated using the Matlab fmincon solver, applying a grid search over $\lambda$ to solve the maximax problem (which is convex in $\lambda$). Using the greedy algorithm discussed in Section 2.2 one recovers a no-win distribution. From the plot in Figure 7 it is clear that neither Amazon nor Google dominates at all points, but for the optimal portfolio $w_\alpha$ a quick check verifies that this distribution leads to a no-win situation, meaning $E_{Q^{\alpha}_{w_\alpha}}[\mathbb{1}\{w_\alpha \cdot S \ge 0\}] = 0$. Our calculations show that for $\alpha = $ […] the critical radius is $\delta^{wc}_\alpha = $ […].4. See Table 9 for the detailed probability mass function (PMF). Finally, Figure 8 shows the absolute values of these two positions in the optimal portfolio with weights $w_\alpha$. Here the dominance of the short Amazon position is easier to see.

Table 8: $v^{wc}_s(\delta)$: SA Worst Case
δ          […]
v_s^wc     […]
w_google   […]
w_amazon   -1.98   -67.44   -67.48   -66.84   -66.84   -67.02   -67.02   -1.98

Figure 6: Arbitrage Probabilities for U.S. Tech Pair (x-axis: Delta, y-axis: Probability; series: Worst Case)

Figure 7: U.S. Tech Pair Worst Case Distribution (x-axis: Month, y-axis: Closing Prices; series: Amazon, Google)

Table 9: Worst Case PMF for $\alpha = $ […]

[Figure 8: absolute portfolio positions; x-axis: Month, y-axis: Portfolio Positions; series: Amazon, Google]

6.3 Basket Trading

Basket trading involves simultaneous trading of a basket of stocks. This example computes the critical radius for a small basket of U.S. equities from the S&P 500 index used in the statistical arbitrage study by Zhao et al. (2018). Table 10 below lists the stock tickers, names, and industries. Table 11 displays a partial listing of the 5y historical market data set from March 2015 through March 2020 used in this study. As before, the arithmetic average is used for time 0 and the data tuples for time 1. Table 12 and Figure 9 display the optimal portfolios and best case arbitrage probabilities. Figures 10 and 11 show different views of the best case distribution for $\alpha = $ […] with $w_\alpha = \{[…], -[…], […], -[…], -[…], […], -[…]\}$. The quantiles in Figure 11b are $\{[…], […], […]\}$ respectively.

Table 10: Basket Constituents
Ticker   Name                                    Industry               Market Cap (bn)
APA      Apache Corporation                      Energy: Oil and Gas    10.68
AXP      American Express Company                Credit Services        109.0
CAT      Caterpillar Inc.                        Farm Machinery         74.94
COF      Capital One Financial Corp.             Credit Services        46.19
FCX      Freeport-McMoRan Inc.                   Copper                 17.33
IBM      International Business Machines Corp.   Technology             132.70
MMM      3M Company                              Industrial Machinery   90.33

Table 11: Basket 2019 Market Data
Date   06/01    07/01    08/01    09/01    10/01    11/01    12/01
APA    28.13    23.71    21.17    25.12    21.25    22.11    25.39
AXP    122.16   123.08   119.50   117.42   116.43   119.71   124.06
CAT    133.26   128.74   117.25   124.45   135.77   143.72   146.65
COF    89.62    91.28    85.56    90.26    92.51    99.22    102.51
FCX    11.45    10.91    9.10     9.48     9.73     11.34    13.07
IBM    133.31   143.31   131.02   142.24   130.80   131.51   132.65
MMM    168.77   170.11   157.45   161.53   162.11   166.80   174.84

Table 12: $v_s(\delta)$: SA Best Case
δ       […]
v_s     […]
w_apa   […]
w_axp   -3.90   -7.50   -6.44   -14.45   -4.75   -0.24
w_cat   -3.03   2.69    -0.94   5.54     6.36    10.90
w_cof   […]
w_fcx   […]
w_ibm   -7.12   -6.56   -5.83   3.00     0.68    2.07
w_mmm   […]

[Figure 9: best case arbitrage probabilities; y-axis: Probability; series: Best Case]

Figure 10: Correlation Matrix for Equity Basket BC Distribution

Figure 11: Equity Basket BC Distribution. (a) Parallel Coords (b) Quantiles

Switching to the worst case SA conditions, solving $\mathrm{NLP}^N_{SSA}$ for various values of $\delta$ gives the results in Table 13. A plot of these values is shown in Figure 12 below. The problem setup is the same as for the best case SA conditions above. Note that the critical value $\delta^{wc}_{\alpha = […]} = $ […] whereas $\delta^{bc}_{\alpha = […]} = 1$. The reference distribution is much closer to admitting arbitrage than admitting a no-win situation. Figures 13 and 14 show different views of the worst case distribution $Q^{\alpha}_{w_\alpha}$ for $\alpha = $ […] with $w_\alpha = \{[…], […], -[…], […], […], -[…], -[…]\}$. The quantiles in Figure 14b are $\{[…], […], […]\}$ respectively.

Table 13: $v^{wc}_s(\delta)$: SA Worst Case
δ        […]
v_s^wc   […]
w_apa    […]
w_axp    -43.41   -948.06   -895.60   379.42    353.38    -540.94   9.97
w_cat    -40.50   117.04    -321.50   975.81    973.14    -994.01   -6.84
w_cof    […]
w_fcx    […]
w_ibm    -46.11   -523.48   143.70    -834.35   -810.86   820.05    -6.54
w_mmm    […]

[Figure 12: worst case arbitrage probabilities; x-axis: Delta, y-axis: Probability; series: Worst Case]

[Figures 13 and 14: views of the worst case distribution; panels (a) Parallel Coords (b) Quantiles]

As another example, let us consider a basket of stock indices, taken from the Market Watch ﬁnancial website. In particular,we look at broad based equity indices ( Dow Jones 30, S&P 500 ), the Nasdaq technology stock index ( IXIC ), the USO oilexchange traded fund (ETF), and a gold ETF (SGOL). Table 14 displays a partial listing of the historical data set from March2015 to March 2020 used in this study. As before, the arithmetic average is used for time 0 and the data tuples for time 1.

Table 14: Basket 2019 Market Data
Date   06/01    07/01    08/01    09/01    10/01    11/01    12/01
DJI    26,600   26,864   26,403   26,917   27,046   28,051   28,538
GSPC   2,942    2,980    2,926    2,977    3,038    3,141    3,231
IXIC   8,006    8,175    7,963    7,999    8,292    8,665    8,973
USO    12.04    12.04    11.46    11.34    11.30    11.62    12.81
SGOL   13.60    13.61    14.69    14.20    14.56    14.16    14.62
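The data preparation described above (arithmetic average for the time 0 prices, data tuples as the support of a uniform time 1 distribution) can be sketched in a few lines of Python. The sketch uses only the partial listing in Table 14, not the full five-year data set of the study, and the two-asset portfolio `w` is a hypothetical zero-cost example chosen for illustration; it is not one of the optimal portfolios reported in the paper.

```python
# Partial listing from Table 14 (seven dates in 2019), two of the five indices.
table14 = {
    "DJI": [26600, 26864, 26403, 26917, 27046, 28051, 28538],
    "GSPC": [2942, 2980, 2926, 2977, 3038, 3141, 3231],
}

def time0_prices(data):
    """Time 0 price of each asset: arithmetic average of its closing prices."""
    return {name: sum(prices) / len(prices) for name, prices in data.items()}

def portfolio_value(w, prices):
    """Value w . S of portfolio w at the given price vector."""
    return sum(w[name] * prices[name] for name in w)

def prob_nonnegative(w, data):
    """Probability of {w . S_1 >= 0} when each data tuple is a time 1
    scenario with equal (uniform) probability."""
    n = len(next(iter(data.values())))
    hits = 0
    for t in range(n):
        scenario = {name: data[name][t] for name in data}
        if portfolio_value(w, scenario) >= 0:
            hits += 1
    return hits / n

s0 = time0_prices(table14)
# Hypothetical portfolio: long one unit of GSPC, short DJI so that w . S_0 = 0.
w = {"GSPC": 1.0, "DJI": -s0["GSPC"] / s0["DJI"]}
initial_cost = portfolio_value(w, s0)
probability = prob_nonnegative(w, table14)
```

For this zero-cost portfolio the reference distribution puts positive mass on both gains and losses, so the portfolio by itself is not an arbitrage under the reference measure.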

Solving $\mathrm{NLP}^N_{SNA}$ for various values of $\delta$ gives the results in Table 15 below. The arbitrage probability curve is plotted in Figure 15. Different views of the best case distribution for $\alpha = $ […] with $w_\alpha = \{-[…], -[…], […], […], […]\}$ are shown in Figures 16 and 17.

Table 15: $v_s(\delta)$: SA Best Case
δ        […]
v_s      […]
w_dji    […]
w_gspc   -0.04   -0.77    2.73      2.54      -11.65     -5.13
w_ixic   -2.70   -4.97    -5.45     -3.21     0.08       1.61
w_uso    -6.01   4.42     105.07    21.27     -392.33    179.35
w_sgol   -0.01   -17.15   -232.03   -334.96   1,000.00   290.23

Figure 15: Arbitrage Probabilities for Basket of Indices (y-axis: Probability; series: Best Case)

Figure 16: Correlation Matrix for Indices BC Distribution

Figure 17: Indices BC Distribution. (a) Parallel Coords (b) Quantiles

Switching to the worst case gives the results in Table 16 below. The arbitrage probability curve is plotted in Figure 18. Note that the critical value $\delta^{wc}_{\alpha = […]} = $ […] whereas $\delta^{bc}_{\alpha = […]} = $ […].6. As before, the reference distribution is much closer to admitting arbitrage than admitting a no-win situation. Different views of the worst case distribution for $\alpha = $ […] with $w_\alpha = \{[…], […], -[…], […], […]\}$ are shown in Figures 19 and 20.

Table 16: $v^{wc}_s(\delta)$: SA Worst Case
δ        […]
v_s^wc   […]
w_dji    […]
w_gspc   -71.35    -83.14    -53.80    -18.59    -71.69    -98.82    8.42
w_ixic   -349.21   -597.31   -464.08   -160.00   -616.62   -848.45   -9.52
w_uso    -55.20    29.16     16.30     5.79      14.07     46.47     9.00
w_sgol   […]

[Figure 18: worst case arbitrage probabilities; x-axis: Delta, y-axis: Probability; series: Worst Case]

[Figures 19 and 20: views of the worst case distribution; panels (a) Parallel Coords (b) Quantiles]

This subsection looks at a couple of concrete examples for the nearest NA problem discussed in Section 2.5. In particular,short sales are allowed so we consider the problem setting of Section 2.5.1. The ﬁrst example is a simple one-period binomialtree asset pricing model. The second is a one-period pairs trading example using the Russell 2000 small-cap index and the S&P500 index. The third example looks at basket trading using the index basket from Section 6.3.

For this example, we again consider the simple setting of a one-period binomial tree asset pricing model. There is a riskless bond priced at par at time zero that earns a deterministic risk free rate of return $r$ at time 1. In addition there is a risky asset (stock) with initial price $s_0$ and time 1 price $s_u = u s_0$ that occurs with probability $p = 1/2$ and $s_d = d s_0$ that occurs with probability $q = 1 - p = 1/2$. The (weak) no-arbitrage conditions can be stated as: $0 < d < 1 + r < u$ (Shreve, 2005). Let us mock up an example to violate this. Consider the problem setting below. Here $0 < 1 + r = $ […] $< d = u = $ […], thus the conditions are violated. Intuitively the investor could always make money by going long the stock and borrowing via the bond.

Solving the penalty relaxation problem NSPR using the Neos / Baron nonlinear programming (NLP) solver (Byrd et al., 2006) for a set of values for $\beta$ gives the results in Table 17. Using a subgradient method we find the solution to the tight relaxation problem NSPRT to be $\delta^*_{nst} \approx $ […]. The corresponding values for $\tilde X^*$ and $q^*$ are shown as well. Calculations show that

$$p = […] \;\wedge\; X = \begin{bmatrix} 304 & 304 \\ 101 & 101 \end{bmatrix} \implies \tilde X^* = […] \;\wedge\; q^* = […] \implies \|p - \tilde X^* q^*\| = […].$$

[Figure: one-period binomial tree with branch probabilities p and q = (1 − p)]

Table 17: Min Distance to Arbitrage-Free Measure
β           […]
δ*_nsr      […]

For the complete markets problem, using the Neos / Knitro solver,

$$p = […] \;\wedge\; X = \begin{bmatrix} 304 & 304 \\ 101 & 101 \end{bmatrix} \implies \tilde X^* = […] \;\wedge\; q^* = […] \implies \|p - \tilde X^* q^*\| = […],$$

with $\delta^*_{cns} \approx $ […], and

$$p = […] \;\wedge\; X = \begin{bmatrix} 309 & 306 \\ 101 & 101 \end{bmatrix} \implies \tilde X^* = \begin{bmatrix} 309 & 305.[…] \\ […] & […] \end{bmatrix} \;\wedge\; q^* = […] \implies \|p - \tilde X^* q^*\| = […],$$

with $\delta^*_{nst} \approx $ […].

This example uses the Russell 2000 and S&P 500 indices to conduct pairs trading on an annual data set of month end closing prices from the Yahoo website, as shown in Tables 18 and 19. A plot of this market data is shown in Figure 22. To satisfy the (strong) arbitrage conditions, an initial asset price vector $S_0 = \{[…], […]\}$ is selected. The portfolio $w^* = \{-[…], […]\}$ satisfies the (strong) arbitrage condition, for time 1 asset price vector $S_1$ following a uniform discrete distribution with the annual data set as its support. Converting to the nearest NA problem setting of $p = Xq$, this support is used as the scenario matrix $X$ and the initial asset price vector $S_0$ is used as the price vector $p$.

Table 18: Russell 2k and S&P 500 Market Data 2019
Date         04/01   05/01   06/01   07/01   08/01   09/01
Russell 2k   1,591   1,466   1,567   1,577   1,495   1,523
S&P 500      2,946   2,752   2,942   2,980   2,926   2,977
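For the complete-markets binomial example with the nonsingular scenario matrix X = [309 306; 101 101], the relation p = Xq can be solved for q directly, and the market is arbitrage-free only if all components of q are strictly positive. The Python sketch below does this with Cramer's rule. The price vector p = (300, 100) is an assumption made for illustration, since the value used in the paper is not recoverable here, and the function name is hypothetical.

```python
def state_prices_2x2(X, p):
    """Solve p = X q for a 2x2 scenario matrix X using Cramer's rule.
    Rows of X are assets, columns are time 1 states.
    Returns None when X is singular."""
    (a, b), (c, d) = X
    det = a * d - b * c
    if det == 0:
        return None
    q1 = (p[0] * d - b * p[1]) / det
    q2 = (a * p[1] - p[0] * c) / det
    return (q1, q2)

X = ((309.0, 306.0), (101.0, 101.0))  # stock and bond payoffs in the two states
p = (300.0, 100.0)                    # assumed time 0 prices (illustrative)

q = state_prices_2x2(X, p)
arbitrage_free = q is not None and all(qi > 0 for qi in q)

# The first scenario matrix in the example has identical columns, so it is singular.
singular_case = state_prices_2x2(((304.0, 304.0), (101.0, 101.0)), p)
```

Under the assumed p, one component of q is negative, which is consistent with the stock outperforming the bond in both states; the nearest NA problem then asks how far X must be perturbed before a strictly positive q exists.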

Solving the penalty relaxation problem NSPR using the Neos / Knitro nonlinear programming (NLP) solver (Byrd et al., 2006) for a set of values for $\beta$ gives the results in Table 20. Using a subgradient method we find the solution to the tight relaxation problem NSPRT to be $\delta^*_{nst} \approx $ […].36. The corresponding values for $\tilde X^*$ and $q^*$ are shown as well.

[Figure 22: market data plot; x-axis: Month, y-axis: Portfolio Positions; series: SP500, Russell2k]

Table 20: Min Distance to Arbitrage-Free Measure
β           […]
δ*_nsr      […]

With $p = […]$, the scenario matrix $X$ (rows: Russell 2k, S&P 500; columns: the 12 monthly scenarios) is

Russell 2k   1,591   1,466   1,567   1,577   1,495   1,523   1,562   1,625   1,668   1,614   1,476   1,[…]
S&P 500      2,946   2,752   2,942   2,980   2,926   2,977   3,038   3,141   3,230   3,226   2,954   2,[…]

which implies $\tilde X^*$ given by

Russell 2k   […].29  1,466   1,567   1,577   1,495   1,523   1,562   1,625   1,668   1,614   1,476   1,[…]
S&P 500      […].13  2,752   2,942   2,980   2,926   2,977   3,038   3,141   3,230   3,226   2,954   2,[…]

and $q^* = […]$, so that $\|p - \tilde X^* q^*\| = […]$.

This example uses the index basket from Section 6.3 to conduct trading. The reference data set is the 2019 month end closing prices from the Yahoo website, as shown in Tables 21 and 22. A plot of this market data is shown in Figure 23. To satisfy the (strong) arbitrage conditions, an initial asset price vector $S_0 = \{[…], […], […], […], […]\}$ is selected. The portfolio $w^* = \{[…], […], -[…], -[…], -[…]\}$ satisfies the (strong) arbitrage condition, for time 1 asset price vector $S_1$ following a uniform discrete distribution with the annual data set as its support. Converting to the nearest NA problem setting of $p = Xq$, this support is used as the scenario matrix $X$ and the initial asset price vector $S_0$ is used as the price vector $p$.

[Figure 23: market data views; panels (a) Parallel Coords (b) Quantiles]

Solving the penalty relaxation problem NSPR using the Neos / Knitro nonlinear programming (NLP) solver (Byrd et al., 2006) for a set of values for $\beta$ gives the results in Table 23. Using a subgradient method we find the solution to the tight relaxation problem NSPRT to be $\delta^*_{nst} \approx $ […].07. The corresponding values for $\tilde X^*$ and $q^*$ are shown as well.

Table 23: Min Distance to Arbitrage-Free Measure
β           […]
δ*_nsr      […]

With $p = […]$, the scenario matrix $X$ (rows: DJI, GSPC, IXIC, USO, SGOL; columns: the 12 monthly scenarios) is

DJI    […],000  25,916  25,929  26,593  24,815  26,600  26,864  26,403  26,917  27,046  28,051  28,[…]
GSPC   […],704  2,785   2,834   2,946   2,752   2,942   2,980   2,926   2,977   3,038   3,141   3,[…]
IXIC   […],282  7,533   7,729   8,095   7,453   8,006   8,175   7,963   7,999   8,292   8,665   8,[…]
USO    […].35   11.95   12.50   13.29   11.10   12.04   12.04   11.46   11.34   11.30   11.62   12.[…]
SGOL   […].73   12.65   12.46   12.37   12.59   13.60   13.61   14.69   14.20   14.56   14.16   14.[…]

which implies $\tilde X^*$ given by

DJI    […],000  25,[…].88  25,929  26,593  24,815  26,600  26,864  26,403  26,917  27,046  28,051  28,[…]
GSPC   […],704  2,[…].19   2,834   2,946   2,752   2,942   2,980   2,926   2,977   3,038   3,141   3,[…]
IXIC   […],282  7,[…].30   7,729   8,095   7,453   8,006   8,175   7,963   7,999   8,292   8,665   8,[…]
USO    […].35   12.51      12.50   13.29   11.10   12.04   12.04   11.46   11.34   11.30   11.62   14.[…]
SGOL   […].73   12.56      12.46   12.37   12.59   13.60   13.61   14.69   14.20   14.56   14.16   14.[…]

and $q^* = […]$, so that $\|p - \tilde X^* q^*\| = […]$.

This work has developed theoretical results and investigated calculations of robust arbitrage-free markets under distributional uncertainty using Wasserstein distance as an ambiguity measure. The financial market overview and foundational notation and problem definitions were introduced in Section 1. Using recent duality results (Blanchet and Murthy, 2019), the simpler dual formulation and its mixture of analytic and computational solutions were derived in Section 2. In Section 3 the robust arbitrage methodology was extended to encompass statistical arbitrage. In Section 4, some applications to robust option pricing and portfolio selection were presented. Section 5 gave formal proofs for the NP-hardness of the NA problem. In Section 6, we performed a computational study to calculate the critical radii (for the arbitrage conditions), optimal portfolios, and best (worst) case distributions for some concrete examples. The examples included a simple binomial tree, a pairs trading data set, and two trading baskets. The nearest NA problem was also explored to complete the study. Finally, we conclude with some commentary on directions for further research.

One direction for future research, as has been previously discussed in Section 1.4.2, would be to investigate robust arbitrage properties in a multi period continuous time setting for a suitable class of admissible trading strategies. Recall that a more general version of the fundamental theorem of asset pricing applies there. Additional detail on this topic can be found in Delbaen and Schachermayer (2006). Another direction for future research, as mentioned in Section 2, would be to develop (and apply) a global solution strategy for the NLP problem formulations of Section 2.1.3.
One possibility (as mentioned) is to construct an MINLP problem formulation, in a modeling language such as GAMS, that is solvable to global optimality using the Baron solver, for example. Perhaps a third direction for future research would be to investigate notions of robust (modern) portfolio theory, applying and/or extending the framework developed thus far.

Data Availability Statement

The raw and/or processed data required to reproduce the findings from this research can be obtained from the corresponding author, [D.S.], upon reasonable request.

Conﬂict of Interest Statement

The authors declare they have no conﬂict of interest.

Funding Statement

The authors received no specific funding for this work.

References

Andrei, N. (2013). Nonlinear optimization applications using the GAMS technology. Springer.

Avis, D., Hertz, A., and Marcotte, O. (2005). Graph theory and combinatorial optimization, volume 8. Springer Science & Business Media.

Bartl, D., Drapeau, S., and Tangpi, L. (2017). Computational aspects of robust optimized certainty equivalents and option pricing.

Bartl, D. et al. (2019). Exponential utility maximization under model uncertainty for unbounded endowments. The Annals of Applied Probability, 29(1):577–612.

Blanchet, J., Chen, L., and Zhou, X. Y. (2018). Distributionally robust mean-variance portfolio selection with Wasserstein distances.

Blanchet, J., Kang, Y., and Murthy, K. (2016). Robust Wasserstein profile inference and applications to machine learning. arXiv preprint arXiv:1610.05627.

Blanchet, J. and Murthy, K. (2019). Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600.

Byrd, R., Nocedal, J., and Waltz, R. (2006). KNITRO: An integrated package for nonlinear optimization.

Calafiore, G. C. and El Ghaoui, L. (2014). Optimization models. Cambridge University Press.

Cornuejols, G. and Tütüncü, R. (2018). Optimization methods in finance. Cambridge University Press, 2 edition.

Delbaen, F. and Schachermayer, W. (2006). The mathematics of arbitrage. Springer Science & Business Media.

Dinh, N., Goberna, M. A., López, M., and Mo, T. H. (2017). Robust optimization revisited via robust vector Farkas lemmas. Optimization, 66(6):939–963.

Esfahani, P. M. and Kuhn, D. (2018). Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166.

Fletcher, R. (2010). The sequential quadratic programming method. In Nonlinear optimization, pages 165–214. Springer.

Focardi, S. M., Fabozzi, F. J., and Mitov, I. K. (2016). A new approach to statistical arbitrage: Strategies based on dynamic factor models of prices and their performance. Journal of Banking & Finance, 65:134–155.

Föllmer, H. and Schied, A. (2011). Stochastic finance: an introduction in discrete time. Walter de Gruyter.

Gao, R. and Kleywegt, A. J. (2016). Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.

Jeyakumar, V. and Li, G. (2011). Robust Farkas lemma for uncertain linear systems with applications. Positivity, 15(2):331–342.

Johnson, D. S. and Preparata, F. P. (1978). The densest hemisphere problem. Theoretical Computer Science, 6(1):93–107.

Krauss, C. (2017). Statistical arbitrage pairs trading strategies: Review and outlook. Journal of Economic Surveys, 31(2):513–545.

Lazzarino, M., Berrill, J., Šević, A., et al. (2018). What is statistical arbitrage? Theoretical Economics Letters, 8(05):888.

LeRoy, S. F. and Werner, J. (2014). Principles of financial economics. Cambridge University Press.

Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1):77–91.

Oleaga, G. (2012). Arbitrage conditions with no short selling. Boletín de Matemáticas, 19(1):37–54.

Ostrovski, V. (2013). Stability of no-arbitrage property under model uncertainty. Statistics & Probability Letters, 83(1):89–92.

Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–360.

Ross, S. A. et al. (1973). Return, risk and arbitrage. Rodney L. White Center for Financial Research, The Wharton School.

Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442.

Shreve, S. (2005). Stochastic calculus for finance I: the binomial asset pricing model. Springer Science & Business Media.

Shreve, S. E. (2004). Stochastic calculus for finance II: Continuous-time models, volume 11. Springer Science & Business Media.

Singh, D. and Zhang, S. (2019). Distributionally robust XVA via Wasserstein distance part 1. arXiv preprint arXiv:1910.01781.

Tawarmalani, M. and Sahinidis, N. V. (2005). A polyhedral branch-and-cut approach to global optimization. Mathematical Programming, 103:225–249.

Villani, C. (2008). Optimal transport: old and new, volume 338. Springer Science & Business Media.

Wasserman, L. (2017). Optimal transport and Wasserstein distance. Accessed: 2020-03-15.

Wojcik, R. (2005). Pairs trading: a professional approach. PDF dated, 19.

Xie, Y., Wang, X., Wang, R., and Zha, H. (2018). A fast proximal point method for computing Wasserstein distance. arXiv preprint arXiv:1802.04307.

Zhao, Z., Zhou, R., Wang, Z., and Palomar, D. P. (2018). Optimal portfolio design for statistical arbitrage in finance. Pages 801–805. IEEE.