Distributionally Robust Profit Opportunities
arXiv: [q-fin.PM] Jun
Derek Singh (a,∗), Shuzhong Zhang (a)
(a) Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455
ARTICLE INFO
Keywords: robust profit opportunities; Sharpe ratio; distributionally robust optimization; Wasserstein distance; Lagrangian duality
ABSTRACT
This paper expands the notion of robust profit opportunities in financial markets to incorporate distributional uncertainty using Wasserstein distance as the ambiguity measure. Financial markets with risky and risk-free assets are considered. The infinite dimensional primal problems are formulated, leading to their simpler finite dimensional dual problems. A principal motivating question is how distributional uncertainty helps or hurts the robustness of the profit opportunity. Towards answering this question, some theory is developed and computational experiments are conducted. Finally, some open questions and suggestions for future research are discussed.
1. Introduction and Overview
Modern financial markets cover a wide array of asset classes including (but not limited to) stocks, bonds, loans, money market instruments, currencies and commodities, real estate, derivatives, and so on. The concept of a profit opportunity (through favorable purchase and sale of securities) is as old as financial markets themselves. Various trading and investment strategies have been developed, using advances in technology and quantitative methodologies, to identify and monetize such profit opportunities in modern financial markets. Risk adjusted return is one class of performance metrics used to evaluate the attractiveness of such opportunities. One well known example of this is the Sharpe ratio, which looks at the ratio of expected excess return to risk as measured by the standard deviation of the return. A modern revision of this uses a benchmark index to measure excess return and its variability [10]. In [4] the authors show the utility of this metric and its linkage to other risk metrics such as the Sortino ratio, Omega ratio, CVaR ratio, and others under a Q-radial distributional assumption for returns.

The notion of a robust profit opportunity (RPO) for risky assets and its relation to the Sharpe ratio were first introduced and discussed in [9]. The RPO can be seen as a relaxation of the notion of an arbitrage opportunity towards one of statistical arbitrage, a term referring to arbitrage that is statistically likely but not certain to occur. The parameter $\theta$, which measures the robustness of the profit opportunity, quantifies the number of standard deviations the asset returns could drop while the investment would still break even or generate some profit.

The purpose of this work is to extend the notion of an RPO to a setting that incorporates ambiguity about the underlying distribution of risky asset returns. This is done via the framework of Wasserstein discrepancy between distributions and the corresponding infinite dimensional Lagrangian duality results.
The first steps are to define a notion of distributionally robust profit opportunities and formulate a primal problem that measures the effect of ambiguity in distribution, as measured by $\delta$, on the degree of robustness, as measured by $\theta$. With that in hand, the next steps are to formulate and solve the simpler finite dimensional dual problems to quantify the lower and upper bounds for robustness $\theta$ as a function of ambiguity $\delta$.

∗ Corresponding author
Email addresses: [email protected] (D. Singh); [email protected] (S. Zhang)
Singh et al.: Preprint submitted to Elsevier

An outline of this paper is as follows. Section 1 gives an overview of the financial concepts of profit opportunities and robustness as well as a literature review. Section 2 develops the main theoretical results to characterize robust profit opportunities for financial markets with risky and risk free assets. Section 3 conducts a case study of distributionally robust profit opportunities using a five year historical data set of month end closing prices for a basket of exchange traded funds (ETFs) spread across different sectors of the economy. Section 4 discusses conclusions and suggestions for further research. All detailed proofs are deferred to the Appendix.

In conducting the literature review for this research, not many references were found that have investigated the topic of statistical arbitrage under distributional uncertainty. From Section 1.1 above, one can see that considerable research has been done in academic (and industry) circles regarding the classical notions of statistical arbitrage in financial markets. Indeed, several academic papers and financial textbooks have been written that cover these topics from their origin in the 1980s until today. It was surprising to us to find only a few papers that address and/or extend the classical notions of statistical arbitrage in the presence of some form of distributional uncertainty. This subsection gives an overview of what we found in the academic literature.

One seminal paper of note by Ostrovski [8] introduced the notion of robust arbitrage under distributional uncertainty. Ostrovski used the total variation (TV) metric to characterize a radius $\delta_{TV}$ such that all probability measures $Q'$ within this distance from a weak arbitrage free reference measure $Q$ are also weak arbitrage free. The author remarks that $\delta_{TV}$ can be interpreted as the minimal probability of success that a zero cost initial portfolio $w \in \mathbb{R}^n$ achieves positive value $w \cdot S$ at time 1. The main result (and intermediate results) relating $\delta_{TV}$ to the minimal probability of success are established via proof by contradiction using tools from probability theory and real analysis. This work was extended in [11] to consider the Wasserstein metric and investigate a relaxed notion of classical arbitrage defined as statistical arbitrage.

A recent paper, [14], investigated the behavior of reward-risk ratios, in particular the Sortino-Satchell and Stable Tail Adjusted Return ratios (both modern variations of the Sharpe ratio), under distributional uncertainty in the Wasserstein framework. The authors provide tractable convex dual reformulations of these infinite dimensional primal problems using recent results from [6] and [7]. The authors present an algorithm in detail to show how these tractable formulations can be solved using the bisection method.

In an earlier paper, [9], the authors introduced the notion of robust profit opportunities of degree $\theta$, which represent investment strategies that still return a profit after $\theta = 2$ or $\theta = 3$ standard deviations of adverse price movement for the underlying risky securities. We have extended this notion to incorporate the concept of distributional ambiguity to conduct our investigation of distributional RPOs. In some sense our work is an integration and advancement of the concepts developed in the previous two works, namely those of [9] and [14]. This concludes our overview of the academic literature on notions of robust statistical arbitrage.

This section lays out the notation and definitions used to develop our framework to investigate distributionally robust profit opportunities (DRPOs). The approach taken here is to start with the definition of an RPO and introduce a notion of distributional uncertainty via the Wasserstein distance metric. As such, we include definitions for these terms as well as some commentary on the problem of moments duality result used to formulate the dual problem for DRPOs.
Remark 1. The units for the portfolio weight vector $w$ are the number of shares of each security. The units for the (random) security vector $S$ are the period 1 end values of the security prices.

The sets of admissible risky portfolio weights for the weak and strong RPO conditions are
$$\Gamma_{rw} := \{ w \in \mathbb{R}^n : w \cdot S_0 \le 0;\ w \ne 0 \}, \qquad \Gamma_{rs} := \{ w \in \mathbb{R}^n : w \cdot S_0 < 0 \},$$
where $w \cdot S_0$ denotes $w^\top S_0$. The RPO condition to be evaluated for covariance matrix $\Sigma$ of random vector $S$ under probability measure $Q$ for risky portfolios is
$$w \cdot \bar{S} - \theta \sqrt{w^\top \Sigma w} \ge 0,$$
where $\bar{S} = \mathbb{E}_Q[S]$ and $\theta$ denotes the degree of robustness (or level of risk-aversion). Note that portfolio weight vectors $w$ satisfy the homogeneity property (of degree zero) since
$$w \cdot \bar{S} - \theta \sqrt{w^\top \Sigma w} \ge 0 \implies w_c \cdot \bar{S} - \theta \sqrt{w_c^\top \Sigma w_c} \ge 0$$
for $w_c = cw$ and $c > 0$. It is the proportions of the holdings in the assets that distinguish $w$ vectors, not their absolute sizes.

For a given measure $Q$ and $\Gamma_s$, no strong RPO (of level $\theta$) means that $\sup_{w \in \Gamma_s} w \cdot \bar{S} - \theta \sqrt{w^\top \Sigma w} < 0$. The empirical measure, $Q_N$, is defined as $Q_N(dz) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{s_{(1,i)}}(dz)$. To simplify the notation, the leading subscript on $s_{(1,i)}$ is suppressed and going forward we refer to the realization of the time 1 asset value vector $s_{(1,i)}$ as just $s_i$. In the context of this work, the uncertainty set for probability measures is $U_\delta(Q_N) = \{ Q : D_c(Q, Q_N) \le \delta \}$ where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c(\cdot)$ [1]. The definition for $D_c$ is
$$D_c(Q, Q') = \inf \{ \mathbb{E}_\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n),\ \pi_A = Q,\ \pi_B = Q' \},$$
where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $S_A \in \mathbb{R}^n$ and $B$ denotes $S_B \in \mathbb{R}^n$ respectively. This work uses the cost function $c$ where $c(u, v) = \|u - v\|_2^2 = \langle u - v, u - v \rangle$.

The sets of admissible risky portfolio weights for the DRPO conditions (given a minimum target portfolio return $\alpha_0$) are
$$\Gamma_{dw} := \{ w \in \mathbb{R}^n : w \cdot S_0 \le 0;\ w \ne 0;\ \min_{Q \in U_\delta(Q_N)} \mathbb{E}_Q[w \cdot S] \ge \alpha_0 \},$$
$$\Gamma_{ds} := \{ w \in \mathbb{R}^n : w \cdot S_0 < 0;\ \min_{Q \in U_\delta(Q_N)} \mathbb{E}_Q[w \cdot S] \ge \alpha_0 \}.$$
Using Proposition 1 in [1], these are equivalent to
$$\Gamma_{dw} := \{ w \in \mathbb{R}^n : w \cdot S_0 \le 0;\ w \ne 0;\ \mathbb{E}_{Q_N}[w \cdot S] \ge \tilde{\alpha}_0 \},$$
$$\Gamma_{ds} := \{ w \in \mathbb{R}^n : w \cdot S_0 < 0;\ \mathbb{E}_{Q_N}[w \cdot S] \ge \tilde{\alpha}_0 \},$$
where $\tilde{\alpha}_0 := \alpha_0 + \sqrt{\delta}\, \|w\|_2$. In our version of the problem we use the relaxation $\tilde{\alpha}_0 := \alpha_0$, which amounts to only requiring that the risky portfolio weights $w$ achieve the minimum target portfolio return $\alpha_0$ for the empirical distribution $Q_N$.

In Section 2 we formulate the primal problems for DRPOs for financial markets with risky securities. A key step in our approach is to use duality results to formulate the simpler yet equivalent dual problems. In this context, to enforce the moment constraint $\mathbb{E}_Q[w \cdot S] = \alpha$ for $Q \in U_\delta(Q_N)$, we appeal to the strong duality of linear semi-infinite programs. The dual problem appears to be more tractable than the primal problem since it only involves the (finite dimensional) reference probability measure $Q_N$ as opposed to a continuum of probability measures. This allows us to solve a nested optimization problem under an empirical measure defined by the chosen data set. A brief restatement of this duality result follows next. See Appendix B of [2] and Proposition 2 of [1] for further details.

The problem of moments. Let $X$ be a random vector in the probability space $(\Omega, \mathcal{B}, \mathcal{P})$ and $(\Omega, \mathcal{B}, \mathcal{M}^+)$, where $\mathcal{P}$ and $\mathcal{M}^+$ denote the sets of probability measures and non-negative measures respectively, such that the Borel measurable functionals $\phi, f_1, \ldots, f_k$ are integrable. Let $f = (f_1, \ldots, f_k) : \Omega \to \mathbb{R}^k$ be a vector of moment functionals. For a real valued vector $q \in \mathbb{R}^k$, we are interested in the worst case bound
$$v(q) := \sup \left( \mathbb{E}_\mu[\phi(X)] : \mathbb{E}_\mu[f(X)] = q;\ \mu \in \mathcal{P} \right).$$
Adding a constant term by setting $f_0 = \mathbf{1}_\Omega$, the constraint $\mathbb{E}_\mu[f_0(X)] = 1$, and defining $\tilde{f} = (f_0, f_1, \ldots, f_k)$ and $\tilde{q} = (1, q_1, \ldots, q_k)$ gives the following reformulation:
$$v(q) := \sup \left( \int \phi(x)\, d\mu(x) : \int \tilde{f}(x)\, d\mu(x) = \tilde{q};\ \mu \in \mathcal{M}^+ \right).$$
If a certain Slater condition is satisfied, one has the equivalent dual representation for the above:

Proposition. Let $\mathcal{Q}_{\tilde{f}} = \{ \int \tilde{f}(x)\, d\mu(x) : \mu \in \mathcal{M}^+ \}$. If $\tilde{q}$ is an interior point of $\mathcal{Q}_{\tilde{f}}$ then
$$v(q) = \inf \left( \sum_{i=0}^k a_i q_i : a_i \in \mathbb{R};\ \sum_{i=0}^k a_i \tilde{f}_i(x) \ge \phi(x)\ \forall x \in \Omega \right),$$
with $q_0 = 1$.

The primal problem is concerned with the worst case expected loss for some objective function $\phi$, under moment constraints. Note that the primal problem is an infinite dimensional stochastic optimization problem and thus difficult to solve directly. The simplicity and tractability of the dual problem make it quite attractive as an analytical and/or computational tool in our toolkit.

The above duality result has been applied by Blanchet et al. and many other authors on topics in data driven distributionally robust stochastic optimization such as robust machine learning, portfolio selection, and risk management. For these types of robust optimization problems, the incorporation of distributional uncertainty can be viewed as adding a penalty term (similar to penalized regression) to the optimal solution [1]. This gives us a nice intuitive way to think about the cost of robustness.
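As an illustration of the RPO condition defined earlier in this section, the following minimal sketch computes the largest degree $\theta$ for which a given portfolio satisfies $w \cdot \bar{S} - \theta \sqrt{w^\top \Sigma w} \ge 0$, after checking the strong condition $w \cdot S_0 < 0$. The two-asset prices, means, and covariance are hypothetical values chosen for the example, and the helper names (`dot`, `quad_form`, `rpo_degree`) are assumptions of this sketch, not part of the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def quad_form(w, sigma):
    """w^T Sigma w for a covariance matrix given as a list of rows."""
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))

def rpo_degree(w, s0, s_bar, sigma):
    """Largest theta with w.s_bar - theta*sqrt(w^T Sigma w) >= 0.

    Returns None when the strong condition w.s0 < 0 fails.
    Assumes Sigma is positive definite and w is non-zero.
    """
    if dot(w, s0) >= 0:
        return None
    return dot(w, s_bar) / math.sqrt(quad_form(w, sigma))

# Hypothetical data: time 0 prices, expected time 1 prices, covariance.
s0 = [100.0, 50.0]
s_bar = [106.0, 50.0]
sigma = [[9.0, 0.0], [0.0, 4.0]]

w = [1.0, -2.02]                      # long asset 1, short asset 2
theta = rpo_degree(w, s0, s_bar, sigma)

# Homogeneity of degree zero: scaling w by c > 0 leaves theta unchanged.
theta_scaled = rpo_degree([3.0 * x for x in w], s0, s_bar, sigma)

# A portfolio with positive initial cost is not admissible.
theta_bad = rpo_degree([1.0, -1.0], s0, s_bar, sigma)
```

The value returned is the ratio $w \cdot \bar{S} / \sqrt{w^\top \Sigma w}$, which is the Sharpe-type quantity that the RPO condition compares against $\theta$.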
2. Theory: DRPOs
This section develops the theory for DRPOs in financial markets with (only) risky assets. Extending the framework to handle markets with risk free assets is quite tractable; however, it has been omitted due to space constraints. Let us focus on the strong conditions (the weak conditions are similar). Both worst case and best case DRPO conditions are developed. Section 2.1 deals with the worst case conditions, meaning that DRPOs of at least level $\theta_{wc}$ exist. The primal problem is formulated using the notions discussed in Section 1.3.1. The dual problem is formulated using the problem of moments duality result from Section 1.3.2. Note that the dual problem is a nested stochastic optimization problem. The inner problem (evaluating $\Psi_{\lambda,w}$) and middle problem (evaluating the dual objective function over $\inf_{\lambda_1 \ge 0, \lambda_2}$) can be solved jointly using the techniques from Proposition 3 in [1]. Finally, the outer optimization problem (evaluating over $\inf_{w \in \Gamma_{ds}}$), for the strong case, can be formulated as a finite dimensional convex optimization problem. A similar approach is taken in Section 2.2 for the best case conditions, meaning that DRPOs of at most level $\theta_{bc}$ exist. This machinery gives us a practical approach to explore applications of our DRPO framework. Section 2.3 shows how to incorporate portfolio restrictions (such as short sales) in a straightforward manner.

Remark 2.
For our problem setting, the covariance matrix $\Sigma$ is assumed to be positive definite under the reference probability measure $Q_N$. Furthermore, the portfolio is assumed to consist of 𝑛 ≥ risky securities (excluding the risk-free security), with short sales allowed.

We extend the approach in [9] to arrive at these DRPO conditions. The authors define an RPO of degree $\theta$ as a portfolio $w \in \mathbb{R}^n$ that satisfies $w \cdot \bar{S} - \theta \sqrt{w^\top \Sigma w} \ge 0$ and $w \cdot S_0 < 0$. The authors comment that RPO is related to the notions of risk-adjusted return and Sharpe ratio. The first condition is equivalent to $\frac{w \cdot \bar{S}}{\sqrt{w^\top \Sigma w}} \ge \theta$. Adding the normalization constraint $w \cdot \bar{S} = \alpha$ for $\alpha > 0$ and simplifying gives $w^\top \Sigma w \le g_\alpha(\theta)$ where $g_\alpha(\theta) := \alpha^2 / \theta^2$. Furthermore, the normalization $w \cdot \bar{S} = \alpha \implies \mathbb{E}_Q[w \cdot S] = \alpha$. For minimum target portfolio return $\alpha_0$, the strong worst case DRPO condition can be expressed as
$$\inf_{w \in \Gamma_{ds}}\ \max_{\alpha \ge \alpha_0}\ \sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w) \le g_\alpha(\theta_{wc}). \tag{P$_{wc}$}$$
Using Proposition 2 in [1], which invokes problem of moments duality (see Section 1.3.2), the dual formulation for the inner optimization problem
$$\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \mathbb{E}_Q[S S^\top] w), \tag{I$_{wc}$}$$
where $w^\top \Sigma w = (w^\top \mathbb{E}_Q[S S^\top] w) - \alpha^2$, is
$$\inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Psi^{wc}_{\lambda,w}(s_i) \right] \tag{D$_{wc}$}$$
where $\Psi^{wc}_{\lambda,w}$ is defined, in terms of cost function $c(\cdot)$, as
$$\Psi^{wc}_{\lambda,w}(s_i) = \sup_{\tilde{s} \in \mathbb{R}^n} \left[ (w \cdot \tilde{s})^2 - \lambda_1 c(\tilde{s}, s_i) - \lambda_2 (w \cdot \tilde{s}) \right].$$
The goal here is to evaluate
$$\inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Psi^{wc}_{\lambda,w}(s_i) \right] \tag{1}$$
in closed form. Using Proposition 3 and Theorem 1 in [1] it follows that when $\delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0$, which implies that the problem $\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \mathbb{E}_Q[S S^\top] w)$ is feasible, then for $w^\top \Sigma w = (w^\top \mathbb{E}_Q[S S^\top] w) - \alpha^2$,
$$\max_{\alpha \ge \alpha_0;\ \delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0}\ \sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w)$$
evaluates to
$$\left( \sqrt{w^\top \Sigma w} + \sqrt{\delta}\, \|w\|_2 \right)^2 \tag{2}$$
where $\Sigma$ is evaluated under the reference measure $Q_N$ and the optimal $\alpha^* := \mathbb{E}_{Q_N}[w \cdot S] \ge \alpha_0$.
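To build intuition for the closed form (2), the sketch below constructs one explicit perturbation of the empirical sample that attains it: each sample is moved along the direction $w$ so that the deviations of $w \cdot s_i$ from their mean are stretched by a common factor. This construction is an assumption of the sketch (it is not taken from the paper or from [1]); it only illustrates that the value in (2) is reachable with transport cost $\delta$ and an unchanged mean. The data and helper names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def portfolio_stats(w, samples):
    """Mean and (population) variance of w.s over the samples."""
    vals = [dot(w, s) for s in samples]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def stretch_samples(w, samples, delta):
    """Shift each sample along w so portfolio deviations scale by (1 + gamma).

    gamma = sqrt(delta) * ||w|| / sigma_w makes the pairwise transport cost
    equal to delta. Requires a strictly positive portfolio variance.
    """
    mean, var = portfolio_stats(w, samples)
    norm2 = dot(w, w)
    gamma = math.sqrt(delta * norm2 / var)
    moved = []
    for s in samples:
        shift = gamma * (dot(w, s) - mean) / norm2
        moved.append([si + shift * wi for si, wi in zip(s, w)])
    return moved

def transport_cost(a, b):
    """Mean squared Euclidean displacement when a_i is paired with b_i.

    This pairing is one feasible coupling, so it upper-bounds D_c.
    """
    return sum(sum((x - y) ** 2 for x, y in zip(p, q))
               for p, q in zip(a, b)) / len(a)

# Hypothetical sample of time 1 prices for two assets.
samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
w = [1.0, -1.0]
delta = 0.25

mean0, var0 = portfolio_stats(w, samples)
moved = stretch_samples(w, samples, delta)
mean1, var1 = portfolio_stats(w, moved)
cost = transport_cost(samples, moved)

# Closed form (2): (sqrt(w^T Sigma w) + sqrt(delta) * ||w||_2)^2
closed_form = (math.sqrt(var0) + math.sqrt(delta) * math.sqrt(dot(w, w))) ** 2
```

Because the pairwise cost is only an upper bound on the Wasserstein discrepancy, the perturbed sample lies inside $U_\delta(Q_N)$, and its portfolio variance matches the closed form.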
The strong worst case DRPO condition (P$_{wc}$) is now
$$v^{wc}_{\alpha^*}(\delta) := \inf_{w \in \Gamma_{ds}} \left( \sqrt{w^\top \Sigma w} + \sqrt{\delta}\, \|w\|_2 \right)^2 \le g_{\alpha^*}(\theta_{wc}). \tag{D2$_{wc}$}$$

Theorem 2.1. $v^{wc}_{\alpha^*}(\delta)$ can be computed by solving the convex nonlinear program (NLP) N_SRPO$_{wc}$ (listed below). Note this is essentially a second order conic program (SOCP).
$$\begin{aligned} \underset{w \in \mathbb{R}^n}{\text{minimize}} \quad & \sqrt{w^\top \Sigma w} + \sqrt{\delta}\, \|w\|_2 \\ \text{subject to} \quad & w \cdot S_0 \le -\epsilon, \\ & \frac{1}{N} \sum_{i=1}^N w \cdot s_i \ge \tilde{\alpha}_0. \end{aligned} \tag{3}$$

Proof. The formulation is straightforward. The constraint set $w \in \Gamma_{ds}$ is readily obtained via the constraint $w \cdot S_0 \le -\epsilon$ for a suitably small choice of $\epsilon > 0$. The first moment constraint $\mathbb{E}_{Q_N}[w \cdot S] \ge \tilde{\alpha}_0$ is described as above. The squaring in the original objective function does not change the optimal solution, since the unsquared objective is non-negative. It follows that N_SRPO$_{wc}$ is a convex SOCP, solvable via standard solvers.

Theorem 2.2. For a given $\theta_{wc}$, the critical radius $\delta^{wc}_{\alpha^*}$ can be expressed as $\inf \{ \delta \ge 0 : v^{wc}_{\alpha^*}(\delta) \ge g_{\alpha^*}(\theta_{wc}) \}$. Furthermore, $\delta^{wc}_{\alpha^*}$ can be explicitly computed via binary search. Let $\delta_{\alpha^*} < \delta^{wc}_{\alpha^*}$. For $Q \in U_{\delta_{\alpha^*}}(Q_N)$, it follows that $Q$ admits strong RPOs of at least level $\theta_{wc}$. For $Q \notin U_{\delta^{wc}_{\alpha^*}}(Q_N)$, it follows that $Q$ may admit strong RPOs of levels less than $\theta_{wc}$.

Proof. This characterization of the critical radius $\delta^{wc}_{\alpha^*}$ follows from the condition (D2$_{wc}$) as well as the definition of DRPOs (see Section 1.3.1). The asymptotic properties of $v^{wc}_{\alpha^*}$ are such that $v^{wc}_{\alpha^*}(0) \ge 0$ and $\lim_{\delta \to \infty} v^{wc}_{\alpha^*}(\delta) \ge g_{\alpha^*}(\theta_{wc})$. Furthermore, since $v^{wc}_{\alpha^*}(\delta)$ is a non-decreasing function of $\delta$, it follows that $\delta^{wc}_{\alpha^*}$ can be computed via binary search.

Remark 3.
One can view the critical radius $\delta^{wc}_{\alpha^*}$ as a relative measure of the degree of strong RPO in the reference measure $Q_N$. Those $Q_N$ which are "close" to admitting RPOs of level less than $\theta_{wc}$ will have a relatively smaller value of $\delta^{wc}_{\alpha^*}$.

We follow the approach from the previous subsection. To reflect the best case outcome (inside the Wasserstein ball of probability measures of radius $\delta$), replace the $\sup$ with $\inf$ and $\max$ with $\min$. The strong best case DRPO condition is
$$\inf_{w \in \Gamma_{ds}}\ \min_{\alpha \ge \alpha_0}\ \inf_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w) \ge g_\alpha(\theta_{bc}). \tag{P$_{bc}$}$$
Using Proposition 2 in [1], which invokes problem of moments duality (see Section 1.3.2), the dual formulation for the inner optimization problem
$$\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} -(w^\top \mathbb{E}_Q[S S^\top] w), \tag{I$_{bc}$}$$
where $w^\top \Sigma w = (w^\top \mathbb{E}_Q[S S^\top] w) - \alpha^2$, is
$$\inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Psi^{bc}_{\lambda,w}(s_i) \right] \tag{D$_{bc}$}$$
where $\Psi^{bc}_{\lambda,w}$ is defined, in terms of cost function $c(\cdot)$, as
$$\Psi^{bc}_{\lambda,w}(s_i) = \sup_{\tilde{s} \in \mathbb{R}^n} \left[ -(w \cdot \tilde{s})^2 - \lambda_1 c(\tilde{s}, s_i) - \lambda_2 (w \cdot \tilde{s}) \right].$$
The goal here is to evaluate
$$- \left\{ \inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Psi^{bc}_{\lambda,w}(s_i) \right] \right\}$$
in closed form.

Proposition 2.1. Using techniques from Proposition 3 and Theorem 1 in [1] it follows that when $\delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0$, which implies that the problem $\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} -(w^\top \mathbb{E}_Q[S S^\top] w)$ is feasible, then for $w^\top \Sigma w = (w^\top \mathbb{E}_Q[S S^\top] w) - \alpha^2$,
$$\min_{\alpha \ge \alpha_0;\ \delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0}\ \inf_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w) \tag{4}$$
evaluates to
$$\max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2 \tag{5}$$
where $\Sigma$ is evaluated under the reference measure $Q_N$ and the optimal $\alpha^* := \mathbb{E}_{Q_N}[w \cdot S] \ge \alpha_0$.

Proof. The proof consists of a series of steps. First one determines that $\Psi^{bc}_{\lambda,w}$ is well defined due to its (leading) negative quadratic term. Next one evaluates first order optimality conditions for the dual formulation with respect to $\lambda_1 \ge 0$ and $\lambda_2$. The feasibility condition $\delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0$ arises when evaluating optimality with respect to $\lambda_1$. Then, using back-substitution and simplifying, one arrives at the functional form in (5). Note that portfolio variance is always non-negative, hence the zero floor induced by the $\max$ operator is sensible. See the Appendix for the detailed proof.

Remark 4. It is interesting to note that the worst case and best case portfolio variances are symmetric, with penalty and benefit terms $\sqrt{\delta}\, \|w\|_2$ respectively. However, since variance is inherently a non-negative quantity, the best case portfolio variance is floored at zero. Furthermore, zero variance may lead to a classical arbitrage situation. Indeed, this is the case if $\exists\, w \in \Gamma_{rs}$ such that $w^\top \Sigma w = 0 \wedge w \cdot \tilde{S} \ge 0$ where $\Sigma$ is evaluated under some probability measure $Q \in U_\delta(Q_N)$ [9].
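The symmetry between the worst case value (2) and the best case value (5) can be made concrete with a short sketch. For a fixed portfolio it evaluates both variances and converts them to degrees of robustness through $g_\alpha(\theta) = \alpha^2 / \theta^2$, so that $\theta = \alpha / \sqrt{\text{variance}}$. The function names and the numerical inputs are hypothetical, and treating a zero best case variance as $\theta = \infty$ follows the convention stated in the case study.

```python
import math

def variance_bounds(sigma_w, norm_w, delta):
    """Worst and best case portfolio variance for a fixed portfolio.

    sigma_w : sqrt(w^T Sigma w) under the reference measure
    norm_w  : Euclidean norm of w
    delta   : Wasserstein radius
    """
    penalty = math.sqrt(delta) * norm_w
    worst = (sigma_w + penalty) ** 2             # closed form (2)
    best = max(sigma_w - penalty, 0.0) ** 2      # closed form (5)
    return worst, best

def theta_range(alpha, sigma_w, norm_w, delta):
    """Degrees of robustness implied by g_alpha(theta) = alpha^2 / theta^2."""
    worst, best = variance_bounds(sigma_w, norm_w, delta)
    theta_wc = alpha / math.sqrt(worst)
    # Zero best case variance corresponds to an unbounded degree.
    theta_bc = math.inf if best == 0.0 else alpha / math.sqrt(best)
    return theta_wc, theta_bc

# Hypothetical inputs: expected payoff 1, standard deviation 2, unit norm.
small = theta_range(alpha=1.0, sigma_w=2.0, norm_w=1.0, delta=1.0)
large = theta_range(alpha=1.0, sigma_w=2.0, norm_w=1.0, delta=4.0)
```

With the smaller radius the degree lies between 1/3 and 1; with the larger radius the benefit term cancels the standard deviation entirely and the best case degree becomes infinite.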
The strong best case DRPO condition (P$_{bc}$) is now
$$v^{bc}_{\alpha^*}(\delta) := \inf_{w \in \Gamma_{ds}} \max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2 \ge g_{\alpha^*}(\theta_{bc}). \tag{D2$_{bc}$}$$

Theorem 2.3. $v^{bc}_{\alpha^*}(\delta)$ can be computed by solving the non-convex nonlinear program (NLP) N_SRPO$_{bc}$ (listed below).
$$\begin{aligned} \underset{w \in \mathbb{R}^n}{\text{minimize}} \quad & \max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2 \\ \text{subject to} \quad & w \cdot S_0 \le -\epsilon, \\ & \frac{1}{N} \sum_{i=1}^N w \cdot s_i \ge \tilde{\alpha}_0. \end{aligned} \tag{6}$$

Proof. Again, the formulation is straightforward. The constraint set $w \in \Gamma_{ds}$ is readily obtained via the constraint $w \cdot S_0 \le -\epsilon$ for a suitably small choice of $\epsilon > 0$. The first moment constraint $\mathbb{E}_{Q_N}[w \cdot S] \ge \tilde{\alpha}_0$ is described as above. Note that the mapping $w \to w^\top \Sigma w$ is convex but the objective function is non-convex. It follows that N_SRPO$_{bc}$ is a non-convex nonlinear program solvable via standard solvers.

Corollary 2.3.1.
$$w^* \in \arg\min_{w \in \mathbb{R}^n} \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2 \implies w^* \in \arg\min_{w \in \mathbb{R}^n} \max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2,$$
therefore $v^{bc}_{\alpha^*}(\delta)$ can be computed by solving the non-convex nonlinear program (NLP) N_SRPO2$_{bc}$ (listed below).
$$\begin{aligned} \underset{w \in \mathbb{R}^n}{\text{minimize}} \quad & \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2 \\ \text{subject to} \quad & w \cdot S_0 \le -\epsilon, \\ & \frac{1}{N} \sum_{i=1}^N w \cdot s_i \ge \tilde{\alpha}_0. \end{aligned} \tag{7}$$

Proof. This follows by observing that $\max(g(w), 0)^2$ is a monotonic (non-decreasing) transformation of $g(w)$.

Proposition 2.2. Solving N_SRPO2$_{bc}$ is equivalent to solving up to three one-dimensional search problems $\min_{t > 0} \sqrt{f(t)} - \sqrt{\delta t}$ where $f(t)$ is the optimal value of a parameterized SDP problem.

Proof. The proof uses results about semidefinite programming (SDP) relaxations of quadratic minimization problems. See the Appendix for details.
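The non-convexity of the best case objective $\sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2$ noted above can be checked with a single midpoint test. The sketch below uses a hypothetical diagonal covariance and two hypothetical portfolios, ignores the linear constraints of the program, and simply shows that the objective at the midpoint exceeds the average of the endpoint values, which a convex function cannot do.

```python
import math

def best_case_objective(w, sigma_diag, delta):
    """sqrt(w^T Sigma w) - sqrt(delta) * ||w||_2 for a diagonal Sigma.

    sigma_diag holds the diagonal entries of Sigma (an assumption made
    here to keep the sketch dependency-free).
    """
    quad = sum(s * x * x for s, x in zip(sigma_diag, w))
    norm = math.sqrt(sum(x * x for x in w))
    return math.sqrt(quad) - math.sqrt(delta) * norm

sigma_diag = [1.0, 4.0]   # hypothetical variances of two assets
delta = 1.0

w_a = [1.0, 1.0]
w_b = [-1.0, 1.0]
w_mid = [0.5 * (a + b) for a, b in zip(w_a, w_b)]   # equals [0.0, 1.0]

g_a = best_case_objective(w_a, sigma_diag, delta)
g_b = best_case_objective(w_b, sigma_diag, delta)
g_mid = best_case_objective(w_mid, sigma_diag, delta)

# Convexity would require g_mid <= (g_a + g_b) / 2; here it fails.
violates_convexity = g_mid > 0.5 * (g_a + g_b)
```

This is why a local solver started from several points, or the one-dimensional reduction of Proposition 2.2, is a natural choice for this program.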
Theorem 2.4. For a given $\theta_{bc}$, the critical radius $\delta^{bc}_{\alpha^*}$ can be expressed as $\inf \{ \delta \ge 0 : v^{bc}_{\alpha^*}(\delta) \le g_{\alpha^*}(\theta_{bc}) \}$. Furthermore, $\delta^{bc}_{\alpha^*}$ can be explicitly computed via binary search. Let $\delta_{\alpha^*} < \delta^{bc}_{\alpha^*}$. For $Q \in U_{\delta_{\alpha^*}}(Q_N)$, it follows that $Q$ allows strong RPOs of at most degree $\theta_{bc}$. For $Q \notin U_{\delta^{bc}_{\alpha^*}}(Q_N)$, it follows that $Q$ may allow strong RPOs of more than degree $\theta_{bc}$.

Proof. This characterization of the critical radius $\delta^{bc}_{\alpha^*}$ follows from the condition (D2$_{bc}$) as well as the definition of DRPOs (see Section 1.3.1). The asymptotic properties of $v^{bc}_{\alpha^*}$ are such that $v^{bc}_{\alpha^*}(0) \ge 0$ and $\lim_{\delta \to \infty} v^{bc}_{\alpha^*}(\delta) \le g_{\alpha^*}(\theta_{bc})$. Furthermore, since $v^{bc}_{\alpha^*}(\delta)$ is a non-increasing function of $\delta$, it follows that $\delta^{bc}_{\alpha^*}$ can be computed via binary search.

Remark 5. One can view the critical radius $\delta^{bc}_{\alpha^*}$ as a relative measure of the degree of strong RPO in the reference measure $Q_N$. Those $Q_N$ which are "close" to admitting RPOs of levels more than $\theta_{bc}$ will have a relatively smaller value of $\delta^{bc}_{\alpha^*}$.

Table 1
Portfolio Restrictions
Restriction      MINLP Constraint                                              No Restriction
Short Sales      $w_j \ge ss_j$                                                $ss_j = -M$
Min Positions    $|w_j| \ge \underline{w}$                                     $\underline{w} = 0$
Max Positions    $|w_j| \le \overline{w}$                                      $\overline{w} = M$
Cardinality      $\sum_{j=1}^n \mathbf{1}_{\{|w_j| \ge \epsilon\}} \le m$      $m = n$
Allocations      $|\sum_{j \in A_k} w_j| \le \overline{A}_k$                   $\overline{A}_k = Mn$

This subsection discusses refinements to the DRPO conditions (see Sections 2.1 and 2.2) to characterize portfolio restrictions such as short sales restrictions, min and max position constraints, and cardinality constraints [5]. For efficiency of presentation, we refer the reader to the N_SRPO NLP problems discussed in Sections 2.1 and 2.2 and do not restate those formulations here. An advantage of the computational machinery developed in this paper is that such portfolio restrictions can be readily incorporated into the existing framework. Table 1 (above) describes the various portfolio restrictions discussed here and the associated constraints. Others are possible as well. Note that the index set is $j \in \{1, \ldots, n\}$, which is suppressed for brevity.
3. Case Study
This case study investigates the DRPOs for a five year historical data set (of month end closing prices) from July 2015 to June 2020 for a basket of exchange traded funds (ETFs) spread across different sectors of the economy. The 60 month end closing prices define the empirical distribution for random vector $S$ and the most recent closing values define $S_0$. The best and worst case critical values of $\theta$ are computed for a trajectory of Wasserstein radii $\delta$. The MATLAB fmincon solver is used, along with multiple search paths, to arrive at a more robust solution. The critical values are shown in the tables and graphs. Note that $\theta^* = \infty$ denotes the presence of classical arbitrage. For the worst case trajectory, shown in Figure 1, we see that it takes a relatively large value of $\delta$ to bring 𝜃 ∗ < . On the other hand, for the best case trajectory, shown in Figure 2, we see that it takes a relatively small value of $\delta$ to bring $\theta^* \to \infty$. Intuitively this means that the empirical distribution $Q_N$ is close (in terms of Wasserstein distance) to admitting classical arbitrage.

Table 2
Basket Constituents
Ticker   Name                   Industry     Net Assets (bn)
FENY     Fidelity MSCI Energy   Energy       0.46
JETS     U.S. Global JETS       Travel       0.93
VGT      Vanguard Tech          Technology   33.65
VHT      Vanguard Health Care   Health       12.64
XLF      Financial SPDR Fund    Finance      17.84

Table 3
$v^{wc}_{\alpha^*}(\delta)$: Worst Case degree $\theta^*$ (columns: $\delta$, $\theta^*$)

Table 4
$v^{bc}_{\alpha^*}(\delta)$: Best Case degree $\theta^*$ (columns: $\delta$, $\theta^*$)

Figure 1: Worst Case degree $\theta^*$ (axes: $\delta$, $\theta^*$)
4. Conclusions and Further Work
This work has developed theoretical results and investigated calculations of distributionally robust profit opportunities using Wasserstein distance as an ambiguity measure. The financial market overview and foundational notation and problem definitions were introduced in Section 1. Using recent duality results [3], the simpler dual formulation and its mixture of analytic and computational solutions were derived in Section 2. A case study was investigated in Section 3. Finally, we conclude with some commentary on directions for further research. One direction (as previously mentioned) is to extend the framework to incorporate risk free securities. Another direction is to consider reward-risk ratios other than the Sharpe ratio; a couple of such examples would be the Sortino ratio and the CVaR ratio.

Figure 2: Best Case degree $\theta^*$ (axes: $\delta$, $\theta^*$)
Data Availability Statement
The raw and/or processed data required to reproduce thefindings from this research can be obtained from the corre-sponding author, [D.S.], upon reasonable request.
Conflict of Interest Statement
The authors declare they have no conflict of interest.
Funding Statement
The authors received no specific funding for this work.
References
[1] Blanchet, J., Chen, L., Zhou, X.Y., 2018. Distributionally robust mean-variance portfolio selection with Wasserstein distances.
[2] Blanchet, J., Kang, Y., Murthy, K., 2019. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability 56, 830–857.
[3] Blanchet, J., Murthy, K., 2019. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research 44, 565–600.
[4] Chen, L., He, S., Zhang, S., 2011. When all risk-adjusted performance measures are the same: In praise of the Sharpe ratio. Quantitative Finance 11, 1439–1447.
[5] Cornuejols, G., Tütüncü, R., 2018. Optimization methods in finance. 2 ed., Cambridge University Press.
[6] Esfahani, P.M., Kuhn, D., 2018. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming 171, 115–166.
[7] Gao, R., Kleywegt, A., 2016. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.
[8] Ostrovski, V., 2013. Stability of no-arbitrage property under model uncertainty. Statistics & Probability Letters 83, 89–92.
[9] Pınar, M.Ç., Tütüncü, R.H., 2005. Robust profit opportunities in risky financial portfolios. Operations Research Letters 33, 331–340.
[10] Sharpe, W.F., 1994. The Sharpe ratio. Journal of Portfolio Management 21, 49–58.
[11] Singh, D., Zhang, S., 2020. Robust arbitrage conditions for financial markets. arXiv preprint arXiv:2004.09432.
[12] Sturm, J.F., Zhang, S., 2003. On cones of nonnegative quadratic functions. Mathematics of Operations Research 28, 246–267.
[13] Ye, Y., Zhang, S., 2003. New results on quadratic minimization. SIAM Journal on Optimization 14, 245–267.
[14] Zhao, Y., Liu, Y., Zhang, J., Yang, X., 2017. Distributionally robust reward–risk ratio programming with Wasserstein metric.
A. Proof of Proposition 2.1
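Proposition. Using techniques from Proposition 3 and Theorem 1 in [1] it follows that when $\delta \|w\|_2^2 - (\alpha - \mathbb{E}_{Q_N}[w \cdot S])^2 \ge 0$, which implies that the problem $\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} -(w^\top \mathbb{E}_Q[S S^\top] w)$ is feasible, then for $w^\top \Sigma w = (w^\top \mathbb{E}_Q[S S^\top] w) - \alpha^2$,
$$\inf_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w) \tag{8}$$
evaluates to
$$\max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2 \tag{9}$$
where $\Sigma$ is evaluated under the reference measure $Q_N$ and the optimal $\alpha^* := \mathbb{E}_{Q_N}[w \cdot S] \ge \alpha_0$.

Proof. We apply techniques similar to Proposition 3 from [1] and map our notation to align with that paper for convenience of comparison. Towards that end we make the following substitutions: $\{Q, Q_N, \Psi, w, s_i, \tilde{s}\} \to \{P, P_N, \Phi, \phi, R_i, u\}$ respectively, and translate notation back for the final result. Using the new notation, the dual for problem (I$_{bc}$) now becomes
$$- \left\{ \inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Phi(R_i) \right] \right\} \tag{10}$$
where
$$\Phi(R_i) = \sup_{u \in \mathbb{R}^n} \left[ -(\phi \cdot u)^2 - \lambda_1 c(u, R_i) - \lambda_2 (\phi \cdot u) \right]. \tag{11}$$
Similarly, for $\phi^\top \Sigma \phi = (\phi^\top \mathbb{E}_P[R R^\top] \phi) - \alpha^2$, (8) now becomes
$$\inf_{P \in U_\delta(P_N);\ \mathbb{E}_P[\phi \cdot R] = \alpha} (\phi^\top \Sigma \phi). \tag{12}$$
Expanding the cost function $c(u, v) = \|u - v\|_2^2$ and making the substitution $\Delta = u - R_i$ gives
$$\begin{aligned} \Phi(R_i) &= \sup_\Delta \left[ -(\phi \cdot (\Delta + R_i))^2 - \lambda_1 \|\Delta\|_2^2 - \lambda_2 (\phi \cdot (\Delta + R_i)) \right] \\ &= \sup_\Delta \left[ -(\phi \cdot \Delta)^2 - 2 (\phi \cdot R_i)(\phi \cdot \Delta) - \lambda_1 \|\Delta\|_2^2 - \lambda_2 (\phi \cdot \Delta) \right] - (\phi \cdot R_i)^2 - \lambda_2\, \phi \cdot R_i \\ &= \sup_\Delta \left[ -(\|\phi\|_2^2 + \lambda_1) \|\Delta\|_2^2 + |2 \phi \cdot R_i + \lambda_2|\, \|\phi\|_2 \|\Delta\|_2 \right] - (\phi \cdot R_i)^2 - \lambda_2\, \phi \cdot R_i \\ &= -(\phi \cdot R_i)^2 - \lambda_2\, \phi \cdot R_i + \frac{(2 \phi \cdot R_i + \lambda_2)^2 \|\phi\|_2^2}{4 (\|\phi\|_2^2 + \lambda_1)}. \end{aligned} \tag{13}$$
Hence $- \left\{ \inf_{\lambda_1 \ge 0, \lambda_2} \left[ \lambda_1 \delta + \lambda_2 \alpha + \frac{1}{N} \sum_{i=1}^N \Phi(R_i) \right] \right\}$ becomes
$$- \inf_{\lambda_1 \ge 0, \lambda_2} H := \frac{1}{N} \sum_{i=1}^N \left[ -(\phi \cdot R_i)^2 - \lambda_2\, \phi \cdot R_i + \frac{(2 \phi \cdot R_i + \lambda_2)^2 \|\phi\|_2^2}{4 (\|\phi\|_2^2 + \lambda_1)} \right] + \lambda_1 \delta + \lambda_2 \alpha. \tag{14}$$
The first order optimality condition for $\lambda_2$ gives
$$\frac{\partial H}{\partial \lambda_2} = \alpha + \frac{1}{N} \sum_{i=1}^N \left[ -(\phi \cdot R_i) + \frac{2 (2 \phi \cdot R_i + \lambda_2) \|\phi\|_2^2}{4 (\|\phi\|_2^2 + \lambda_1)} \right] = 0.$$
Recall $\|\phi\|_2 > 0$, hence we obtain $\lambda_2^* = -2\alpha - \frac{2 C \lambda_1}{\|\phi\|_2^2}$ where $C := \alpha - \phi \cdot \mathbb{E}_{P_N}(R)$. Indeed, $\lambda_2^*$ is optimal since the second order condition for $\lambda_2$ gives $\frac{\partial^2 H}{\partial \lambda_2^2} = \frac{\|\phi\|_2^2}{2 (\|\phi\|_2^2 + \lambda_1)} > 0$. Substituting $\lambda_2^*$ back into $H$ gives
$$H = \frac{1}{N} \sum_{i=1}^N (\phi \cdot R_i)^2 - \inf_{\lambda_1 \ge 0} \left\{ \frac{1}{N} \sum_{i=1}^N \left[ \frac{\left( \phi \cdot R_i - \alpha - \frac{C \lambda_1}{\|\phi\|_2^2} \right)^2 \|\phi\|_2^2}{\|\phi\|_2^2 + \lambda_1} \right] + \lambda_1 \delta - 2 \left( \alpha + \frac{C \lambda_1}{\|\phi\|_2^2} \right) C \right\}. \tag{15}$$
Now let $\lambda_1 = \kappa - \|\phi\|_2^2 \ge 0 \implies \kappa \ge \|\phi\|_2^2$ to get
$$H = \frac{1}{N} \sum_{i=1}^N (\phi \cdot R_i)^2 + 2 \alpha C - 2 C^2 + \|\phi\|_2^2 \delta - \inf_{\kappa \ge \|\phi\|_2^2} \left\{ \frac{1}{N} \sum_{i=1}^N \left[ \left( \phi \cdot R_i - \alpha - \frac{C \kappa}{\|\phi\|_2^2} + C \right)^2 \frac{\|\phi\|_2^2}{\kappa} \right] - \kappa \left( \frac{2 C^2}{\|\phi\|_2^2} - \delta \right) \right\}. \tag{16}$$
Partial substitution for $C = \alpha - \phi \cdot \mathbb{E}_{P_N}(R)$ and noting $\frac{1}{N} \sum_{i=1}^N -2 (\phi \cdot R_i - \phi \cdot \mathbb{E}_{P_N}(R)) \frac{C \kappa}{\|\phi\|_2^2} = 0$ gives
$$H = \mathbb{E}_{P_N}[(\phi \cdot R)^2] + 2 C (\phi \cdot \mathbb{E}_{P_N}(R)) + \delta \|\phi\|_2^2 - \left\{ \inf_{\kappa \ge \|\phi\|_2^2} \frac{1}{N} \sum_{i=1}^N \left[ (\phi \cdot R_i - \phi \cdot \mathbb{E}_{P_N}(R))^2 \frac{\|\phi\|_2^2}{\kappa} \right] + \kappa \left( \delta - \frac{C^2}{\|\phi\|_2^2} \right) \right\}. \tag{17}$$
If $\delta \|\phi\|_2^2 - C^2 < 0$ the solution is unbounded, which implies
$$\sup_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} -(w^\top \mathbb{E}_Q[S S^\top] w)$$
is not feasible. Therefore, impose the feasibility constraint $\delta \|\phi\|_2^2 - C^2 = \delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2 \ge 0$. To evaluate the $\inf_{\kappa \ge \|\phi\|_2^2}$ expression, first make the substitution $A_i = (\phi \cdot R_i - \phi \cdot \mathbb{E}_{P_N}(R))^2 \|\phi\|_2^2$ and $B = \left( \delta - \frac{C^2}{\|\phi\|_2^2} \right)$ to get $\inf_{\kappa \ge \|\phi\|_2^2} \frac{1}{N} \sum_{i=1}^N \left[ \frac{A_i}{\kappa} \right] + B \kappa$. Note this expression is convex, hence for the unconstrained problem the first order optimality condition $-\frac{1}{N} \sum_{i=1}^N \frac{A_i}{\kappa^2} + B = 0$ suffices to determine $\kappa^*$. Some algebra gives $\kappa^* = \sqrt{\frac{\frac{1}{N} \sum_{i=1}^N A_i}{B}} \implies \inf_{\kappa \ge 0} \frac{1}{N} \sum_{i=1}^N \left[ \frac{A_i}{\kappa} \right] + B \kappa = 2 \sqrt{\frac{1}{N} \sum_{i=1}^N A_i}\, \sqrt{B}$. This can be rewritten as
$$\inf_{\kappa \ge 0} \frac{1}{N} \sum_{i=1}^N \left[ \frac{A_i}{\kappa} \right] + B \kappa = 2 \sqrt{\phi^\top \left[ \frac{1}{N} \sum_{i=1}^N (R_i - \mathbb{E}_{P_N}(R))(R_i - \mathbb{E}_{P_N}(R))^\top \right] \phi}\ \sqrt{\delta \|\phi\|_2^2 - C^2} = 2 \sqrt{\phi^\top \Sigma \phi}\, \sqrt{\delta \|\phi\|_2^2 - C^2}. \tag{18}$$
Note that for the constrained problem, $\kappa = \|\phi\|_2^2 \implies$ (8) evaluates to $\alpha^2 \implies$ (9) evaluates to 0. Thus we see that (9) becomes
$$\min_{\delta \|\phi\|_2^2 - C^2 \ge 0} \begin{cases} \Gamma - 2 \sqrt{\phi^\top \Sigma \phi}\, \sqrt{\delta \|\phi\|_2^2 - C^2} - \alpha^2, & \text{for } \kappa^* \ge \|\phi\|_2^2 \\ 0, & \text{otherwise} \end{cases} \tag{19}$$
where $\Gamma = \mathbb{E}_{P_N}[(\phi \cdot R)^2] + 2 C (\phi \cdot \mathbb{E}_{P_N}(R)) + \delta \|\phi\|_2^2$. Let us substitute for $C = \alpha - \phi \cdot \mathbb{E}_{P_N}(R)$ and do some work to expand and simplify the long first term inside the $\min$ expression for (19), call it $V$, to get
$$\begin{aligned} V &= \mathbb{E}_{P_N}[(\phi \cdot R)^2] - (\phi \cdot \mathbb{E}_{P_N}(R))^2 + \delta \|\phi\|_2^2 - \alpha^2 + 2 \alpha (\phi \cdot \mathbb{E}_{P_N}(R)) - (\phi \cdot \mathbb{E}_{P_N}(R))^2 \\ &\quad - 2 \sqrt{\phi^\top \Sigma \phi}\, \sqrt{\delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2} \\ &= \phi^\top \Sigma \phi + \left[ \delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2 \right] - 2 \sqrt{\phi^\top \Sigma \phi}\, \sqrt{\delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2} \\ &= \left( \sqrt{\phi^\top \Sigma \phi} - \sqrt{\delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2} \right)^2. \end{aligned}$$
Now (19) can be written as
$$\min_{\delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2 \ge 0} \begin{cases} \left( \sqrt{\phi^\top \Sigma \phi} - \sqrt{\delta \|\phi\|_2^2 - (\alpha - \phi \cdot \mathbb{E}_{P_N}(R))^2} \right)^2, & \text{for } \kappa^* \ge \|\phi\|_2^2 \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$
Observing that $\alpha = \phi \cdot \mathbb{E}_{P_N}(R)$ realizes the minimum, and $\|\phi\|_2 \ne 0$, it follows that (20) reduces to
$$\begin{cases} \left( \sqrt{\phi^\top \Sigma \phi} - \sqrt{\delta}\, \|\phi\|_2 \right)^2, & \text{for } \kappa^* \ge \|\phi\|_2^2 \\ 0, & \text{otherwise.} \end{cases} \tag{21}$$
Next, we proceed to evaluate the condition $\kappa^* \ge \|\phi\|_2^2$. Recall $\kappa^* = \sqrt{\frac{\frac{1}{N} \sum_{i=1}^N A_i}{B}}$. For $\alpha$ as above, this simplifies to $\kappa^* = \sqrt{\frac{\phi^\top \Sigma \phi}{\delta}}\, \|\phi\|_2$. The condition $\kappa^* \ge \|\phi\|_2^2$ now becomes $\sqrt{\phi^\top \Sigma \phi} \ge \sqrt{\delta}\, \|\phi\|_2$. Therefore (21) simplifies to
$$\max \left( \sqrt{\phi^\top \Sigma \phi} - \sqrt{\delta}\, \|\phi\|_2,\ 0 \right)^2 \implies \inf_{Q \in U_\delta(Q_N);\ \mathbb{E}_Q[w \cdot S] = \alpha} (w^\top \Sigma w) = \max \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2,\ 0 \right)^2 \tag{22}$$
and we are done.

B. Proof of Proposition 2.2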
Proposition. Solving N_SRPO2$_{bc}$ is equivalent to solving up to three one-dimensional search problems $\min_{t > 0} \sqrt{f(t)} - \sqrt{\delta t}$ where $f(t)$ is the optimal value of a parameterized SDP problem.

Proof. Consider the reformulation of N_SRPO2$_{bc}$ given by
$$\begin{aligned} \underset{w \in \mathbb{R}^n}{\text{minimize}} \quad & \left( \sqrt{w^\top \Sigma w} - \sqrt{\delta}\, \|w\|_2 \right) \\ \text{subject to} \quad & a^\top w \ge 1, \\ & b^\top w \ge 1. \end{aligned} \tag{23}$$
The KKT optimality condition says that
$$\frac{\Sigma w}{\sqrt{w^\top \Sigma w}} - \frac{\sqrt{\delta}\, w}{\sqrt{w^\top w}} = \beta_1 a + \beta_2 b \tag{24}$$
where $\beta_1 \ge 0$ and $\beta_2 \ge 0$ are the Lagrange multipliers associated with the linear constraints. For the purpose of our discussion (computational efficiency) let us restrict our attention to the case where $\beta_1 > 0,\ \beta_2 > 0 \implies a^\top w = 1 \wedge b^\top w = 1$. The other cases of either $a^\top w = 1$ or $b^\top w = 1$ can be treated separately. In this case, the two linear constraints eliminate two variables. Let $\tilde{w} \in \mathbb{R}^{n-2}$ denote the remaining $n - 2$ variables. Then write the reformulation
$$\min_{\tilde{w} \in \mathbb{R}^{n-2}} \sqrt{q_1(\tilde{w})} - \sqrt{\delta\, q_2(\tilde{w})} \tag{25}$$
where $q_1$ and $q_2$ are non-negative convex quadratic functions. Let $f(t)$ denote the optimal value of
$$\begin{aligned} \underset{\tilde{w} \in \mathbb{R}^{n-2}}{\text{minimize}} \quad & q_1(\tilde{w}) \\ \text{subject to} \quad & q_2(\tilde{w}) = t. \end{aligned} \tag{26}$$
By the so-called S-lemma (see [12] and [13]), the function $f(t)$ is convex and can be evaluated by a parameterized SDP in polynomial time for any given $t$. Now the reformulation reduces to
$$\min_{t > 0} \sqrt{f(t)} - \sqrt{\delta t} \tag{27}$$
and we are done.