Conditional Value-at-Risk: Theory and Applications
The School of Mathematics
Conditional Value-at-Risk: Theory and Applications
by Jakob Kisiala, s1301096
Dissertation Presented for the Degree of MSc in Operational Research

August 2015

Supervised by Dr Peter Richtárik

Abstract

This thesis presents the Conditional Value-at-Risk concept and combines an analysis that covers its application as a risk measure and as a vector norm. For both areas of application the theory is revised in detail and examples are given to show how to apply the concept in practice.

In the first part, CVaR as a risk measure is introduced and the analysis covers the mathematical definition of CVaR and different methods to calculate it. Then, CVaR optimization is analysed in the context of portfolio selection, including how to apply CVaR optimization to hedge a portfolio consisting of options. The original contributions in this part are an alternative proof of Acerbi's Integral Formula in the continuous case and an explicit programme formulation for portfolio hedging.

The second part first analyses the Scaled and Non-Scaled CVaR norms as a new family of norms in R^n and compares this new norm family to the more widely known L_p norms. Then, model (or signal) recovery problems are discussed and it is described how appropriate norms can be used to recover a signal with fewer observations than the dimension of the signal. The last chapter of this dissertation then shows how the Non-Scaled CVaR norm can be used in this model recovery context. The original contributions in this part are an alternative proof of the equivalence of two different characterizations of the Scaled CVaR norm, a new proposition that the Scaled CVaR norm is piecewise convex, and the entire Chapter 8. Since the CVaR norm is a rather novel concept, its applications in a model recovery context have not been researched yet. Therefore, the final chapter of this thesis might lay the basis for further research in this area.
Acknowledgements

First of all, I would like to thank my supervisor Peter Richtárik, whose valuable feedback and ideas improved the quality of this thesis considerably. He inspired me to broaden my horizon and study topics which went beyond the syllabus. Furthermore, I would like to thank all the teaching staff who enabled me to learn a lot during my master studies.

I would also like to mention my classmates who made this year a memorable experience beyond the classroom. Especially Wendy, who was always a beam of sunshine in this often cloudy and rainy city.

Own Work Declaration
I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text.

Edinburgh, 21 August 2015
Place, Date                                Jakob Kisiala

Contents

6 … L_p Vector Norms
6.1 … C^S_α …
6.2 Relationship between α and p for C_α and L_p
6.3 Behaviour of CVaR Norm C_α
8 Model Recovery Using the CVaR Norm
8.1.3 Numerically Determining A_p− in R
8.2 Gaussian Width of a Tangent Cone with Respect to the C_α Norm
8.3 Numerical Recovery Experiments using the C_α Norm
8.4 Concluding Remarks on Model Recovery Using the CVaR Norm
A.1 List of Matlab Code Developed During this Dissertation
A.2 Scaled CVaR Calculation based on Definition 5.1
A.3 Scaled CVaR Calculation based on Proposition 5.1
A.4 CVaR Calculation based on Definition 5.2
A.5 CVaR Calculation based on Proposition 5.2
B Extended Tables
B.1 Option Prices on NASDAQ:YHOO
B.2 Option Prices on NASDAQ:GOOGL
B.3 Trader's positions before hedging
B.4 Trader's positions in Yahoo Options after hedging
B.5 Trader's positions in Google Options after hedging
B.6 Computation times of Scaled and (non-scaled) CVaR Norm in ms
B.7 Ratio of Projections of Random Hyperplanes onto C_α Unit Ball in R over 5,000 Trials

C Extended Diagrams
C.1 Monte Carlo simulated loss distributions of single assets
C.2 Monte Carlo simulated loss distributions of optimal portfolios
C.3 C_α and L_p* norm surface plots of x ∈ R^n for different α and p*
C.4 Projection of a circle onto the unit ball in R using L_1 and C_α norms

List of Figures

2.1 VaR_α and CVaR_α of a random variable X representing loss.
3.1 Efficient frontier for a sample portfolio.
3.2 Function value φ(c) of Y for different values of c.
4.1 Reproduced from [21, p. 198], payoff and profit profile for a call option.
4.2 Reproduced from [21, p. 198], payoff and profit profile for a put option.
4.3 Reproduced from [21, p. 249], payoff and profit profile for the sale of a strangle.
4.4 Profit profiles for (unhedged) Google and Yahoo strangles at maturity.
4.5 Histogram of trader's (unhedged) portfolio losses from 20,000 simulations.
4.6 Profit profiles for hedged Google and Yahoo strangles at maturity.
4.7 Histogram of trader's hedged portfolio losses from 20,000 simulations.
5.1 Unit balls of ⟪x⟫^S_α for x ∈ R and different values of α.
5.2 Unit balls of ⟪x⟫_α for x ∈ R and different values of α.
5.3 Scaled CVaR norm C^S_α against α for different x.
6.1 Reproduced from [25, p. 6], C^S_α and L^S_p Norms of x for different values of α and p(α).
6.2 [25, p. 5] Norm unit disks of C^S_α and L^S_p for different values of α and p(α).
6.3 Reproduced from [17, p. 11], f_{n,p}(κ*) for different values of n and p, with κ* = n/p.
6.4 Reproduced from [17, p. 10], C_α and L_p Norms of x for different values of α and p(α).
6.5 [25, p. 17] Norm unit disks of C_α and L_p for different values of α and p(α).
6.6 Norm surface plots (C_α and L_p) of x for p = α* = −√…
6.7 Projection of a circle onto the unit ball using different norms.
7.1 Atoms, their convex hull, and relation to the L_1 and C_α norms in R.
7.2 [1, p. 35] Examples of cones K and polar cones K*.
7.3 [1, p. 49] Examples of tangent and normal cones with respect to a set C.
8.1 [17, p. 13] Unit balls of C_α in R.
8.2 Probability of exact recovery for a vector x ∈ R using the CVaR norm as the atomic norm with n measurements.
8.3 Probability of exact recovery for a k-sparse vector x ∈ R using either the L_1 norm or C_α norm as the atomic norm with n measurements.
8.4 Probability of exact recovery for a vector x ∈ R using either the L_∞ norm or C_α norm as the atomic norm with n measurements.

List of Tables

2.1 Losses of investments A and B under three scenarios.
2.2 Discrete loss distribution of a random variable Y.
3.1 Mean Asset Losses of S&P, Government Bonds, and Small Cap.
3.2 Covariance Matrix of S&P, Government Bonds, and Small Cap.
3.3 Minimum Variance and Minimum CVaR portfolios for different required returns.
3.4 Characterization of loss distributions used in second scenario.
3.5 Minimum Variance and Minimum CVaR portfolios for scenario 2.
3.6 Performance and risk indicators of optimal portfolios for scenario 2.
4.1 Variables used in LP to calculate CVaR optimal hedge.
4.2 Risk metrics for the original and hedged option portfolio.
5.1 Computation times of Scaled and Non-Scaled CVaR norms for different n.
5.2 Computation times of Scaled and Non-Scaled CVaR norms for different α.

Chapter 1

Introduction
This chapter presents the motivation for this thesis, gives the outline of the following chapters, and states the original contributions of the thesis.

Note that there are no dedicated chapters covering a literature review or establishing notation. Rather, the literature is reviewed and notation is established in each chapter and section where it is appropriate.
In financial risk management, especially with practitioners, Value-at-Risk (VaR) is a widely used risk measure because its concept is easily understandable and it focusses on the downside, i.e. tail risk. A possible definition is given by Choudhry: "VaR is a measure of market risk. It is the maximum loss which can occur with [(α × 100)]% confidence [...]" [13, p. 30].

However, despite its wide use, VaR is not a coherent risk measure. The concept of a coherent risk measure was introduced by Artzner et al. in [4]. They formulated that a risk measure ρ is coherent if it satisfies the following axioms (see Section 2.2 for details):

• Monotonicity
• Translation equivariance
• Subadditivity
• Positive Homogeneity

VaR is only coherent when the underlying loss distribution is normal; otherwise it lacks subadditivity. Other disadvantages of the VaR measure are that it does not give any information about potential losses in the 1 − α worst cases, and that calculating VaR optimal portfolios can be difficult, if not impossible [30, p. 1444].

The Conditional Value-at-Risk (CVaR) is closely linked to VaR, but provides several distinct advantages. In fact, in settings where the loss is normally distributed, CVaR, VaR, and Minimum Variance (Markowitz) optimization give the same optimal portfolios [29, p. 29]. The advantages of CVaR become apparent when the loss distribution is not normal or when the optimization problem is high-dimensional: CVaR is a coherent risk measure for any type of loss distribution. Furthermore, in settings where an investor wants to form a portfolio of different assets, the portfolio CVaR can be optimized by a computationally efficient, linear minimization problem, which simultaneously gives the VaR at the same confidence level as a by-product. On the other hand, it is difficult to form VaR optimal portfolios, as in these settings VaR is difficult to calculate.
This computationally efficient way to optimize the portfolio CVaR can also be transferred to hedging problems, in which an investment decision has been taken, but adjustments are possible so that the downside risk of the investment can be reduced. For example, [3], [5], [31], and [34] used CVaR optimization to hedge risk, each one in a different setting.

What is more remarkable is that the CVaR concept (which was developed as a financial risk measure) can be abstracted to form a new family of norms in R^n. The Scaled and (Non-Scaled) CVaR norms can then be used as alternatives to the widely established family of L_p norms. Moreover, by choosing suitable α, the CVaR norm is equivalent to the L_1 and L_∞ norms.

Having this new CVaR norm also opens up new opportunities in Big Data optimization, particularly in model or signal recovery problems. In these problems, the goal is to reconstruct a model or signal of dimension p when fewer than p observations are available. This can be achieved by exploiting the structure of particular signals and solving a norm minimization problem using an appropriate norm. In particular, the L_1 and L_∞ norms are used for two different types of models, and having the CVaR norm as another norm in R^n could recover further types of signals and models. To the best knowledge of the author, no research has been undertaken so far to use the CVaR norm in model recovery problems, so this might be another area of research to consider in the future.

This thesis consists of 7 main chapters (not counting the introduction and conclusion), which concentrate on two main areas: first, the use of CVaR as a risk measure, and second, the characteristics of the CVaR norm with an outlook on possible future applications.
For both areas, an extensive analysis of the theory of CVaR and the CVaR norm is given, before showing how this theory can be applied in practice.

Chapter 2 introduces the concept of CVaR as a risk measure for a univariate loss distribution. It starts by showing how VaR and CVaR are related to each other. Then, the notion of a coherent risk measure is introduced and it is shown why VaR is not coherent. Section 2.3 then examines the mathematical definition of CVaR and shows how the CVaR can be calculated using the Convex Combination Formula. The chapter finishes by showing an alternative way to calculate CVaR, namely using Acerbi's Integral Formula.

Chapter 3 moves from univariate to multivariate loss distributions. These loss distributions arise in portfolio optimization problems, where there are different assets, each with their own loss distribution, and the investor's loss depends on his investment decision into each asset. Section 3.1 discusses the first model that was introduced to optimize a portfolio with regard to risk (the Markowitz Model, which aims to reduce the portfolio variance). Identifying the shortcomings of the Markowitz Model gives the motivation for the next model that is considered, i.e. the Rockafellar and Uryasev Model, which optimizes the portfolio CVaR. The analysis extends the results of the CVaR analysis in the univariate case to the multivariate case and gives a linear optimization programme that minimizes the CVaR of a portfolio. This section also shows that the Markowitz Model and the Rockafellar and Uryasev Model lead to the same optimal portfolio if the loss of all assets in the portfolio is normally distributed. Section 3.3 then gives two numerical examples to demonstrate the results that were established in this chapter.
First, it is shown that in certain cases CVaR and Mean-Variance optimization indeed give the same portfolio, before demonstrating that for non-normal loss distributions CVaR optimization gives a less risky portfolio than Mean-Variance optimization.

Next, Chapter 4 shows how the CVaR optimization problem can be used to hedge tail losses from a previous investment decision. In this particular example, a scenario based on real-world data is created. Simplifying assumptions are made to focus on the hedging procedure instead of the technical implementation of the hedge. For the scenario, a trader's portfolio is to be adjusted so that the CVaR of the portfolio is minimized. Since it is an option portfolio (for which the risk manager needs a daily estimate of the portfolio variance), Section 4.1 and Section 4.2 give the necessary finance and risk management background. Section 4.3 briefly describes how the portfolio is formed before Section 4.4 explains the hedging procedure, including an explicit formulation of the hedging problem. The portfolio risk before and after hedging is compared and it is shown how the hedging procedure can improve the risk profile of the portfolio.

Moving away from the financial context, Chapter 5 introduces two norms that are based on CVaR: the Scaled CVaR norm C^S_α, and the (Non-Scaled) CVaR norm C_α. For both norms, two different yet equivalent characterizations are given. Section 5.3 then describes the properties of each norm and especially shows how their properties with regard to the parameter α are fundamentally different. Since these norms are fairly novel and standard algorithms to calculate them are not yet implemented in MATLAB, Section 5.4 examines the computational efficiency of calculating the two norms, C^S_α and C_α, using the two different characterizations for each.

To give a better understanding of C^S_α and C_α, they are both compared to the more familiar family of L_p norms in Chapter 6.
First, C^S_α is compared to L^S_p norms, before C_α is analysed with regard to the parameter α and its proximity to L_p norms.

Chapter 7 then gives a possible application of the CVaR norm in an optimization context: model recovery using atomic norms. In model (or signal) recovery the goal is to reconstruct a p-dimensional model (or signal) with n random measurements, such that n < p. For a recovery to be successful, the model must have a certain structure that can be exploited by a corresponding atomic norm. Section 7.1 provides the background on atomic norms and convex geometry (e.g. the notions of tangent and normal cones) that is needed to explore the usefulness of the CVaR norm in this setting. Section 7.2 states the necessary recovery conditions, more precisely the number of random measurements needed to ensure that a p-dimensional model can be recovered from n measurements. The number of measurements n is derived by using Gaussian Widths, which are quite difficult to compute directly. Therefore, Section 7.3 states some properties of Gaussian Widths that might prove useful when establishing a bound on n.

The final chapter, Chapter 8, is completely original in the sense that it explores how the CVaR norm can be used in the context of model recovery problems. To the best knowledge of the author, no research in this particular area has been carried out before. Unfortunately, due to the limited scope of this thesis, the analysis could not be completed. Rather, this chapter should show areas of further research, with pointers towards what could be analysed in more detail. Section 8.1 contains a conjecture about the set of atoms of the CVaR norm for a certain α. A proposition based on the conjecture is proven, but due to the limited scope of this dissertation, the conjecture could not be proven in full. Still, a numerical experiment was carried out to identify the atoms of the CVaR norm in R and this experiment provides further evidence that the conjecture is true.
Section 8.2 is rather short, showing how a bound on the number of measurements n can be derived if expressions are available for the tangent or normal cone with respect to the atoms of the CVaR norm. Some numerical experiments were performed to recover simple signals using the CVaR norm in Section 8.3. The results are not impressive, as the experiments were limited to a certain α and only a few special cases of signals. Analysing model recovery using the CVaR norm further could lead to different set-ups, for which the results could be better.

First of all, to the best knowledge of the author, this thesis is the first piece of work that analyses CVaR as a risk measure and the CVaR norm (including possible applications) in a unified way. There is an abundance of papers on CVaR, CVaR portfolio optimization, and further applications of CVaR as a risk measure. However, there is little research on the CVaR norm and no research on the application of the CVaR norm in the context of model recovery.

A large part of this thesis presents results of other papers. Even with established concepts, the author aims to present them in such a way that the concepts are easily understandable. Also, most plots in this paper were reproduced independently to confirm the results of other authors. But throughout the paper several original contributions are made, either by presenting new proofs to existing propositions, or by stating new propositions / conjectures. In detail, the original contributions are:

• Subsection 2.4.1: A new proof of Acerbi's Integral Formula (first proposed in [2]) to calculate CVaR is given.

• Section 3.1: Although this is a standard result, the author proves independently why portfolio diversification reduces risk (when measured by standard deviation). The reason to give an independent proof is that the standard introductory financial literature only shows this result for N = 2, not for general N ≥ 2.

• Section 4.4: Although hedging using CVaR optimization was discussed by Rockafellar and Uryasev in [29], they never explicitly formulated the optimization programme. This thesis clearly defines the variables and states the problem for a CVaR optimal hedge of a portfolio of options.

• Subsection 5.1.2: This subsection introduces a second, equivalent characterization of the Scaled CVaR norm, which was proposed by Pavlikov and Uryasev in [25]. The original contribution of this thesis is an alternative proof of the equivalence of the two different characterizations.

• Proposition 5.5: The piecewise convexity of the Scaled CVaR norm is a new and original proposition of this thesis, to the best knowledge of the author.

• Section 5.4: To the best knowledge of the author, the computational efficiency of different algorithms to calculate the Scaled and Non-Scaled CVaR norm has not been investigated before.

• Section 8.1: To the best knowledge of the author, the atoms (i.e. the extreme points of the unit ball) of the CVaR norm have never been explicitly stated before. This section conjectures the set of atoms of the CVaR norm for a specific α. It shows that for different α the unit ball of the CVaR norm looks different, and finally a numerical experiment is performed to provide evidence for the conjecture in R.

• Section 8.3: To the best knowledge of the author, the CVaR norm has never been analysed in the context of model recovery problems. This section performs some numerical recovery experiments to see how suitable C_α would be to recover a special type of signal. Because of the close link between the CVaR norm and the L_1 and L_∞ norms, it is also investigated how well the CVaR norm performs in signal recovery problems when compared to these two L_p norms.

Chapter 2

Conditional Value-at-Risk as a Risk Measure
This chapter introduces the concept of CVaR (building on the VaR concept) in the way that it was first introduced: as a financial risk measure. In Section 2.1 the mathematical definitions of VaR and CVaR are given, followed by an intuitive description of their properties and interactions. Section 2.2 presents the axioms that must be satisfied for a risk measure to be considered coherent. Specifically, an example is shown to prove that VaR is not subadditive, whereas for the same example, CVaR is subadditive. Section 2.3 then explores the CVaR concept in more detail, giving different algorithms and optimization programmes to calculate the CVaR of a given loss distribution in a variety of settings. Finally, Section 2.4 states Acerbi's Integral Formula to calculate CVaR and gives an alternative proof of the formula.
Since losses are random variables, some statistical measures need to be introduced to cover the basics for later sections and chapters, especially the ones concerning portfolio optimization (Chapter 3 and Chapter 4).
Definition 2.1 ([22, p. 17] Expectation). The expectation, sometimes called expected value or mean, of a random variable X is defined as

E[X] := ∫_{−∞}^{∞} x f(x) dx in the continuous case (2.1)

or

E[X] := ∑_{k=−∞}^{∞} k P(X = k) in the discrete case, (2.2)

where f(x) is the probability density function of X and P(X = k) is the probability mass function of X.

The expectation is often denoted by the letter µ, such that µ = E[X]. (Many texts apply the distinction to use µ for the population mean and µ̂ for the sample mean. Although the expectation of the loss variable X is actually a sample mean, this dissertation will use the notation µ when talking about the expectation of losses.) E[X] provides information about the distribution of X; informally it can be described as the centre value around which possible values of X disperse [22, p. 17].

Definition 2.2 ([22, p. 18] Variance). The variance of a random variable X is defined as

Var(X) := E[(X − E[X])²]. (2.3)

The variance is often denoted by σ². Since the variance is hard to interpret, as it is given in square units, the standard deviation (denoted σ = √Var(X)) is often used. It does not contain additional information, but is easier to interpret, as σ is given in the same units as µ [22, p. 18]. The standard deviation σ (or variance σ²) measures how strongly X is dispersed around µ. Small values of σ indicate that X is concentrated strongly around µ, while large values of σ mean that values of X further away from µ (in either direction) are more likely.

Another important concept throughout this dissertation is covariance.

Definition 2.3 ([22, p. 21] Covariance). The covariance of two random variables X_1 and X_2 is defined as

Cov(X_1, X_2) := E[(X_1 − E[X_1])(X_2 − E[X_2])]. (2.4)

Covariance measures how strongly the variable X_1 varies together with X_2 (and vice versa). As a special case, Cov(X, X) = Var(X). Also, if X_1 and X_2 are independent, their covariance is 0 [22, p. 21]. As in the case with variance, the covariance is hard to interpret, as its unit is the product of the respective units of X_1 and X_2. Therefore, another measure of dependency that is derived from the covariance and variance is commonly used to express how strongly X_1 and X_2 vary together; it is called the correlation coefficient:

Definition 2.4 ([22, p. 22] Correlation Coefficient). The correlation coefficient of two random variables X_1 and X_2 is defined as

ρ := Cov(X_1, X_2) / (√Var(X_1) √Var(X_2)). (2.5)

ρ always takes values between −1 and 1 and is therefore easier to interpret than covariance. If |ρ| is close to 1, then there is a strong dependence between X_1 and X_2 [22, p. 22].

As pointed out in the introduction, Value-at-Risk (VaR) is the maximum loss that will not be exceeded at a given confidence level. This gives the following mathematical definition of VaR:

Definition 2.5 ([27, week 8, p. 5] Value-at-Risk (VaR)). Let X be a random variable representing loss. Given a parameter 0 < α < 1, the α-VaR of X is

VaR_α(X) := min{c : P(X ≤ c) ≥ α}. (2.6)

Given Definition 2.5, VaR can have several equivalent interpretations [27, week 8, p. 5]:

• VaR_α(X) is the minimum loss that will not be exceeded with probability α.
• VaR_α(X) is the α-quantile of the distribution of X.
• VaR_α(X) is the smallest loss in the (1 − α) × 100 % worst cases.
• VaR_α(X) is the highest loss in the α × 100 % best cases.

For a loss distribution X, the Conditional Value-at-Risk is the expected loss, conditional on the fact that the loss exceeds the VaR at the given confidence level:

Definition 2.6 ([27, week 8, p. 13] Conditional Value-at-Risk (CVaR) in the continuous case). Let X be a continuous random variable representing loss. Given a parameter 0 < α < 1, the α-CVaR of X is

CVaR_α(X) := E[X | X ≥ VaR_α(X)]. (2.7)
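Definitions 2.5 and 2.6 can be illustrated with a short numerical sketch (not part of the thesis; the function name and sampling setup are illustrative). For a standard normal loss distribution the estimates should approach the known values VaR_0.95 ≈ 1.645 and CVaR_0.95 ≈ 2.063:

```python
import numpy as np

def var_cvar(losses, alpha):
    """Empirical VaR (Definition 2.5) and CVaR (Definition 2.6) of a loss sample."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # VaR_alpha = min{c : P(X <= c) >= alpha}: the smallest order statistic
    # whose empirical cumulative probability reaches alpha.
    k = int(np.ceil(alpha * losses.size)) - 1
    var = losses[k]
    # CVaR_alpha = E[X | X >= VaR_alpha] (continuous-case Definition 2.6).
    return var, losses[losses >= var].mean()

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, 100_000)  # simulated standard normal losses
var95, cvar95 = var_cvar(sample, 0.95)
```

For N(0,1) the exact values are VaR_0.95 = 1.6449 and CVaR_0.95 = φ(1.6449)/0.05 ≈ 2.0628, so the sample estimates land close to these; note that the CVaR estimate always dominates the VaR estimate.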
As in the case with the expectation, this dissertation will use the notation σ² (rather than the sample variance s²) when talking about the variance of losses.

CVaR is also known as Average Value-at-Risk, Expected Shortfall, or Tail Conditional Expectation, although some authors make subtle distinctions between their definitions [27, week 8, p. 13]. An alternative approach to find VaR and CVaR is shown in Theorem 3.2.

Figure 2.1 shows the VaR and CVaR for a specific continuous random variable X. The cumulative distribution function of X can be used to find VaR_α(X), and VaR_α(X) can be used in turn to calculate CVaR_α(X).

Figure 2.1: VaR_α and CVaR_α of a random variable X representing loss.

Artzner et al. analysed risk measures in [4] and stated a set of properties / axioms that should be desirable for any risk measure. Any risk measure which satisfies these axioms is said to be coherent. The four axioms they stated are Monotonicity, Translation Equivariance, Subadditivity, and Positive Homogeneity. For the definitions of all axioms, X and Y are random variables representing loss, c ∈ R is a scalar representing loss, and ρ is a risk function, i.e. it maps the random variable X (or Y) to R, according to the risk associated with X (or Y).

Definition 2.7 ([4, p. 210] Monotonicity). A risk measure ρ is monotone if for all X, Y:

X ≤ Y ⇒ ρ(X) ≤ ρ(Y). (2.8)

Definition 2.8 ([4, p. 209] Translation Equivariance). A risk measure ρ is translation equivariant if for all X, c:

ρ(X + c) = ρ(X) + c. (2.9)

Definition 2.9 ([4, p. 209] Subadditivity). A risk measure ρ is subadditive if for all X, Y:

ρ(X + Y) ≤ ρ(X) + ρ(Y). (2.10)

Definition 2.10 ([4, p. 209] Positive Homogeneity). A risk measure ρ is positively homogeneous if for all X, λ ≥ 0:

ρ(λX) = λρ(X). (2.11)

Speaking in a more intuitive way, the above axioms (Definition 2.7 - Definition 2.10) can be interpreted as follows [27, week 8, p. 10 f.]:

• Monotonicity: Higher losses mean higher risk.
• Translation Equivariance: Increasing (or decreasing) the loss increases (decreases) the risk by the same amount.
• Subadditivity: Diversification decreases risk.
• Positive Homogeneity: Doubling the portfolio size doubles the risk.

VaR fails to meet the subadditivity axiom (Definition 2.9) and is therefore criticized for not being a coherent risk measure. A simple example shows this [27, week 8, p. 19]:

Consider two possible investments, A and B, which have the loss profile shown in Table 2.1. There are three different scenarios ξ_1, ξ_2, ξ_3, each with associated probability p(ξ_i).

            ξ_1     ξ_2     ξ_3
p(ξ_i)      0.04    0.04    0.92
Loss of A    100      0       0
Loss of B     0      100      0

Table 2.1: Losses of investments A and B under three scenarios.

Using Equation 2.6 to calculate the VaR at the 95 % confidence level for investments in A, B, and A + B gives

VaR_0.95(A) = min{c : P(A ≤ c) ≥ 0.95} = 0 (since P(A ≤ 0) = 0.96),
VaR_0.95(B) = min{c : P(B ≤ c) ≥ 0.95} = 0 (since P(B ≤ 0) = 0.96), and
VaR_0.95(A + B) = min{c : P(A + B ≤ c) ≥ 0.95} = 100.

In this example, VaR_0.95(A + B) ≰ VaR_0.95(A) + VaR_0.95(B), hence VaR is not subadditive according to Definition 2.9. Therefore, it is not a coherent risk measure in the sense of Artzner et al.

Acerbi and Tasche proved in [2] that CVaR satisfies the above axioms and is therefore a coherent risk measure. Using the previous example together with Equation 2.15 of Proposition 2.1 gives

CVaR_0.95(A) = 80 (λ = 0.2, CVaR⁺_0.95(A) = 100),
CVaR_0.95(B) = 80 (λ = 0.2, CVaR⁺_0.95(B) = 100), and
CVaR_0.95(A + B) = 100 (λ = 1),

which shows that subadditivity holds for CVaR, as CVaR_0.95(A + B) = 100 ≤ CVaR_0.95(A) + CVaR_0.95(B) = 160.

Analysing CVaR in a wider context, one can derive CVaR from the generalized α-tail distribution of a random variable X (which represents loss). This is what Rockafellar and Uryasev did in [30]. While [30] focused on general distributions, their previous work in [29] concerned the CVaR of continuous loss distributions. This section will present the results of both papers in a unified way, for discrete as well as for continuous loss distributions.

Suppose that X is the loss distribution, and that F_X(z) is the cumulative distribution function of X, i.e. F_X(z) = P(X ≤ z).
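The subadditivity failure of VaR can also be reproduced numerically. The sketch below is illustrative rather than the thesis's own code: it uses two independent positions, each losing 100 with probability 0.04, instead of the exact scenario table, but it exhibits the same effect, i.e. each position alone has zero 95 % VaR while the combined position does not:

```python
import itertools

def var_discrete(dist, alpha):
    """VaR_alpha = min{c : P(X <= c) >= alpha} for a discrete loss distribution,
    given as a list of (loss, probability) pairs."""
    cum = 0.0
    for loss, p in sorted(dist):
        cum += p
        if cum >= alpha - 1e-12:  # tolerance for floating-point accumulation
            return loss
    return max(l for l, _ in dist)

# Two independent positions, each losing 100 with probability 0.04, else 0.
A = [(0.0, 0.96), (100.0, 0.04)]
B = [(0.0, 0.96), (100.0, 0.04)]

# Loss distribution of A + B under independence.
joint = {}
for (la, pa), (lb, pb) in itertools.product(A, B):
    joint[la + lb] = joint.get(la + lb, 0.0) + pa * pb
AB = sorted(joint.items())

# VaR_0.95(A) = VaR_0.95(B) = 0, but VaR_0.95(A + B) = 100:
# subadditivity (Definition 2.9) fails for VaR.
```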
Expected Shortfall (ES) and
CVaR slightly differently. In thepaper, they first proved that ES is a coherent risk measure and later proved that ES is identical to CVaR. F αX ( z ) ∶= { , when z < VaR α ( X ) F X ( z )− α − α , when z ≥ VaR α ( X ) . (2.12)Now, if X α is the random variable whose cumulative distribution function is F αX (Equation 2.12),then the CVaR is defined as CVaR α ( X ) ∶= E [ X α ] , (2.13)which leads to Definition 2.6 in the continuous case (CVaR α ( X ) = E [ X ∣ X ≥ VaR α ( X )] ), butis different for the discrete case [27, week 8, p. 15].For discrete or non-continuous loss distributions, Rockafellar and Uryasev proposed to cal-culate CVaR as a weighted average, also called the Convex Combination Formula . To apply theConvex Combination Formula, one needs the VaR α and CVaR + α of X , where CVaR + α ( X ) is theexpected loss strictly greater than the VaR α ( X ) , i.e.,CVaR + α ( X ) ∶= E [ X ∣ X > VaR α ( X )] . (2.14) Proposition 2.1 ([30, p. 1452] CVaR as a weighted average / Convex Combination Formula) . Let Ψ be cumulative probability of VaR α ( X ) , i.e. Ψ = F X ( VaR α ( X )) and define λ as λ ∶= Ψ − α − α , for ≤ α < . We then have: CVaR α ( X ) = λ VaR α ( X ) + ( − λ ) CVaR + α ( X ) . (2.15)Note that Proposition 2.1 is valid for all loss distributions, including continuous ones. FromProposition 2.1 it follows that CVaR α dominates VaR α , i.e. CVaR α ≥ VaR α . In fact, CVaR α > VaR α , unless VaR α is the maximum loss possible [30, p. 1452]. Another result to emphasize isthat the representation of CVaR by Equation 2.15 is rather surprising. As shown earlier, VaRis not a coherent risk measure (see Section 2.2) and, in fact, neither is CVaR + [27, week 8, p.16]. However, both these incoherent risk measures are combined in the Convex CombinationFormula to yield CVaR, which is coherent and therefore has many advantageous properties [30,p. 
1452].To provide a better understanding of the Convex Combination Formula (Equation 2.15),an example of a discrete loss distribution will be presented. The losses y i with associatedprobabilities are given in Table 2.2.i 1 2 3 4 5 6 y i
100 200 400 800 900 1000 P ( Y = y i ) Y .Now assume the 95 % CVaR is to be determined. Since F Y ( ) = P ( Y ≤ ) = . F Y ( ) = P ( Y ≤ ) = .
98, it follows that VaR . ( Y ) = min { c ∶ P ( Y ≤ c ) ≥ . } = λ = . − . − . = . Also, CVaR + . ( Y ) can be calculated as × + × = . ( Y ) = × + × = . .4 Acerbi’s Integral Formula Another way to express CVaR is to use Acerbi’s integral formula.
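Before turning to the integral formula, the convex-combination calculation above can be checked numerically. A minimal sketch in Python: the losses are those of Table 2.2, while the probabilities are hypothetical illustration values (chosen so that F_Y(800) = 0.98, as in the worked example).

```python
import numpy as np

# Table 2.2 losses; the probabilities are assumed for illustration
# (chosen so that F_Y(800) = 0.98, matching the worked example).
losses = np.array([100, 200, 400, 800, 900, 1000])
probs  = np.array([0.30, 0.30, 0.30, 0.08, 0.01, 0.01])
alpha = 0.95

cdf = np.cumsum(probs)
var_a = losses[np.searchsorted(cdf, alpha)]   # Eq. 2.6: smallest c with F_Y(c) >= alpha
psi = cdf[losses == var_a][0]                 # Psi = F_Y(VaR_alpha(Y))
lam = (psi - alpha) / (1.0 - alpha)           # lambda of Proposition 2.1
tail = losses > var_a
cvar_plus = probs[tail] @ losses[tail] / probs[tail].sum()   # Eq. 2.14
cvar = lam * var_a + (1.0 - lam) * cvar_plus                 # Eq. 2.15
print(var_a, round(lam, 2), round(cvar_plus, 2), round(cvar, 2))  # → 800 0.6 950.0 860.0
```

With these assumed probabilities the convex combination yields CVaR_0.95(Y) = 0.6 × 800 + 0.4 × 950 = 860.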
Proposition 2.2 ([12, p. 329] Acerbi's Integral Formula for CVaR). The CVaR of a random variable X, which represents loss, at the confidence level α can be expressed as

CVaR_α(X) = 1/(1 − α) ∫_α^1 VaR_β(X) dβ. (2.16)

Hence, CVaR_α can also be interpreted as the average VaR_β for β ∈ [α, 1] [27, week 8, p. 33]. To demonstrate how Equation 2.16 is applied, an example with a uniform loss distribution will be given. For this example, assume that the loss is distributed continuously and uniformly between 0 and 100, i.e., X ∼ U(0, 100). Thus, f_X(z) = 1/100 for 0 ≤ z ≤ 100 and 0 elsewhere. The VaR at confidence level β is given as VaR_β(X) = 100 × β. Then the CVaR at confidence level α can be calculated as

CVaR_α(X) = 1/(1 − α) ∫_α^1 VaR_β(X) dβ = 1/(1 − α) ∫_α^1 100β dβ = 1/(1 − α) [50β²]_α^1 = 50 × (1 + α).

So in this example, the 90 % CVaR would be CVaR_0.9(X) = 50 × (1 + 0.9) = 95.

Although Acerbi and Tasche proved Proposition 2.2 in [2, p. 1492], another proof will be given here. Two reasons for this alternative proof are, first, that Acerbi used different definitions in his paper, and second, to show how the result can be derived in another way. To the best knowledge of the author, this alternative proof has not been published before. However, the proof given here only holds for continuous random variables and therefore lacks the generality of Acerbi's proof.

For this alternative proof, the probability density function of the generalized α-tail distribution is needed, which can be derived from Equation 2.12 as f^α_X(z) = d/dz F^α_X(z), i.e.,

f^α_X(z) = { 0,               when z < VaR_α(X)
           { f_X(z)/(1 − α), when z ≥ VaR_α(X). (2.17)

Proof. (Continuous case only) Starting from the very basic definition of CVaR given in Equation 2.13, one can use integration by substitution to arrive at Equation 2.16:

CVaR_α(X) = E[X^α] = ∫_{−∞}^{∞} z f^α_X(z) dz = ∫_{−∞}^{VaR_α(X)} z f^α_X(z) dz + ∫_{VaR_α(X)}^{∞} z f^α_X(z) dz.

With f^α_X(z) given in Equation 2.17, the above equality simplifies to

CVaR_α(X) = ∫_{VaR_α(X)}^{∞} z f_X(z)/(1 − α) dz.

Now, one can define a new variable β such that β = F_X(z). Differentiating β with respect to z gives dβ/dz = f_X(z), i.e. f_X(z) dz = dβ. Furthermore, since X is continuous, there is a one-to-one relationship between β and z, and by Equation 2.6, z can be expressed as z = VaR_β(X). So substituting β = F_X(z), z = VaR_β(X), and adjusting the limits of the integral (F_X(VaR_α(X)) = α and F_X(∞) = 1) yields

CVaR_α(X) = 1/(1 − α) ∫_α^1 VaR_β(X) dβ,

which completes the proof.

Chapter 3

Portfolio Optimization Using CVaR
While Chapter 2 introduced the CVaR concept for univariate random distributions, the concept can be extended to multivariate random distributions, or random vectors, as well. This will be done here with a focus on portfolio optimization, i.e. investment decisions where the investor is able to invest his funds in more than one asset. First, Section 3.1 gives an introduction to portfolio optimization by presenting the first model that was developed to improve decision making for portfolio investments [23], namely the Markowitz or Mean Variance Model. Then, Section 3.2 introduces the
CVaR Model that was developed by Rockafellar and Uryasev in [29]. It will also be explained why the CVaR Model is preferable to the Markowitz Model with regards to risk management. And finally, numerical examples will be given in Section 3.3 to show how the two models can be applied in practice.

Before beginning with the first section, some notation will be established for the concepts that are used throughout this chapter and the rest of the dissertation.

First of all, the investor can invest in N different assets. His investment decision can be represented mathematically by a decision vector x ∈ S ⊆ R^N. Here, S represents the feasible set for investment decisions. (For example, S could include the unit budget constraint Σ_i x_i = 1, or a concentration risk constraint limiting each x_j to a fixed fraction of Σ_i x_i.)

To define the set of admissible portfolios S for this chapter, the investor only has two constraints: he cannot short sell any assets and his decision needs to satisfy the unit budget constraint. With these considerations, the set of admissible portfolios S consisting of N assets can be written as

S = { x ∈ R^N : x_i ≥ 0 ∀ i ∈ {1, 2, . . . , N}, Σ_{i=1}^N x_i = 1 }. (3.1)

Also, the returns of each asset are random. Therefore, the losses can be expressed by a random loss vector r ∈ R^N, so that r_i is a random variable that is distributed according to the loss distribution of the i-th asset. (Here, the losses are the negative values of returns. Hence, a negative r_i means that asset i is giving the investor a profit.) Note that r_i and r_j for i ≠ j do not need to have the same distribution. Furthermore, r_i and r_j can be correlated (and in most cases are), which is why portfolio optimization is concerned with multivariate loss distributions.

So the loss X that an investor can experience is a random variable that depends on the (random) losses of each asset and also on the investment in each asset, so that X = X(x, r). For the following considerations, the investor demands a minimum expected return. Taking r as the vector of random losses, x as the vector of investment decisions, and labelling the minimum expected return R, the minimum expected return constraint can be formulated as

x^T r̂ ≤ −R, (3.2)

where r̂ = E[r].

Before modern portfolio theory was introduced by Markowitz in 1952 ([23]), investment decisions were mostly made according to an investor's beliefs. Although the expected return and variance of a single asset could be calculated, investors were not able to form optimal portfolios, i.e. assign their funds in such a way that the whole portfolio had preferable characteristics [33].

The most important contribution of [23] is the insight that it is favourable to diversify a portfolio, because this will reduce the portfolio's standard deviation (risk) as long as the correlation between assets is less than 1. This result can be shown for a portfolio of N assets [33, p. 32].

Assume that an investor can buy N assets, with expected returns r̂_1, . . . , r̂_N and variances σ²_1, . . . , σ²_N. Assigning a fraction x_i of his funds to the i-th asset, the investor can expect a return of

E[x^T r] = Σ_{i=1}^N x_i × r̂_i,

which is the weighted average of expected asset returns. However, the risk for the investor can be lower than the weighted average of asset risks. To show this, the covariance matrix Σ ∈ R^{N×N} of the random loss vector r will be introduced. Σ is defined as [27, week 3, p. 11]

Σ := [ Var(r_1)       Cov(r_1, r_2)  ⋯  Cov(r_1, r_N)
       Cov(r_2, r_1)  Var(r_2)       ⋯  Cov(r_2, r_N)
       ⋮               ⋮               ⋱  ⋮
       Cov(r_N, r_1)  Cov(r_N, r_2)  ⋯  Var(r_N)      ],

where Var(r_i) = σ²_i was defined in Equation 2.3. Using Equation 2.5, Cov(r_i, r_j) can be expressed as Cov(r_i, r_j) = ρ_ij σ_i σ_j, which leads to the expression below.
(This expression is a standard result in the financial literature, where, e.g. in [8], it is usually derived for N = 2; it has been derived independently by the author.)

σ(x^T r) = √Var(x^T r) = √(x^T Σ x)
         = √( Σ_{i=1}^N x_i² σ_i² + 2 Σ_{i=1}^{N−1} Σ_{j=i+1}^N ρ_ij x_i x_j σ_i σ_j )
         = √( Σ_{i=1}^N x_i² σ_i² + 2 Σ_{i=1}^{N−1} Σ_{j=i+1}^N x_i x_j σ_i σ_j − 2 Σ_{i=1}^{N−1} Σ_{j=i+1}^N (1 − ρ_ij) x_i x_j σ_i σ_j )
         = √( ( Σ_{i=1}^N x_i σ_i )² − 2 Σ_{i=1}^{N−1} Σ_{j=i+1}^N (1 − ρ_ij) x_i x_j σ_i σ_j )
         ≤ √( ( Σ_{i=1}^N x_i σ_i )² ) = Σ_{i=1}^N x_i σ_i,

for all x ∈ S. The above inequality is strict whenever ρ_ij < 1 for some i ≠ j, meaning that the portfolio risk (given by the standard deviation) is less than the weighted average of asset risks whenever the assets are not perfectly correlated (which is usually the case).

(Even after Markowitz's paper was published, it took several decades for it to be adopted by the financial industry, because computers did not have the necessary power to perform the calculations.)

Using Markowitz's findings, a quadratic programme can be formulated to find a minimum variance portfolio. Including the constraint given by Equation 3.2, the programme can give the investor a portfolio which offers the required minimum return at the lowest possible risk. The inputs for the model are r̂, the expected returns of assets 1, . . . , N, and Σ, the covariance matrix. Usually these inputs have to be estimated; one possibility for estimating the entries of the covariance matrix is given in Section 4.2, but a further discussion of parameter estimation is beyond the scope of this dissertation.

Definition 3.1 ([27, week 3, p. 15] Minimum Variance Portfolio). A minimum variance portfolio in the sense of [23] is a portfolio which can be formed by solving

min_x  x^T Σ x
s.t.   x^T r̂ ≤ −R
       x ∈ S,          (3.3)

where Σ is the covariance matrix of the random loss vector r, r̂ = E[r], and S is the set of admissible portfolios.
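Problem 3.3 can be handed to any quadratic-programming solver. A minimal sketch using SciPy's SLSQP solver; the asset data below is hypothetical illustration data, not the thesis' values:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data (NOT the thesis' values): expected losses r_hat = E[r]
# (negative entries mean expected profit) and a covariance matrix Sigma.
r_hat = np.array([-0.08, -0.03, -0.10])
Sigma = np.array([[0.040, 0.002, 0.010],
                  [0.002, 0.010, 0.001],
                  [0.010, 0.001, 0.060]])
R = 0.06  # required minimum expected return

def min_variance_portfolio(r_hat, Sigma, R):
    """Solve Problem 3.3: min x'Sigma x  s.t.  x'r_hat <= -R  and  x in S."""
    n = len(r_hat)
    constraints = [
        {"type": "eq",   "fun": lambda x: x.sum() - 1.0},   # unit budget: sum x_i = 1
        {"type": "ineq", "fun": lambda x: -R - r_hat @ x},  # x'r_hat <= -R  (fun >= 0)
    ]
    res = minimize(lambda x: x @ Sigma @ x,
                   x0=np.full(n, 1.0 / n),
                   method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,                 # no short selling
                   constraints=constraints)
    return res.x

x = min_variance_portfolio(r_hat, Sigma, R)
print(np.round(x, 4), float(x @ Sigma @ x))
```

Sweeping R over a range of values and recording the resulting σ(x^T r) traces out the efficient frontier discussed next.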
Since a covariance matrix Σ is always positive semidefinite [27, week 3, p. 13], Problem 3.3 is a convex optimization problem; if Σ is positive definite, it has either a unique solution or is infeasible. The only situation under which Problem 3.3 becomes infeasible is when the required expected return is higher than any single expected return of the N assets under consideration.

To see how the portfolio risk changes for different expected returns, one can solve Problem 3.3 for different values of R (expected minimum return) and calculate the resulting portfolio risk (standard deviation). These risk/return pairs can be used to draw the efficient frontier, which is "a graph of the lowest possible [risk] that can be attained for a given portfolio expected return" [8, p. 220]. For a sample portfolio of three assets with given expected returns r̂ and covariance matrix Σ, the efficient frontier is shown in Figure 3.1.

Figure 3.1: Efficient frontier for a sample portfolio.

Because of the quadratic term in the objective function of Problem 3.3, an investor can increase his expected portfolio return with little additional risk if the portfolio has a low standard deviation to begin with. For example, increasing the expected return from 6.5 to 7 % only increases the standard deviation by 0.6 %. However, the more expected return an investor demands, the higher the increase in risk: increasing the expected return from 9.5 to 10 % requires an additional risk of 1.7 %.

It is possible to form a portfolio with a risk/return profile that lies below the efficient frontier. However, it is not possible to form a portfolio whose risk/return profile is above or to the left of the efficient frontier in Figure 3.1 [8, p. 220].

Despite revolutionizing risk management at its time, the Markowitz Model has some drawbacks regarding risk management. Two important disadvantages arise because it measures risk in terms of the variance of the portfolio:

1. Variance is only a useful risk measure for normally (or symmetrically) distributed losses. Since variance is measured in either direction, tail losses arising from skewed loss distributions are not taken into account.

2. Variance is not a coherent risk measure, as it is not monotone.

The first argument is illustrated in the second scenario of Section 3.3, while the second argument can easily be shown by an example: consider two normally distributed losses X and Y, where µ_X lies far below µ_Y but σ_X > σ_Y. The probability that the loss X exceeds the loss Y is then insignificantly small, so it is nearly impossible that the loss of X will exceed the loss of Y. However, X has a higher variance than Y, Var(X) > Var(Y), and would therefore be considered riskier if risk were measured by the variance.

Because of this, it is preferable for a risk manager to optimize the portfolio with regard to CVaR rather than with regard to variance. Rockafellar and Uryasev proposed a linear programme in [29] to optimize the CVaR of a portfolio. They also proved that under certain conditions the CVaR optimization will give the same optimal portfolio as the minimum variance optimization. The rest of this section introduces their notation and presents their results. (Although this section follows the outline of [29], the expressions are aligned more closely with [27, week 8].)

To derive later results, Rockafellar and Uryasev labelled the cumulative distribution function of losses Ψ(x, c), so that for any given decision x ∈ S, random asset losses r ∈ R^N, and loss distribution X(x, r),

Ψ(x, c) = F_X(c) = P(X(x, r) ≤ c) in the general case, and (3.4)
Ψ(x, c) = F_X(c) = ∫_{r : X(x,r) ≤ c} p(r) dr in the continuous case, (3.5)

where p(r) in Equation 3.5 is the pdf of a continuous r. The function Ψ(x, c) can be interpreted as the probability that the losses do not exceed the threshold c.

Using the notation of Ψ(x, c), the VaR_α and CVaR_α of an investment decision x can then be written as

VaR_α(x) = VaR_α(X(x, r)) = min{ c : Ψ(x, c) ≥ α }, and (3.6)
CVaR_α(x) = CVaR_α(X(x, r)) = E_r[X(x, r) | X(x, r) ≥ VaR_α(x)]. (3.7)

Rockafellar and Uryasev then introduced the auxiliary function

φ_α(x, c) := c + 1/(1 − α) E[(X(x, r) − c)_+], (3.8)

where E[·] is the expectation and (t)_+ = max{0, t}. Based on Equation 3.8, they formulated Theorem 3.1, the most important result of [29].

Theorem 3.1 ([29, p. 24]). As a function of c, φ_α(x, c) is convex and continuously differentiable. The CVaR_α of the loss associated with any x ∈ S can be determined from the formula

CVaR_α(x) = min_{c ∈ R} φ_α(x, c).
(3.9)

Furthermore, let Φ*_α(x) := arg min_c φ_α(x, c), i.e. Φ*_α(x) is the set of minimizers of φ_α(x, c). Then

VaR_α(x) = min{ c : c ∈ Φ*_α(x) }. (3.10)

And following from Equation 3.9 and Equation 3.10, the following equation always holds:

CVaR_α(x) = φ_α(x, VaR_α(x)). (3.11)

The proof of Theorem 3.1 is given in the appendix of [29]. Based on Theorem 3.1, Rockafellar and Uryasev stated another theorem, which is useful for the computational task of finding a CVaR optimal portfolio x* ∈ S.

Theorem 3.2 ([29, p. 25 f.]). Let S be a convex set of feasible decisions x and assume that X(x, r) is convex in x. Then minimizing the CVaR_α of the loss associated with decision x ∈ S is equivalent to minimizing φ_α(x, c) over all (x, c) ∈ S × R, in the sense that

min_{x ∈ S} CVaR_α(x) = min_{(x,c) ∈ S × R} φ_α(x, c), (3.12)

where, moreover, a pair (x*, c*) achieves the right hand side minimum if and only if x* achieves the left hand side minimum and c* ∈ Φ*_α(x*). Therefore, in circumstances where the interval Φ*_α(x) reduces to a single point (as is typical), the minimization of φ_α(x, c) produces a pair (x*, c*) such that x* minimizes the CVaR_α and c* gives the corresponding VaR_α.

Theorem 3.2 not only gives a way to express the CVaR minimization problem in a tractable form, but also allows one to calculate CVaR_α without having to calculate VaR_α first, as would have been the case with Definition 2.6. More remarkably, finding the CVaR by using Theorem 3.2 gives the corresponding VaR as a by-product [29, p. 25 f.].

Applying Theorem 3.2 with Equation 3.8, the investment decision x that minimizes the Conditional Value-at-Risk of a portfolio at the confidence level α can be expressed as [27, week 8, p. 21]

min_{x ∈ S} CVaR_α(x) = min_{x ∈ S, c ∈ R} ( c + 1/(1 − α) E[(X(x, r) − c)_+] ). (3.13)

To provide a better understanding of how to solve Problem 3.13, a one-dimensional example will be given, i.e. there is only one asset with a univariate, discrete loss distribution. Since there is only one asset to consider, x = [1].
Because of this, the goal in this example is not to find an optimal portfolio composition, but rather to find the VaR and CVaR using Theorem 3.2. The asset has the loss distribution of Y given in Table 2.2; the table is reproduced below for convenience.

i          1    2    3    4    5    6
y_i        100  200  400  800  900  1000
P(Y = y_i)

To find CVaR_α(x) = min_{c ∈ R} φ_α(x, c) graphically, the function φ_α(x, c) = c + 1/(1 − α) E[(X(x, r) − c)_+] will be drawn against c. The graph of φ_α(x, c) for α = 0.95 is shown in Figure 3.2.

Figure 3.2: Function value φ_0.95(c) of Y for different values of c.

The graph shows that the minimum of φ_α(x, c) occurs at c* = 800, so by Theorem 3.1, VaR_0.95(Y) = 800 and CVaR_0.95(Y) = min_{c ∈ R} φ_α(x, c) = φ_α(x, 800). Note that φ_α(x, c) has "kinks" at the points y_i, i = 1, . . . , 6, because the loss distribution of Y is discrete; this would not happen if X were continuous. In general, however, the expectation in Problem 3.13 cannot be evaluated in closed form. One remedy is to use Monte Carlo sampling to draw K i.i.d. samples of the loss vector r (r_k, k ∈ {1, 2, . . . , K}) from the distribution of r, so that Problem 3.13 can be written in a tractable LP form [27, week 8, p. 29]. Adding constraint 3.2 to ensure a minimum expected return for the investor, the tractable LP form of the optimization problem is given as

min_{x, c, z}  c + 1/(K(1 − α)) Σ_{k=1}^K z_k
s.t.           z_k ≥ x^T r_k − c   for k ∈ {1, . . . , K}
               z_k ≥ 0             for k ∈ {1, . . . , K}
               x^T r̂ ≤ −R
               x ∈ S.              (3.14)

Another interesting link between mean variance and CVaR optimization was established in [29] as well. Rockafellar and Uryasev proposed that under certain conditions, Problem 3.3 and Problem 3.13 give the same optimal portfolio.

Proposition 3.1 ([29, p. 29]). Suppose that the loss associated with each x is normally distributed, as holds when r is normally distributed. If α ≥ 0.5 and the constraint 3.2 is active at solutions to Problem 3.3 and Problem 3.12, then the solutions to those problems are the same; a common portfolio x* is optimal by both criteria.

This means that under the conditions stated in the proposition, it is possible to find the minimum variance portfolio by finding the minimum CVaR portfolio. Proposition 3.1 will be explored in the first scenario of Section 3.3.
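The graphical minimization of φ_α via Theorem 3.1 can be reproduced numerically for the one-asset example. The losses below are those of Table 2.2; the probabilities are hypothetical illustration values:

```python
import numpy as np

# Single asset (x = [1]) with the Table 2.2 losses; probabilities assumed.
losses = np.array([100, 200, 400, 800, 900, 1000])
probs  = np.array([0.30, 0.30, 0.30, 0.08, 0.01, 0.01])
alpha = 0.95

def phi(c):
    """Equation 3.8 for one asset: phi_alpha(c) = c + E[(Y - c)_+] / (1 - alpha)."""
    return c + probs @ np.maximum(losses - c, 0.0) / (1.0 - alpha)

# phi is piecewise linear and convex in c, so its minimum over R is attained at
# one of the kink points y_i.  By Theorem 3.1, the minimum value is CVaR and the
# smallest minimiser is VaR.
vals = np.array([phi(c) for c in losses])
var_alpha = losses[vals.argmin()]
cvar_alpha = vals.min()
print(var_alpha, round(cvar_alpha, 2))  # → 800 860.0
```

With these assumed probabilities the minimizer is c* = 800 (the VaR), and φ_0.95(800) equals the CVaR, in agreement with the Convex Combination Formula of Proposition 2.1.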
This section gives numerical examples for finding minimum CVaR portfolios. More precisely, the CVaR criterion will be compared to the minimum variance criterion (as formulated by Markowitz in [23], see Definition 3.1), and two scenarios will be given to show the effect of the criterion on the portfolio composition. The first scenario is adapted from [29] and concerns normally distributed losses. The second scenario is a theoretical construct with a positively skewed loss distribution.
First Scenario: Normally Distributed Losses
This scenario serves to illustrate the proposition by Rockafellar and Uryasev that under certain conditions the minimum variance optimization and the CVaR optimization give the same optimal portfolio x*.

In the example from [29, p. 29 ff.], three assets (N = 3) are available: the S&P 500 index (x_1), long-term US government bonds (x_2), and a portfolio of small cap stocks (x_3). The expected return of each asset and their covariance matrix are given in Table 3.1 and Table 3.2, respectively.

Asset             Mean Loss
x_1 S&P 500       -0.0101110
x_2 Gov. bond     -0.0043532
x_3 Small Cap     -0.0137058

Table 3.1: Mean Asset Losses of S&P, Government Bonds, and Small Cap.

Covariance        x_1           x_2           x_3
Matrix            S&P 500       Gov. bond     Small Cap
x_1 S&P 500       0.00324625    0.00022983    0.00420395
x_2 Gov. bond     0.00022983    0.00049937    0.00019247
x_3 Small Cap     0.00420395    0.00019247    0.00764097

Table 3.2: Covariance Matrix of S&P, Government Bonds, and Small Cap.

Using the CVX package in MATLAB, the minimum variance portfolios (MV opt) and minimum CVaR portfolios (CVaR opt) are calculated for expected minimum returns of 0.6 %, 0.9 %, and 1.1 %. To calculate the minimum CVaR portfolio for α = 0.95, 100,000 Monte Carlo simulations were run to estimate the loss distribution. The results are given in Table 3.3.

Required return   0.6 %                      0.9 %                      1.1 %
Portfolio:        MV opt   CVaR_0.95 opt     MV opt   CVaR_0.95 opt     MV opt   CVaR_0.95 opt
S&P               17.54 %  17.28 %           34.19 %  34.82 %           45.15 %  46.20 %
Gov. Bonds        75.65 %  75.75 %           37.18 %  36.93 %           11.58 %  11.52 %
Small Cap         6.81 %   6.97 %            28.64 %  28.25 %           43.27 %  43.18 %

Table 3.3: Minimum Variance and Minimum CVaR portfolios for different required returns.

Comparing the two portfolios for different levels of required return, one can see that their compositions only vary slightly (although they should be identical). They are not completely identical because the minimum variance portfolio was computed analytically, while Monte Carlo simulations were used to calculate the CVaR optimal portfolio. Otherwise, they can be considered identical, as was stated in Proposition 3.1.

Second Scenario: Positively Skewed Loss Distribution
In this subsection, the effect of the portfolio selection criterion is analysed when the loss distributions are not normal. Two further characteristics are therefore needed to describe the distributions, named skewness and kurtosis:

Definition 3.2 ([22, p. 22] Skewness). The skewness of a random variable X is defined as

skew(X) := E[((X − µ)/σ)³]. (3.15)

Definition 3.3 ([22, p. 22] Kurtosis). The kurtosis of a random variable X is defined as

kurt(X) := E[((X − µ)/σ)⁴]. (3.16)

A skewness of 0 means that the distribution of X is symmetrical about its mean µ, while a negative skewness indicates that values of X below µ are more likely, and a positive skewness means that values of X greater than µ are more probable. Kurtosis measures how the variance is affected by extreme deviations from the mean; a high kurtosis shows that a high variance is caused by a few extreme deviations from the mean µ [22, p. 22 f.].

In this scenario, four assets will be considered (called Index, Bonds, Mid Cap, Emerging Markets Stocks) and the following assumptions will be made:

• The loss distributions of the four assets are independent of each other, i.e. their correlations are 0.

• The loss distributions of the first three assets have the same mean and variance as in the previous scenario. The fourth asset has a higher mean and variance than the previous three.

• The minimum variance and minimum CVaR portfolios are formed the same way as in the previous scenario.

• Two cases will be considered: In the first case, all single loss distributions are normal, i.e. they have skewness 0. In the second case, all loss distributions are positively skewed, i.e. high losses are more likely than high profits.

The first assumption is highly theoretical, as in any real world setting there exists at least some correlation. However, uncorrelated assets are very favourable in portfolio diversification, as this reduces the combined variance significantly. The second and third assumptions create a link between this scenario and the previous one, so the effects can be better compared. Finally, the fourth assumption should show the dangers of using minimum variance optimization in cases where losses are not normally distributed. The first case (in which losses are normally distributed) serves as a benchmark portfolio for the second case with skewed loss distributions.

The loss distributions will be characterized by their mean, variance, skewness, and kurtosis (see Table 3.4). The implementation of these random losses in MATLAB will be done with the function pearsrnd, and the loss distributions for the single assets in both cases are shown in Appendix C.1.
Distribution       µ            σ²           skewness           kurtosis
Parameters                                   case 1   case 2
x_1 Index        -0.0101110   0.00324625    0        0.7       3
x_2 Bonds        -0.0043532   0.00049937    0        0.7       3
x_3 Mid Cap      -0.0137058   0.00764097    0        0.7       3
x_4 EMS          -0.018       0.01          0        0.7       3

Table 3.4: Characterization of loss distributions used in the second scenario.

(Some texts subtract 3 from the fourth central (normalized) moment when they define the kurtosis, so that the normal distribution has a kurtosis of 0. This convention is not followed in this dissertation.)

For all simulations and both cases, a minimum return of −0.006 was required. For both cases (no skewness and skewness = 0.7), the minimum variance optimal portfolio is the same, while the minimum CVaR portfolio differs: in both cases, even with normally distributed losses, it is different from the minimum variance portfolio. In the first case the portfolio is different because the CVaR optimal portfolio is estimated from Monte Carlo simulations.
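The skewed losses of Table 3.4 can be simulated in Python as well. A sketch using SciPy's pearson3 distribution as a stand-in for MATLAB's pearsrnd; note that pearson3 matches mean, variance, and skewness, but its kurtosis is then implied by the skewness rather than freely chosen, so the kurtosis column of Table 3.4 is only approximated:

```python
import numpy as np
from scipy import stats

# Mid Cap parameters from Table 3.4; pearson3 used as a pearsrnd stand-in.
mu, sigma2, skew = -0.0137058, 0.00764097, 0.7

samples = stats.pearson3.rvs(skew, loc=mu, scale=np.sqrt(sigma2),
                             size=200_000, random_state=12345)

# Sample versions of Definitions 3.2 and 3.3.
z = (samples - samples.mean()) / samples.std()
sample_skew = np.mean(z ** 3)   # Definition 3.2
sample_kurt = np.mean(z ** 4)   # Definition 3.3 (equals 3 for a normal distribution)
print(round(sample_skew, 2), round(sample_kurt, 2))
```

The sample skewness comes out close to the target 0.7, and the sample kurtosis exceeds 3, reflecting the heavier right tail of the positively skewed losses.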
Case 1: skewness = 0               Case 2: skewness = 0.7
Portfolio:   MV opt    CVaR_0.95 opt    MV opt    CVaR_0.95 opt
Index        12.12 %   13.34 %          12.12 %   14.36 %
Bonds        78.80 %   75.36 %          78.80 %   72.87 %
Mid Cap      5.15 %    6.15 %           5.15 %    6.95 %
EMS          3.93 %    5.15 %           3.93 %    5.82 %

Table 3.5: Minimum Variance and Minimum CVaR portfolios for scenario 2.

Although the loss distributions for both optimal portfolios are very similar in both cases (see Appendix C.2), the CVaR optimal portfolio shows a better performance over the 100,000 simulations. Among other performance and risk measures, the Expected Loss (EL) will also be considered. The definition of EL is given below.
Definition 3.4 ([15, p. 23] Expected Loss (EL)). Let X be a random variable representing loss. The expected loss of X is defined as

EL(X) = E[X | X ≥ 0]. (3.17)

Hence, the expected loss is the average loss, given that there is a loss. In this sense EL is similar to CVaR, but with a different conditioning event in the expectation. A summary of several performance and risk indicators for both optimal portfolios is given in Table 3.6.

Case 1: skewness = 0               Case 2: skewness = 0.7
Portfolio:              MV opt    CVaR_0.95 opt    MV opt    CVaR_0.95 opt
Expected Return µ       -0.0061   -0.0064          -0.0061   -0.0064
Standard Deviation σ

Chapter 4

Portfolio Hedging using CVaR
Chapter 2 stated the definition of CVaR and explained its properties; Section 3.2 gave a computationally tractable optimization programme to calculate CVaR optimal investment portfolios, for which corresponding examples were given in Section 3.3. In [29, p. 32 ff.], Rockafellar and Uryasev (later followed by other authors, e.g. [3], [5], [31], and [34]) expanded the use of CVaR to hedging against potential losses that arise from a previous investment decision. A possible scenario for this application is when a trader entered a position looking only at potential gains while disregarding possible losses. The risk manager might then intervene to hedge against the potential losses, i.e. minimize the trader's risk while still maintaining acceptable potential gains.

This chapter will start by introducing the basic notions of options and financial risk management methods in Section 4.1 and Section 4.2, followed by applying the hedging procedure of Rockafellar and Uryasev to call and put options on Google and Yahoo traded on 21 July 2015. Based on the available data as of 21 July 2015, two strangles are formed and described in Section 4.3, while the subsequent hedging procedure is described and applied in Section 4.4.
In Chapter 3, investments in an index fund, bonds and equity were considered when forming the portfolio. These securities are basic investment possibilities, which are easy to understand, as their payoff is directly linked to their market value. This means that if the price of a common share of Google rises (or falls) by 1 %, an investor who invested all his funds into Google shares makes a profit (or loss) of 1 % as well. (The example used here was taken from [24, p. 172 ff.]. The ticker symbols for the underlying equity are NASDAQ:GOOGL and NASDAQ:YHOO.)

Derivatives, such as call and put options, are "securities whose prices are determined by, or 'derive [sic] from,' the prices of other securities" [8, p. 678]. Since these prices do not need to depend linearly on the price of the underlying, their payoff profile can be more complicated than the payoff of bonds or equity. (Other derivative securities are, for example, futures or swaps; for more information on those and other derivatives please refer to [21].)

Definition 4.1 ([8, p. 679] Call Option). A call option gives its holder the right to purchase an asset for a specified price, called the strike price, on the specified expiration date.

Definition 4.2 ([8, p. 690] Put Option). A put option gives its holder the right to sell an asset for a specified price, called the strike price, on the specified expiration date.

(This is known as a European option. American options can be exercised at any time before the expiration date. In the following example, only stock options will be considered.)

For stock options, one option contract gives the holder the right to buy (call option) or sell (put option) 100 shares at the specified price [21, p. 199]. For any type of option, four basic positions are possible: a long or short position in a call, and a long or short position in a put. Denoting by K the strike price, S_T the price of the underlying stock at maturity, and p_C the price of the call, the payoff and profit of a long position in a call option can be expressed as [21, p. 198]

Payoff Long Call = max{ S_T − K, 0 } (4.1)
Profit Long Call = max{ S_T − K, 0 } − p_C (4.2)

The payoff and profit for a short position are the negatives of Equation 4.1 and Equation 4.2 and can be expressed as [21, p. 198]

Payoff Short Call = min{ K − S_T, 0 } (4.3)
Profit Short Call = min{ K − S_T, 0 } + p_C (4.4)

Figure 4.2: Reproduced from [21, p. 198], payoff and profit profile for a put option.

Using the same expressions as before and denoting the price of the put as p_P, the payoff and profit for a long put position can be expressed as [21, p. 198]

Payoff Long Put = max{ K − S_T, 0 } (4.5)
Profit Long Put = max{ K − S_T, 0 } − p_P (4.6)

while the payoff and profit for a short put are [21, p. 198]

Payoff Short Put = min{ S_T − K, 0 } (4.7)
Profit Short Put = min{ S_T − K, 0 } + p_P (4.8)

Hence, the bounds for profits and losses are quite different between call and put options. While a trader has no upper bound on possible profits from a long call, the losses for a short call are unbounded as well. On the other hand, profits and losses are bounded for both positions, long and short, in put options.

As mentioned previously, the four basic positions can be combined in a variety of ways to create many different payoff profiles. In this dissertation, only a strangle will be considered.
Definition 4.3 ([21, p. 248] Sale of a Strangle). In the sale of a strangle, sometimes called a top vertical combination, the investor sells a European put and a European call option with the same expiration date but different strike prices (K_Put < K_Call).

The payoff and profit profile from the sale of a strangle is shown in Figure 4.3. It is an easy-to-construct strategy and suitable for investors who feel that large stock price movements are unlikely. The profit from the sale of a strangle is constant if the stock price at maturity is between the two strike prices, i.e. K_Put ≤ S_T ≤ K_Call. However, potential losses are unlimited if the stock price rises above K_Call, because of the short call position [21, p. 248].

Figure 4.3: Reproduced from [21, p. 249], payoff and profit profile for the sale of a strangle.
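The payoff equations 4.1–4.8 and the sale of a strangle can be sketched directly. The strikes and premiums below are illustrative numbers, not taken from the thesis' data:

```python
import numpy as np

def long_call_payoff(s_t, k):   # Equation 4.1
    return np.maximum(s_t - k, 0.0)

def long_put_payoff(s_t, k):    # Equation 4.5
    return np.maximum(k - s_t, 0.0)

def short_strangle_profit(s_t, k_put, k_call, p_put, p_call):
    """Sell one put (strike k_put) and one call (strike k_call > k_put):
    collect both premiums, owe both payoffs (Equations 4.4 and 4.8)."""
    return (p_put + p_call) - long_put_payoff(s_t, k_put) - long_call_payoff(s_t, k_call)

# Illustrative numbers (not from the thesis): strikes 90/110, premiums 2 and 3.
s = np.array([70.0, 100.0, 130.0])
profits = short_strangle_profit(s, 90.0, 110.0, 2.0, 3.0)
print(profits)  # flat profit of 5 between the strikes, losses outside
```

The flat region between the strikes and the unbounded downside above K_Call match the profile in Figure 4.3.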
When managing the risk of an option trader’s portfolio, it is crucial to have the most up to dateestimates for the variance (or standard deviation / volatility ) and covariance of the underlyingstock’s price movements. Just prices constantly change, so does the volatility of the pricechanges. In periods of economic stability, huge price fluctuations are unlikely so the volatility islow - while in times of uncertainty price fluctuations are more common.Hence, it might be unsuitable to estimate the variance and covariance using Definition 2.2 andDefinition 2.3 with the entire historic data. To estimate the market risk , practitioners tend touse running averages or exponentially weighted moving averages to estimate the current volatility For a more detailed description of option trading strategy, please refer to [21, p. 234 ff.]. Volatility is just another term for standard deviation that is commonly used in finance. Market risk is the risk that is caused by the uncertainty of price changes.
of an asset, because this places more importance on recent observations of price fluctuations [33, p. 16]. This section describes how to calculate the daily EWMA estimates for the variance and covariance, and how to scale the variance if the holding period of a portfolio is longer than one day. The following variables will be used in the definitions:

t : the day of the estimation
r_{x,t} : the natural log of the daily return of an asset x from day t−1 to day t, i.e. ln((Price_{x,t} − Price_{x,t−1}) / Price_{x,t−1} + 1) = ln(Price_{x,t} / Price_{x,t−1})

The natural log of returns is used instead of the regular returns because the distribution of log returns is better fitted by the normal distribution than that of the regular returns. At the same time, log returns usually have a correlation with regular returns of close to 1 [33, p. 12].
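The closeness of log returns to simple returns for small daily moves can be checked numerically; a minimal Python sketch:

```python
import math

def log_return(price_t, price_prev):
    # r_{x,t} = ln(P_t / P_{t-1}) = ln(1 + simple return)
    return math.log(price_t / price_prev)

def simple_return(price_t, price_prev):
    return (price_t - price_prev) / price_prev

# For a typical 1% daily move the two measures are almost identical:
r_log = log_return(101.0, 100.0)       # ~0.00995
r_simple = simple_return(101.0, 100.0) # 0.01
```

For daily equity returns (usually well below 5% in magnitude) the difference is negligible, which is why the two return series are so highly correlated.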
Definition 4.4 ([33, p. 16] EWMA of Variance). The daily variance of the returns of an asset x using an exponentially weighted moving average with parameter λ is estimated by the formula

Var_t(x) := λ Var_{t−1}(x) + (1 − λ) r²_{x,t−1}.   (4.9)

Hence, the variance of any given day is estimated by using the variance estimate of the previous day and the natural log of the observed returns of the previous day. To apply Equation 4.9, two parameters must be set: the variance estimate of day 0 and λ. If the estimates have been calculated for a long enough horizon, Var_0(x) is of little importance, so it can be set equal to 0. In practice, risk managers usually set λ = 0.
94, as this provides a good balance between the volatility estimates of recent and historic data [33, p. 16 ff.].
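The recursion in Equation 4.9 (and the analogous covariance recursion of Definition 4.5 below) is a one-line update. A Python sketch with the RiskMetrics-style default λ = 0.94:

```python
def ewma_variance(prev_var, prev_return, lam=0.94):
    # Var_t = lam * Var_{t-1} + (1 - lam) * r_{t-1}^2   (Equation 4.9)
    return lam * prev_var + (1 - lam) * prev_return ** 2

def ewma_covariance(prev_cov, prev_ret_x, prev_ret_y, lam=0.94):
    # Cov_t = lam * Cov_{t-1} + (1 - lam) * r_{x,t-1} * r_{y,t-1}   (Equation 4.10)
    return lam * prev_cov + (1 - lam) * prev_ret_x * prev_ret_y

def ewma_variance_series(returns, var0=0.0, lam=0.94):
    """Run the recursion over a return series, starting from Var_0 = var0."""
    var = var0
    for r in returns:
        var = ewma_variance(var, r, lam)
    return var
```

Starting from Var_0 = 0, the influence of the initial estimate decays geometrically with factor λ, which is why it matters little over a long enough horizon.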
Definition 4.5 ([33, p. 25] EWMA of Covariance). The daily covariance between the returns of an asset x and an asset y using an exponentially weighted moving average with parameter λ is estimated by the formula

Cov_t(x, y) := λ Cov_{t−1}(x, y) + (1 − λ) r_{x,t−1} r_{y,t−1}.   (4.10)

Again, two parameters must be set to apply Equation 4.10: Cov_0(x, y) and λ. Using the same arguments as before, they should be set to Cov_0(x, y) = 0 and λ = 0.
94 [33, p. 25].

If the portfolio is held for longer than one day, the variance and covariance estimates need to be scaled to estimate the risk over the entire holding period. Assuming that returns follow a random walk, the variance and covariance over an n-day holding period (denoted Var^n_t(x) and Cov^n_t(x, y), respectively) are given as [33, p. 13]

Var^n_t(x) = n × Var_t(x), and   (4.11)
Cov^n_t(x, y) = n × Cov_t(x, y).   (4.12)

As described in the introduction, one scenario where CVaR hedging can be used is the adjustment of a trader's portfolio to protect the trading firm against unlikely, but very high losses. For this scenario the following set-up is given and the following assumptions are made:

• The date and time is 22 July 2015, 9 PM New York time (before US markets open).
• The trader only trades in call and put options on Google (NASDAQ:GOOGL) and Yahoo (NASDAQ:YHOO) which expire on 24 July 2015.
• The trader builds his position and does not change it until the option contracts expire, i.e. the holding time is 3 trading days.
• Only options with strike prices for which the open interest is greater than 200 will be considered.
• There is no bid-ask spread, i.e. options can be bought and sold at the same price.
• There are no transaction costs.
• All data is taken from Google Finance UK.
• The trader believes that high price movements are unlikely. He will build a pure strangle with Google options and a strangle with additional positions with Yahoo options. The additional positions on Yahoo are taken because the trader believes that an upward movement of Yahoo's share price is more likely than a downward movement.

To be more precise, the trader believes that at the market closing on 24 July 2015, the share price of Yahoo will be between USD 37.5 and 42.5, while the share price of Google will be between USD 665 and 730. Based on the trader's positions, the payoff and profit profile for different prices of Yahoo and Google at maturity is shown in Figure 4.4. More detailed information about option prices is given in Appendix B.1 and Appendix B.2, while the trader's positions are given in Appendix B.3.

Figure 4.4: Profit profiles for (unhedged) Google and Yahoo strangles at maturity.

Hence, if Google's share price closes within the trader's expectations on 24 July, the trader will make a constant profit. If Yahoo's share price closes within the expectations, the trader will also make a profit, but the profit will be highest if the share price closes at USD 42. However, the trader will suffer severe losses if the share prices close outside of his expectations, as can be seen at the left and right edges of the profit profiles in Figure 4.4.
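The risk assessment that follows relies on simulating correlated terminal share prices for the two stocks. A minimal sketch of one way to do this, using a one-period log-normal model and a 2 × 2 Cholesky factor of the holding-period covariance matrix — the covariance numbers below are purely illustrative and are not the thesis's estimates; only the initial prices come from the text:

```python
import math, random

# Illustrative 3-day covariance of log returns (NOT the thesis's Sigma)
var_y, var_g, cov_yg = 4e-4, 9e-4, 1e-4
s0_y, s0_g = 39.73, 695.35   # initial Yahoo / Google prices from the text

def simulate_terminal_prices(n_scenarios, seed=0):
    """Draw correlated log-normal terminal prices via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    l11 = math.sqrt(var_y)
    l21 = cov_yg / l11
    l22 = math.sqrt(var_g - l21 ** 2)
    prices = []
    for _ in range(n_scenarios):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        r_y = l11 * z1                 # Yahoo log return
        r_g = l21 * z1 + l22 * z2      # Google log return, correlated with r_y
        prices.append((s0_y * math.exp(r_y), s0_g * math.exp(r_g)))
    return prices

scenarios = simulate_terminal_prices(20000)
```

Each simulated price pair can then be fed through the option payoff formulas to obtain one loss scenario for the trader's portfolio.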
To perform the risk assessment of the trader's positions, the variance and covariance of Yahoo's and Google's share price movements need to be estimated. Using the daily share price movements over the last year, together with Equation 4.9 and Equation 4.10, gives the covariance matrix Σ for daily price movements, where Σ_{1,1} is the variance for Yahoo's and Σ_{2,2} is the variance for Google's share price movements. (As noted before, λ is chosen to be 0.94 and the initial estimates for the variance and covariance are 0.)

Since the trader will hold the portfolio for 3 days, Σ needs to be multiplied by 3 to give the variance and covariance estimates for the whole holding period (see Equation 4.11 and Equation 4.12). The resulting 3-day covariance matrix (Equation 4.13) is used for all subsequent risk assessments.

The remainder of this section mostly follows the hedging procedure used by Rockafellar and Uryasev in [29]. However, the optimization programme used to determine the CVaR optimal hedge was never stated in [29], so the explicit formulation of Problem 4.14 (together with Table 4.1) is an original contribution of this thesis.

With the initial prices of Yahoo and Google at USD 39.73 and 695.35, respectively, on the morning of 22 July, and the variance estimates given in Σ, one can calculate the probability that the share prices will be outside the trader's beliefs. (Usually, the price to buy (ask) is higher than the price to sell (bid); here, the price of an option is the average between ask and bid price.) Denoting the share prices at maturity of the options as S_{T,y} and S_{T,g}, these probabilities can be expressed as P(S_{T,y} < 37.5) + P(S_{T,y} > 42.5) = 0.
016, and the corresponding probability for Google, P(S_{T,g} < 665) + P(S_{T,g} > 730), is similarly small. Hence, there is a high probability that the trader will be correct in his assumption. Taking the risk analysis a little further, 20,000 simulations of share price developments were run (taking into account the correlation between Yahoo and Google share price movements). For each of the 20,000 scenarios the trader's loss was calculated. (A higher number of simulations could not be performed, as the PC ran out of memory for a CVX programme with more than 20,000 simulations.) The loss distribution of the simulations is shown in Figure 4.5 and several risk metrics are given in Table 4.2.

Figure 4.5: Histogram of trader's (unhedged) portfolio losses from 20,000 simulations.

Only in very few simulations (2.6 %) does the trader actually make a loss. Quantifying the Value-at-Risk also gives a positive assessment of the positions, as the 95 % VaR of -31,441 shows.

The variables used in the hedging programme are listed in Table 4.1:

Variable              Dimension   Description
N_y, N_g              scalar      Number of strike prices considered for Yahoo / Google options
k^y                   N_y × 1     Strike prices of the Yahoo options
k^g                   N_g × 1     Strike prices of the Google options
p^{C,y}, p^{P,y}      N_y × 1     Prices of the call / put options on Yahoo
p^{C,g}, p^{P,g}      N_g × 1     Prices of the call / put options on Google
x^{C,y}, x^{P,y}      N_y × 1     Trader's positions in call / put options on Yahoo
x^{C,g}, x^{P,g}      N_g × 1     Trader's positions in call / put options on Google
y^{C,y}, y^{P,y}      N_y × 1     Adjustments to the call / put positions on Yahoo
y^{C,g}, y^{P,g}      N_g × 1     Adjustments to the call / put positions on Google
a^{C,y}, a^{P,y}      N_y × 1     Maximum adjustments of the call / put positions on Yahoo
a^{C,g}, a^{P,g}      N_g × 1     Maximum adjustments of the call / put positions on Google
M                     scalar      Number of simulated scenarios
S                     M × 2       Simulated share prices at maturity (column 1: Yahoo, column 2: Google)
PO^{C,y}, PO^{P,y}    M × N_y     The payoff for call / put options in Yahoo, by simulated share price and strike price of the option
PO^{C,g}, PO^{P,g}    M × N_g     The payoff for call / put options in Google, by simulated share price and strike price of the option
cost_y, cost_g        scalar      Cost of the original option positions in Yahoo / Google
spc                   scalar      Number of shares per option contract

Table 4.1: Variables used in Problem 4.14.

The adjustments y cannot be arbitrarily large, and the maximum possible adjustment for each position is given by the a vectors [29, p. 33 f.]. Also, the payoffs PO can be calculated before running the optimization programme (but after the scenarios were simulated). Their entries are

PO^{C,y}_{i,j} = max{S_{i,1} − k^y_j, 0}   for i ∈ {1, . . . , M}, j ∈ {1, . . . , N_y},
PO^{P,y}_{i,j} = max{k^y_j − S_{i,1}, 0}   for i ∈ {1, . . . , M}, j ∈ {1, . . . , N_y},
PO^{C,g}_{i,j} = max{S_{i,2} − k^g_j, 0}   for i ∈ {1, . . . , M}, j ∈ {1, . . . , N_g}, and
PO^{P,g}_{i,j} = max{k^g_j − S_{i,2}, 0}   for i ∈ {1, . . . , M}, j ∈ {1, . . . , N_g}.

(Note that the trader's positions, denoted x, are now given in number of contracts instead of percentages, which was done in Chapter 3.)

The CVaR optimal hedge is then determined by the programme

min_{c, y, z}   c + (1 / (M(1 − α))) Σ_{m=1}^{M} z_m
s.t.   −a^{C,y}_i ≤ y^{C,y}_i ≤ a^{C,y}_i   for i ∈ {1, . . . , N_y}
       −a^{P,y}_i ≤ y^{P,y}_i ≤ a^{P,y}_i   for i ∈ {1, . . . , N_y}
       −a^{C,g}_i ≤ y^{C,g}_i ≤ a^{C,g}_i   for i ∈ {1, . . . , N_g}
       −a^{P,g}_i ≤ y^{P,g}_i ≤ a^{P,g}_i   for i ∈ {1, . . . , N_g}
       PO^y = [PO^{C,y}(x^{C,y} + y^{C,y}) + PO^{P,y}(x^{P,y} + y^{P,y})] × spc
       PO^g = [PO^{C,g}(x^{C,g} + y^{C,g}) + PO^{P,g}(x^{P,g} + y^{P,g})] × spc
       adjCost_y = [Σ_{i=1}^{N_y} p^{C,y}_i y^{C,y}_i + Σ_{i=1}^{N_y} p^{P,y}_i y^{P,y}_i] × spc
       adjCost_g = [Σ_{i=1}^{N_g} p^{C,g}_i y^{C,g}_i + Σ_{i=1}^{N_g} p^{P,g}_i y^{P,g}_i] × spc
       z_m ≥ adjCost_y + adjCost_g + cost_y + cost_g − [PO^y_m + PO^g_m]   for m ∈ {1, . . . , M}
       z_m ≥ 0   for m ∈ {1, . . . , M}.   (4.14)

Hedging the trader's portfolio using Problem 4.14 with a^{P,y}_i = a^{C,y}_i =
50 for i ∈ {1, . . . , N_y} and a^{P,g}_i = a^{C,g}_i for i ∈ {1, . . . , N_g} yields the payoff / profit profile shown in Figure 4.6 and the loss distribution in Figure 4.7. The exact composition of the hedged portfolio is shown in Appendix B.4 and Appendix B.5.

Figure 4.6: Profit profiles for hedged Google and Yahoo strangles at maturity.

After hedging, the profit profile for Yahoo options only changed slightly. The most noticeable change is that the graph is mostly scaled, that is, the profit for any given share price is about twice as high as for the unhedged portfolio. Still, the highest profit will be achieved when the share price of Yahoo is at USD 42. The pure strangle that was formed by options on Google changed its shape more noticeably. While the profit was mostly constant in the unhedged portfolio, there is now a clear peak.

Metric                 Original Portfolio   Hedged Portfolio
Mean Loss              -38,882              -54,910
Min Loss               -77,072              -142,556
Max Loss               466,221              376,638
Probability of Loss    2.62 %               0.48 %
95 % VaR               -31,441              -39,648
95 % CVaR              22,458               -27,911

Table 4.2: Risk metrics for the original and hedged option portfolio.

As Table 4.2 demonstrates, the hedged portfolio performs better than the original in each of the six metrics under consideration. The portfolio has a higher expected profit and a lower probability of generating a loss. Also, the 95 % VaR is lower (meaning that the minimum profit in the 95 % best cases is higher than for the original portfolio). Most notable, however, is the fact that the hedged portfolio has a negative 95 % CVaR. This means that even in the 5 % worst cases, the trader can expect a profit of USD 27,911. Still, losses are possible, as can be seen in Figure 4.7, but they are far less likely and less severe than for the original portfolio.

To conclude this chapter, it needs to be emphasized that the given example (although relying on real world data) only demonstrates how to apply CVaR optimization when trying to hedge a portfolio. The hedging effect shown here is astonishing, but can barely be reproduced in an actual trading environment, for several reasons. First, the original portfolio was just an example; it has not been optimized with regard to profit maximization. For a more balanced portfolio, the effects of hedging would be less extreme. Second, the prices were simplified, enabling the trader to buy and sell at the same price, without any transaction costs. Introducing ask and bid prices, as well as transaction costs, would decrease the profit and hence increase possible losses. Third, the trader and risk manager could buy and sell unlimited quantities of any option. In reality the offer and demand for any given option is limited. Finally, all other simplifying assumptions would make it hard to reproduce the same results in a real world setting, e.g.
the assumption that the trader holds the portfolio until the maturity of the options, or that the volatility would remain constant over the holding period.

Chapter 5
Conditional Value-at-Risk as a Norm
In the previous chapters, CVaR was introduced as a risk measure, which was the original intention of CVaR. Applications to portfolio optimization and hedging were also explored. In more recent research, Pavlikov and Uryasev ([25]) abstracted the concept of CVaR to a more general interpretation, so that it can also be used to define a family of norms in R^n. Pavlikov and Uryasev proposed two norms: a scaled CVaR norm (denoted C^S_α), and a non-scaled CVaR norm (denoted C_α, later simply referred to as the CVaR norm), which only differ by a factor.

This chapter first presents the two different and equivalent definitions that Pavlikov and Uryasev used to define the C^S_α norm, and how the C^S_α and C_α norms are related to one another by a multiplying factor. Section 5.3 presents some of the norm properties that were identified by Pavlikov and Uryasev in [25], enriched by some original ideas of the author. Section 5.4 introduces algorithms to computationally evaluate the different CVaR norms (C^S_α and C_α). Algorithms are derived for both equivalent definitions of each CVaR norm and the computational efficiency of each algorithm is evaluated.

The scaled CVaR norm of the vector x ∈ R^n is denoted by ⟪x⟫^S_α, where α is a parameter in the range 0 ≤ α ≤
1. The first way to define ⟪x⟫^S_α is given in Subsection 5.1.1 below, while an alternative characterization is given in Subsection 5.1.2.

Definition 5.1 ([25, p. 3f.] Component-wise Scaled CVaR Norm). Let the absolute values of the components of vector x ∈ R^n be ordered in ascending order, i.e., |x_(1)| ≤ |x_(2)| ≤ . . . ≤ |x_(n)|. For α_j = j/n, j = 0, . . . , n−1, the scaled CVaR norm ⟪x⟫^S_{α_j} of vector x with parameter α_j is defined as

⟪x⟫^S_{α_j} := (1 / (n−j)) Σ_{i=j+1}^{n} |x_(i)|.   (5.1)

For α such that α_j < α < α_{j+1}, j = 0, . . . , n−2, the scaled CVaR norm ⟪x⟫^S_α equals the weighted average of ⟪x⟫^S_{α_j} and ⟪x⟫^S_{α_{j+1}}, i.e.,

⟪x⟫^S_α := μ ⟪x⟫^S_{α_j} + (1−μ) ⟪x⟫^S_{α_{j+1}},   (5.2)

where μ = ((α_{j+1} − α)(1 − α_j)) / ((α_{j+1} − α_j)(1 − α)). And finally, for α such that (n−1)/n < α ≤ 1,

⟪x⟫^S_α := max_i |x_i|.   (5.3)

To illustrate the scaled CVaR norm, ⟪x⟫^S_α will be calculated for a vector x ∈ R^4 and the unit ball will be drawn for x ∈ R^2, both for different values of α. For a vector x ∈ R^4 whose largest absolute component is |−14| = 14, Equation 5.1 gives ⟪x⟫^S_0 as the average of all four absolute components, ⟪x⟫^S_{0.25} as the average of the three largest, ⟪x⟫^S_{0.5} as the average of the two largest, and ⟪x⟫^S_{0.75} = |−14| = 14. Note that by Equation 5.3, ⟪x⟫^S_α = 14 for all α > 0.75 as well. For an α strictly between two grid points, such as 0.5 < α < 0.75, the weight μ must be calculated first, and ⟪x⟫^S_α is then obtained from Equation 5.2 as the corresponding weighted average of ⟪x⟫^S_{0.5} and ⟪x⟫^S_{0.75}. For x ∈ R^2, the unit balls of ⟪x⟫^S_α for five different values of α are shown below in Figure 5.1.

Figure 5.1: Unit balls of ⟪x⟫^S_α for x ∈ R^2 and different values of α.

Alternatively, the vector x ∈ R^n can be associated with a random variable X with the set of possible outcomes {|x_1|, |x_2|, . . . , |x_n|}, each of which is equally likely. Then the scaled CVaR norm can be derived from the CVaR definition itself (see Problem 3.13). That is, the scaled CVaR norm ⟪x⟫^S_α is equal to CVaR_α(X) as defined in Equation 3.9.

Proposition 5.1 ([25, p. 6f.] Alternative Characterization of the Scaled CVaR Norm). For every x ∈ R^n and 0 ≤ α < 1,

⟪x⟫^S_α = min_{c ∈ R} ( c + (1 / (n(1−α))) Σ_{i=1}^{n} (|x_i| − c)_+ ), and   (5.4)

⟪x⟫^S_1 = max_i |x_i|.   (5.5)

Although Proposition 5.1 has been proven by Pavlikov and Uryasev in [25, p. 9ff.], a novel proof will be presented here to show how the proof of Proposition 5.1 can be derived in a different way. To the best knowledge of the author, this novel proof has not been published before. In their proof, Pavlikov and Uryasev showed that for the function f(c) := c + (1 / (n(1−α))) Σ_{i=1}^{n} [|x_i| − c]_+ it follows that |x_(j+1)| ∈ argmin_c f(c). They used this result together with Equation 5.4 to manipulate the alternative characterization of the scaled CVaR norm so that it was equal to Definition 5.1. The novel proof has two steps. First, it will be shown that when interpreting x ∈ R^n as the distribution of a discrete random variable X, the right hand sides of both Equation 5.4 and Equation 5.5 are an expression for CVaR_α(X). In the second step, it will be shown that CVaR_α(X) can be expressed by the Convex Combination Formula (Equation 2.15) so that it is equivalent to ⟪x⟫^S_α in Definition 5.1.

Proof.
Let x ∈ R^n describe the distribution of a discrete random variable X, so that the possible values of X are |x_i| for i ∈ {1, . . . , n}, with P(X = |x_i|) = 1/n. Then for 0 ≤ α <
1, the right hand side of Equation 5.4 is equivalent to

min_{c ∈ R} ( c + (1 / (n(1−α))) Σ_{i=1}^{n} (|x_i| − c)_+ )
= min_{c ∈ R} ( c + (1 / (1−α)) E[(X − c)_+] )
= CVaR_α(X),

where the last line follows from Problem 3.13. And by Equation 2.7, max_i |x_i| = CVaR_1(X).

To determine the α-CVaR of X by the Convex Combination Formula (Equation 2.15), three cases need to be considered. The first case is α = α_j = j/n, j ∈ {0, 1, . . . , n−1}, the second case is α_j < α < α_{j+1}, j ∈ {0, 1, . . . , n−2}, and the third and last case is (n−1)/n < α ≤
1. For all three cases the absolute values of the components of x should be ordered in ascending order, such that |x_(1)| ≤ |x_(2)| ≤ ⋅⋅⋅ ≤ |x_(n)|. Also, for the special case α = 0, set |x_(0)| := 0.

In the first case, i.e., α = α_j = j/n, j ∈ {0, 1, . . . , n−1}, VaR_α(X), CVaR^+_α(X), and λ are

VaR_{α_j}(X) = |x_(j)|,   CVaR^+_{α_j}(X) = (1 / (n−j)) Σ_{i=j+1}^{n} |x_(i)|,   and   λ = (α_j − α_j) / (1 − α_j) = 0,

so that the CVaR can be expressed as

CVaR_{α_j}(X) = (1 / (n−j)) Σ_{i=j+1}^{n} |x_(i)|,   (5.6)

which equals ⟪x⟫^S_{α_j} by Equation 5.1.

In the second case, i.e., α_j < α < α_{j+1}, j ∈ {0, 1, . . . , n−2}, VaR_α(X), CVaR^+_α(X), and λ are

VaR_α(X) = |x_(j+1)|,   CVaR^+_α(X) = (1 / (n−(j+1))) Σ_{i=j+2}^{n} |x_(i)|,   and   λ = (α_{j+1} − α) / (1 − α),
so that the CVaR can be expressed as

CVaR_α(X) = ((α_{j+1} − α) / (1 − α)) |x_(j+1)| + (1 − (α_{j+1} − α) / (1 − α)) (1 / (n−(j+1))) Σ_{i=j+2}^{n} |x_(i)|.   (5.7)

To show that Equation 5.7 equals Equation 5.2, Equation 5.2 needs to be manipulated, so that

⟪x⟫^S_α = μ ⟪x⟫^S_{α_j} + (1−μ) ⟪x⟫^S_{α_{j+1}}
= μ (1 / (n−j)) Σ_{i=j+1}^{n} |x_(i)| + (1−μ) (1 / (n−(j+1))) Σ_{i=j+2}^{n} |x_(i)|
= (μ / (n−j)) |x_(j+1)| + (μ / (n−j)) Σ_{i=j+2}^{n} |x_(i)| + (1−μ) (1 / (n−(j+1))) Σ_{i=j+2}^{n} |x_(i)|
= ((α_{j+1} − α) / (1 − α)) |x_(j+1)| + (1 − (α_{j+1} − α) / (1 − α)) (1 / (n−(j+1))) Σ_{i=j+2}^{n} |x_(i)|.   (5.8)

The last step follows because, with α_j = j/n and α_{j+1} − α_j = 1/n,

μ / (n−j) = ((α_{j+1} − α)(1 − α_j)) / ((α_{j+1} − α_j)(1 − α)(n−j)) = ((α_{j+1} − α)((n−j)/n)) / ((1/n)(1 − α)(n−j)) = (α_{j+1} − α) / (1 − α),

and the remaining coefficients of Σ_{i=j+2}^{n} |x_(i)| cancel, since μ/(n−j) − μ/(n−(j+1)) + ((α_{j+1} − α)/(1−α))(1/(n−(j+1))) = 0.

Comparing Equation 5.8 and Equation 5.7 shows that CVaR_α(X) = ⟪x⟫^S_α for α_j < α < α_{j+1}, j ∈ {0, 1, . . . , n−2}.

The last step is to show that CVaR_α(X) = ⟪x⟫^S_α for (n−1)/n < α ≤
1, which is trivial, as CVaR_α(X) = max_i |x_i| = ⟪x⟫^S_α in this case. This follows from Equation 5.3 and because CVaR_α(X) = VaR_α(X) when VaR_α(X) is the maximum loss possible [30, p. 1452], which is the case for (n−1)/n < α ≤ 1. Both characterizations of the scaled CVaR norm are therefore equal to CVaR_α(X), and hence must be equivalent.

5.2 Non-Scaled CVaR Norm

The non-scaled CVaR norm (also called
CVaR norm) is obtained by multiplying the scaled CVaR norm by a factor. This norm will have more significance in the following chapters.
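The relation between the two norms can be sketched in code. The function below implements the component-wise Definition 5.1 and obtains the non-scaled norm via the factor n(1 − α) of Equation 5.9 (a Python sketch; the thesis's implementations are in MATLAB):

```python
def scaled_cvar_norm(x, alpha):
    """Scaled CVaR norm per Definition 5.1 (component-wise)."""
    a = sorted(abs(v) for v in x)      # ascending |x_(1)| <= ... <= |x_(n)|
    n = len(a)
    if alpha > (n - 1) / n:            # Equation 5.3: max norm
        return a[-1]
    j = int(alpha * n)                 # largest j with alpha_j = j/n <= alpha
    def at_grid(j):
        return sum(a[j:]) / (n - j)    # Equation 5.1
    if alpha * n == j:                 # alpha sits exactly on a grid point
        return at_grid(j)
    a_j, a_j1 = j / n, (j + 1) / n     # Equation 5.2: interpolate
    mu = (a_j1 - alpha) * (1 - a_j) / ((a_j1 - a_j) * (1 - alpha))
    return mu * at_grid(j) + (1 - mu) * at_grid(j + 1)

def cvar_norm(x, alpha):
    # Non-scaled CVaR norm via Equation 5.9
    return len(x) * (1 - alpha) * scaled_cvar_norm(x, alpha)
```

Note that cvar_norm reproduces the L_1 norm at α = 0 and the L_∞ norm at α = (n−1)/n, in line with the comparisons made in Chapter 6.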
The non-scaled CVaR norm is obtained by multiplying the scaled CVaR norm by the factor n(1 − α), i.e.,

⟪x⟫_α := n(1 − α) ⋅ ⟪x⟫^S_α.   (5.9)

The non-scaled CVaR norm will be called CVaR norm from here on for simplicity. Algorithms for calculating the scaled CVaR norm and the CVaR norm will be implemented computationally and their efficiency will be compared in Section 5.4. Since the algorithms will be based on the definitions of the norms, it is computationally more efficient to calculate the CVaR norm from an algorithm based on Definition 5.2 than on Equation 5.9, as this eliminates two calculation steps: first scaling by 1/(n−j) and then multiplying by n(1−α). Hence, the following definition of the CVaR norm will be used.

Definition 5.2 ([25, p. 14f.] Component-wise CVaR Norm). Let the absolute values of the components of vector x ∈ R^n be ordered in ascending order, i.e. |x_(1)| ≤ |x_(2)| ≤ . . . ≤ |x_(n)|. For α_j = j/n, j = 0, . . . , n−1, the CVaR norm ⟪x⟫_{α_j} of vector x with parameter α_j is defined as

⟪x⟫_{α_j} := Σ_{i=j+1}^{n} |x_(i)|.   (5.10)

For α such that α_j < α < α_{j+1}, j = 0, . . . , n−2, the CVaR norm ⟪x⟫_α equals the weighted average of ⟪x⟫_{α_j} and ⟪x⟫_{α_{j+1}}, i.e.

⟪x⟫_α := λ ⟪x⟫_{α_j} + (1−λ) ⟪x⟫_{α_{j+1}},   (5.11)

where λ = (α_{j+1} − α) / (α_{j+1} − α_j). And finally, for α such that (n−1)/n < α < 1,

⟪x⟫_α := n(1 − α) ⟪x⟫_{α_{n−1}} = n(1 − α) max_i |x_i|.   (5.12)

Again, some examples will be given to gain a better familiarity with the CVaR norm; they use the same vector x ∈ R^4 as in Subsection 5.1.1. Equation 5.10 gives ⟪x⟫_0 as the sum of all four absolute components, ⟪x⟫_{0.25} as the sum of the three largest, ⟪x⟫_{0.5} as the sum of the two largest, and ⟪x⟫_{0.75} = |−14| = 14. In contrast to ⟪x⟫^S_α, ⟪x⟫_α ≠ ⟪x⟫_{0.75} for α > 0.
75, as, for example, ⟪x⟫_{0.9} = 4(1 − 0.9) ⋅ 14 = 5.6 by Equation 5.12. To calculate ⟪x⟫_α for an α strictly between two grid points α_j and α_{j+1}, λ must be calculated first in order to use Equation 5.11; ⟪x⟫_α is then the corresponding weighted average of ⟪x⟫_{α_j} and ⟪x⟫_{α_{j+1}}.
For x ∈ R^2, the unit balls of ⟪x⟫_α for five different values of α are shown below in Figure 5.2.

Figure 5.2: Unit balls of ⟪x⟫_α for x ∈ R^2 and different values of α.

Alternatively, the CVaR norm can be obtained by solving the following minimization (using Equation 5.9 and Proposition 5.1).
Proposition 5.2 ([25, p. 16] CVaR Norm based on CVaR Definition). For 0 ≤ α < 1,

⟪x⟫_α = min_c ( n(1−α) c + Σ_{i=1}^{n} (|x_i| − c)_+ ).   (5.13)

Writing Proposition 5.2 as an LP, i.e.,

⟪x⟫_α = min_{c, z}   n(1−α) c + Σ_{i=1}^{n} z_i
s.t.   z_i ≥ |x_i| − c   for i ∈ {1, . . . , n}
       z_i ≥ 0   for i ∈ {1, . . . , n},   (5.14)

one can use the strong duality theory of LP to obtain an equivalent definition of the CVaR norm [17, p. 5]. This alternative definition can be expressed as

max_q   Σ_{i=1}^{n} |x_i| q_i
s.t.   Σ_{i=1}^{n} q_i = n(1−α)
       0 ≤ q_i ≤ 1   for i ∈ {1, . . . , n},   (5.15)

which is the continuous knapsack problem.

The knapsack problem is a standard integer programming problem. Suppose that there is a decision to make on whether to use any of n items, each of which has a benefit b_i and a cost c_i for i ∈ {1, 2, . . . , n}. The goal is to maximize total benefit with a constraint on the total costs, C. The only additional constraint of the knapsack problem is that the decision variables q_i must be 0 or 1, i.e., an item is used completely or not at all, which makes it an integer programming problem [32, p. 524]. Hence, the knapsack problem can be formulated as

max_q   Σ_{i=1}^{n} b_i q_i
s.t.   Σ_{i=1}^{n} c_i q_i ≤ C
       q_i ∈ {0, 1}   for i ∈ {1, . . . , n}.   (5.16)

Changing the integer constraint (q_i ∈ {0, 1}) to a linear constraint (0 ≤ q_i ≤
1) and changing the inequality of the first constraint to an equality transforms the knapsack problem into the continuous knapsack problem, which is a linear programming problem. In the continuous knapsack problem it is possible to use fractions of any item, making it easier and more straightforward to solve (see Proposition 5.3). The parameters of Problem 5.16 and Problem 5.15 are linked in such a way that b_i = |x_i|, c_i = 1 for i ∈ {1, . . . , n}, and C = n(1−α). The optimal objective value of Problem 5.15 is another equivalent definition of the CVaR norm (since strong duality holds). The optimal objective value of Problem 5.15 can be found by a greedy algorithm, the result of which is stated below.

Proposition 5.3 ([17, p. 6] CVaR Norm based on dual formulation of CVaR definition). Let the absolute values of the components of vector x ∈ R^n be ordered in descending order, i.e. |x_(1)| ≥ |x_(2)| ≥ . . . ≥ |x_(n)|. Then

⟪x⟫_α = Σ_{i=1}^{⌊n(1−α)⌋} |x_(i)| + (n(1−α) − ⌊n(1−α)⌋) |x_(⌊n(1−α)⌋+1)|.   (5.17)

In Proposition 5.3, the absolute values of the components of x are ordered in descending order, which contrasts with the original definition of the CVaR norm in Definition 5.2. This is done so that the equivalence between Equation 5.17 and the D-norm given in Definition 5.3 will become apparent (see Subsection 5.3.2).

Any function ρ : R^n → R that satisfies the following properties is a norm on R^n [26, p. 20]:

i) ρ(x) ≥ 0 for all x ∈ R^n
ii) ρ(λx) = |λ| ρ(x) for all x ∈ R^n and all λ ∈ R
iii) ρ(x + y) ≤ ρ(x) + ρ(y) for all x, y ∈ R^n
iv) ρ(x) = 0 ⇒ x = 0

Pavlikov and Uryasev showed in [25] that both C^S_α and C_α satisfy these properties and are therefore norms.

Pavlikov and Uryasev also showed that the scaled CVaR norm C^S_α is a non-decreasing function of the parameter α.

Proposition 5.4 ([25, p. 7]). For a vector x ∈ R^n and 0 ≤ α_1 ≤ α_2 ≤ 1, ⟪x⟫^S_{α_1} ≤ ⟪x⟫^S_{α_2}.

The greedy algorithm (stated in Proposition 5.3) can be interpreted as follows: the knapsack has a capacity of n(1−α) and each vector component |x_i| has the same weight.
Pack as much of |x_(1)| (the component with highest magnitude) into the knapsack as possible. If the component completely fits into the knapsack (i.e. q_i = 1 is feasible), continue with the next-largest component; the last component to enter may only fit partially, which is possible since fractional q_i are allowed.

The following proposition is an original contribution of this thesis. It states that the scaled CVaR norm is convex in α within each interval [α_j, α_{j+1}].

Proposition 5.5.
For any vector x ∈ R^n and α ∈ [j/n, (j+1)/n], j = 0, 1, . . . , n−1, the scaled CVaR norm ⟪x⟫^S_α is convex in α, i.e.,

⟪x⟫^S_{λα_1 + (1−λ)α_2} ≤ λ ⟪x⟫^S_{α_1} + (1−λ) ⟪x⟫^S_{α_2}

for all α_1, α_2 ∈ [j/n, (j+1)/n], j = 0, 1, . . . , n−1 and λ ∈ [0, 1].

Proof. For α ∈ ((n−1)/n, 1] the proof of Proposition 5.5 is obvious, as ⟪x⟫^S_α is constant for these values of α. To show that ⟪x⟫^S_α is piecewise convex in α within each interval [j/n, (j+1)/n], j = 0, 1, . . . , n−2, let α_1, α_2 ∈ [α_j, α_{j+1}], t = λα_1 + (1−λ)α_2, λ ∈ [0, 1], and let α_1, α_2, α_j and α_{j+1} be labelled a, b, c, d in such a way that 0 ≤ a = α_j ≤ b ≤ t ≤ c ≤ d = α_{j+1} ≤ (n−1)/n. Then ⟪x⟫^S_{λα_1+(1−λ)α_2} = ⟪x⟫^S_t, ⟪x⟫^S_b and ⟪x⟫^S_c can be written as

⟪x⟫^S_t = μ_t ⟪x⟫^S_a + (1−μ_t) ⟪x⟫^S_d   with μ_t = ((d−t)(1−a)) / ((d−a)(1−t)),   (5.18)
⟪x⟫^S_b = μ_b ⟪x⟫^S_a + (1−μ_b) ⟪x⟫^S_d   with μ_b = ((d−b)(1−a)) / ((d−a)(1−b)), and   (5.19)
⟪x⟫^S_c = μ_c ⟪x⟫^S_a + (1−μ_c) ⟪x⟫^S_d   with μ_c = ((d−c)(1−a)) / ((d−a)(1−c)).   (5.20)

Hence, it needs to be shown that ⟪x⟫^S_t ≤ λ ⟪x⟫^S_b + (1−λ) ⟪x⟫^S_c, i.e.

μ_t ⟪x⟫^S_a + (1−μ_t) ⟪x⟫^S_d ≤ λ [μ_b ⟪x⟫^S_a + (1−μ_b) ⟪x⟫^S_d] + (1−λ) [μ_c ⟪x⟫^S_a + (1−μ_c) ⟪x⟫^S_d].

Rearranging the coefficients of ⟪x⟫^S_a and ⟪x⟫^S_d leaves to prove that

0 ≤ (λμ_b + (1−λ)μ_c − μ_t) ⟪x⟫^S_a + (λ(1−μ_b) + (1−λ)(1−μ_c) − (1−μ_t)) ⟪x⟫^S_d
⇔ 0 ≤ (μ_t + λμ_c − λμ_b − μ_c) (⟪x⟫^S_d − ⟪x⟫^S_a).

By Proposition 5.4, since d ≥ a, ⟪x⟫^S_d − ⟪x⟫^S_a ≥
0. Hence, to complete the proof, it must be shown that μ_t + λμ_c − λμ_b − μ_c ≥ 0 for all 0 ≤ a = α_j ≤ b ≤ t ≤ c ≤ d = α_{j+1} ≤ (n−1)/n and λ ∈ [0, 1]. Using expressions 5.18, 5.19 and 5.20 and eliminating the common factor (1−a)/(d−a) yields

0 ≤ μ_t + λμ_c − λμ_b − μ_c = (d−t)/(1−t) + λ(d−c)/(1−c) − λ(d−b)/(1−b) − (d−c)/(1−c)
⇔ 0 ≤ (d−t)(1−b)(1−c) + λ(d−c)(1−b)(1−t) − λ(d−b)(1−c)(1−t) − (d−c)(1−b)(1−t).   (5.21)

Substituting t = λb + (1−λ)c into Equation 5.21, expanding all brackets and summarizing the terms gives

0 ≤ λ (b² − b²d + c² − c²d + 2bcd − 2bc) + λ² (b²d − b² + c²d − c² + 2bc − 2bcd),

which simplifies to

0 ≤ λ(1−λ)(1−d)(c−b)².   (5.22)

Equation 5.22 holds for all 0 ≤ a = α_j ≤ b ≤ t ≤ c ≤ d = α_{j+1} ≤ (n−1)/n and λ ∈ [0, 1], which completes the proof.

To illustrate Proposition 5.5, ⟪x⟫^S_α is drawn against α for four different x in Figure 5.3. Depending on the components of x, the convexity is more or less pronounced in the graphs.

Figure 5.3: Scaled CVaR norm C^S_α against α for different x.

To show that ⟪x⟫^S_α is not convex over the whole interval [0, 1], consider a vector x ∈ R^3 whose scaled CVaR norm is strictly increasing on [0, 2/3], as in the top left graph of Figure 5.3; by Equation 5.3 it is then constant on [2/3, 1]. Choosing α_1 < 2/3 < α_2 and λ ∈ (0, 1) such that α_t = λα_1 + (1−λ)α_2 = 2/3 gives ⟪x⟫^S_{α_t} = max_i |x_i|, while λ ⟪x⟫^S_{α_1} + (1−λ) ⟪x⟫^S_{α_2} < max_i |x_i|, since ⟪x⟫^S_{α_1} < max_i |x_i| and ⟪x⟫^S_{α_2} = max_i |x_i|. Therefore, ⟪x⟫^S_α is only piecewise convex, but not convex over the whole interval [0, 1]. This is also apparent from the plots themselves.

While the scaled CVaR norm is a non-decreasing function of the parameter α (see Proposition 5.4), the CVaR norm shows different properties:

Proposition 5.6 ([25, p. 15]). For x ∈ R^n, the CVaR norm ⟪x⟫_α is a non-increasing, concave, piecewise-linear function of the parameter α.

Furthermore, the CVaR norm C_α coincides with the D-norm, which is defined below.

Definition 5.3 ([7, p. 513] D-Norm). For x ∈ R^n and parameter κ ∈ [1, n], the D-norm |||x|||_κ is defined as

|||x|||_κ := max_{S, t} ( Σ_{i ∈ S} |x_i| + (κ − ⌊κ⌋) |x_t| ),

where N = {1, . . . , n}, S ⊆ N, |S| ≤ ⌊κ⌋, and t ∈ N ∖ S.

The D-norm is used in robust optimization as an alternative to standard L_p norms for describing an uncertainty set. The D-norm has advantages such as the guarantee of feasibility independent of uncertainty distributions and a flexibility in the trade-off between robustness and performance [35, p. 40]. A further discussion of the D-norm (beyond its coincidence with the C_α norm) or of robust optimization in general is beyond the scope of this thesis. Further discussions of the D-norm are given in [7] and [35], while robust optimization is discussed in [14, p. 292 ff.] or [6]. (This is only a selection of the available literature on these topics.)

Proposition 5.7 ([25, p. 16]). For x ∈ R^n, the CVaR norm ⟪x⟫_α with parameter α ∈ [0, (n−1)/n] coincides with the D-norm |||x|||_κ with parameter κ = n(1−α), i.e. ⟪x⟫_α = |||x|||_κ.

This is because the D-norm is an equivalent formulation of the CVaR norm given in Proposition 5.3. Note that Proposition 5.7 does not hold for (n−1)/n < α ≤
1, as for (n−1)/n < α ≤ 1 it follows that κ = n(1−α) < 1 and hence κ ∉ [1, n], so that the D-norm is not defined in this case [25, p. 16]. Comparisons to L_p norms are made more extensively in Chapter 6.

This section investigates how computationally efficient different algorithms are for calculating ⟪x⟫^S_α and ⟪x⟫_α. The definitions of ⟪x⟫^S_α and ⟪x⟫_α in Definition 5.1 and Definition 5.2, respectively, naturally lead to simple algorithms for computing the norms. The algorithms that were implemented in MATLAB are printed in Appendix A.2 for ⟪x⟫^S_α and Appendix A.4 for ⟪x⟫_α. Informally, they can be described as follows:

1. Take the absolute values of the entries of x ∈ R^n and order them in ascending order.
2. If α > (n−1)/n, use Equation 5.3 or Equation 5.12 to calculate C^S_α or C_α, respectively.
3. If α = α_j, i.e., α = j/n for any j = 0, 1, . . . , n −
1, use Equation 5.1 or Equation 5.10 to calculate C^S_α or C_α, respectively.
4. Otherwise, find the closest α_j and α_{j+1} such that α_j < α < α_{j+1}, calculate μ (for C^S_α) or λ (for C_α), and use Equation 5.2 or Equation 5.11 to calculate C^S_α or C_α, respectively.

To calculate ⟪x⟫^S_α and ⟪x⟫_α using Proposition 5.1 or Proposition 5.2, respectively, the corresponding optimization problem was written in MATLAB CVX ([18], [19]; for the code see Appendix A.3 and Appendix A.5). The algorithm that was used to solve the optimization problem was picked automatically by CVX with no further input by the author. When referring to an "optimization algorithm" in the remainder of this section, the codes based on Proposition 5.1 or Proposition 5.2 are meant.

To compare the computational efficiencies of the different algorithms, random vectors of dimensions n ∈ {2, 3, 10, 100, 1,000, 10,000, 100,000} were generated, and each of the algorithms given in Appendix A.2 – Appendix A.5 was run 10 times to calculate C^S_α or C_α, respectively. The average time taken over the 10 runs is the computation time stated in Table 5.1, Table 5.2, and Appendix B.6. These calculations were performed for values of α ∈ {0, 0.1, 0.25, 0.5, 0.7, 0.9}.

                    Component-wise                           Optimization
n          ⟪x⟫^S_α (Def. 5.1)  ⟪x⟫_α (Def. 5.2)    ⟪x⟫^S_α (Prop. 5.1)  ⟪x⟫_α (Prop. 5.2)
2          0.13                0.08                178.59               174.96
3          0.18                0.12                180.96               179.34
10         0.13                0.08                184.33               181.49
100        0.15                0.10                217.66               213.11
1,000      0.19                0.14                323.36               239.72
10,000     1.00                0.92                571.45               551.93
100,000    5.64                5.00                5516.37              5128.19

Table 5.1: Computation times of ⟪x⟫^S_α and ⟪x⟫_α at α = 0.5 of a vector x ∈ R^n for different n, in milliseconds.
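As a cross-check of the strong-duality equivalence between Definition 5.2 and Proposition 5.3, both evaluations can be compared directly; a Python sketch (the thesis's actual implementations are in MATLAB and are listed in the appendices):

```python
import math, random

def cvar_norm_definition(x, alpha):
    # Definition 5.2: ascending sort, tail sums, linear interpolation in alpha
    a = sorted(abs(v) for v in x)
    n = len(a)
    if alpha > (n - 1) / n:
        return n * (1 - alpha) * a[-1]          # Equation 5.12
    j = math.floor(alpha * n)
    tail = lambda k: sum(a[k:])                 # Equation 5.10
    if alpha * n == j:
        return tail(j)
    lam = ((j + 1) / n - alpha) / (1 / n)       # Equation 5.11
    return lam * tail(j) + (1 - lam) * tail(j + 1)

def cvar_norm_greedy(x, alpha):
    # Proposition 5.3: descending sort, fill a knapsack of capacity n(1 - alpha)
    d = sorted((abs(v) for v in x), reverse=True)
    n = len(d)
    cap = n * (1 - alpha)
    k = math.floor(cap)
    frac = cap - k
    return sum(d[:k]) + (frac * d[k] if k < n else 0.0)

random.seed(1)
x = [random.uniform(-10, 10) for _ in range(7)]
for alpha in [0.0, 0.1, 0.25, 0.5, 0.7, 0.9]:
    assert abs(cvar_norm_definition(x, alpha) - cvar_norm_greedy(x, alpha)) < 1e-9
```

Both routines are O(n log n) due to sorting; the expensive LP solve of the "optimization algorithms" is only needed when the norm appears inside an optimization model.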
                               Computation time in ms
                    Component-wise                           Optimization
   n       α     ⟪x⟫^S_α (Definition 5.1)  ⟪x⟫_α (Definition 5.2)  ⟪x⟫^S_α (Proposition 5.1)  ⟪x⟫_α (Proposition 5.2)
   1,000   0.0          0.19                    0.14                    202.81                     199.38
           0.1          0.19                    0.14                    244.86                     236.01
           0.25         0.19                    0.14                    229.73                     271.94
           0.5          0.19                    0.14                    323.36                     239.72
           0.7          0.19                    0.15                    252.11                     241.46
           0.9          0.19                    0.14                    289.31                     249.22

Table 5.2: Computation times of ⟪x⟫^S_α and ⟪x⟫_α at different α of a vector x ∈ R^n for n = 1,000.

Table 5.1 shows that for n ≤ 1,
000 the computing times for ⟪x⟫^S_α and ⟪x⟫_α using the component-wise algorithms do not increase significantly with increasing n. For n ≥ 10,
000 there is a notable increase in computing time with increasing n, for both algorithms and both norms. Table 5.2 shows that the value of α does not have any considerable effect on the computing time for the component-wise algorithm, whereas the computing times for the optimization algorithm fluctuate with α.

Both tables clearly show that the component-wise algorithms (given in Appendix A.2 and Appendix A.4) outperform the optimization algorithms by several orders of magnitude. Hence, in the rest of this thesis only the component-wise algorithms will be used when comparing computational efficiencies against other norms. However, the component-wise algorithms cannot be used to solve any optimization problem involving the calculation of a CVaR norm, as constraints cannot be included. Hence, the optimization algorithms to calculate C^S_α and C_α are the only choice when trying to solve optimization problems, e.g. the model recovery problems discussed in Chapter 7.

All calculations were performed on a PC with an Intel Core i5-2400S with 4 cores @ 2.5 GHz and 4 GB of memory.

Chapter 6 Comparisons to L_p Vector Norms
This chapter explores how the scaled CVaR norm C^S_α and the CVaR norm C_α compare to several L_p norms for different values of α and p, as investigated by [17] and [25]. First, in Section 6.1 a brief overview of the behaviour of C^S_α will be given, following the examples of [25]. Then the focus will shift to the C_α norm: Section 6.2 illustrates how α and p can be chosen so that C_α best approximates L_p. To conclude this chapter, Section 6.3 extends the numerical examples for C_α given in [25] by the findings of Section 6.2.

6.1 C^S_α

To describe the behaviour of the scaled CVaR norm, Pavlikov and Uryasev use two examples [25, p. 4 ff.]. For each comparison, the scaled L^S_p norm is used, which is defined by

||x||^S_p = ( (1/n) Σ_{i=1}^n |x_i|^p )^{1/p},   (6.1)

where p ≥
1. The actual examples used for the comparison are:

1. Let x ∈ R^5 be a fixed example vector, calculate ⟪x⟫^S_α for α ∈ [0, 1] and the corresponding ||x||^S_p for p = 1/(1−α). This is shown in Figure 6.1.

2. Compare the unit disks for C^S_α and L^S_p, i.e. the sets U^S_α = {x = (x_1, x_2) | ⟪x⟫^S_α ≤ 1} and U^S_p = {x = (x_1, x_2) | ||x||^S_p ≤ 1}, for several values of α ∈ [0, 1] (including α = 1 − 1/√2) and corresponding p(α) = 1/(1−α). This comparison is shown in Figure 6.2.

Figure 6.1: Reproduced from [25, p. 6], C^S_α and L^S_p norms of x for different values of α and p(α).

Figure 6.2: [25, p. 5] Norm unit disks of C^S_α and L^S_p for different values of α and p(α).

As can be seen in Figure 6.2, ⟪x⟫^S_0 = ||x||^S_1 and ⟪x⟫^S_α = ||x||^S_∞ for α ∈ [(n−1)/n, 1]. This relationship follows from Definition 5.1 and Equation 6.1.

6.2 α and p for C_α and L_p

In [17], Gotoh and Uryasev explored (among other things) the question: "For what value of κ ∈ [1, n] does the CVaR norm (or its dual) give the best approximation of the L_p-norm, and

This thesis will not introduce or explain the dual CVaR norm, but focuses on the findings of [17] regarding the CVaR norm (which was defined in Section 5.2).
in which sense is it the best" [17, p. 3]? Gotoh's and Uryasev's analysis consisted of finding tight bounds on the ratio ⟪x⟫_α / ||x||_p - a lower bound L and an upper bound U, such that L ≤ ⟪x⟫_α / ||x||_p ≤ U. Then they defined the ratio U/L as a measure of proximity (i.e. the goodness of approximation of ||x||_p by ⟪x⟫_α). Finally, they defined a quasi-convex function f_{n,p}(κ) = U/L and analysed for which value of κ (equivalently, α(p)) f_{n,p}(κ) attains its minimum. This α* then gives ⟪x⟫_{α*}, which is the best approximation of ||x||_p.

Proposition 6.1 ([17, p. 6]). For any p ∈ (1, ∞), α ∈ [0, (n−1)/n], and x ∈ R^n ∖ {0}, it holds that

min{1, n^((p−1)/p)(1−α)} ≤ ⟪x⟫_α / ||x||_p ≤ (⌊κ⌋ + (κ − ⌊κ⌋)^(p/(p−1)))^((p−1)/p),   (6.2)

where κ = n(1−α). The proof of Proposition 6.1 is given in Chapter A.1 of [17].

Based on Equation 6.2, the ratio U/L, where U = (⌊κ⌋ + (κ − ⌊κ⌋)^(p/(p−1)))^((p−1)/p) and L = min{1, n^((p−1)/p)(1−α)}, defines a function which evaluates the proximity of ⟪x⟫_α to ||x||_p:

f_{n,p}(κ) := (⌊κ⌋ + (κ − ⌊κ⌋)^(p/(p−1)))^((p−1)/p) / min{1, n^((p−1)/p)(1−α)}.   (6.3)

Lemma 6.1 ([17, p. 9]). The function f_{n,p}(κ) is continuous at any κ ∈ (1, n), and differentiable at any non-integer κ except κ = n^(1/p), i.e. at any κ ∉ {1, . . . , n} ∪ {n^(1/p)}.

Proposition 6.2 ([17, p. 9]). The function f_{n,p}(κ) is decreasing for κ ≤ n^(1/p) and increasing for κ ≥ n^(1/p). Accordingly, f_{n,p}(κ) uniquely attains its minimum value, (⌊κ⌋ + (κ − ⌊κ⌋)^(p/(p−1)))^((p−1)/p), at κ = n^(1/p).

The proofs of Lemma 6.1 and Proposition 6.2 are given in sections A.3 and A.4 of [17], respectively. Using Proposition 6.2 and substituting κ = n(1−α) gives the values of α and p for which ⟪x⟫_α best approximates ||x||_p [17, p. 9] as

α* = 1 − n^((1−p)/p), and   (6.4)

p* = ln(n) / ln(n(1−α)).
(6.5)

Gotoh and Uryasev also compared the proximity ratio U/L = f_{n,p}(κ) given by Equation 6.3 for different combinations of p and n, each with optimal κ* = n(1−α*) = n^(1/p) (see Figure 6.3). The ratio f_{n,p}(κ*) becomes largest at p =
2, which indicates that L_2 is the hardest L_p norm to approximate by the CVaR norm [17, p. 11].

Here, κ refers to the parameter used in Definition 5.3 of the D-norm, which is related to α as κ = n(1−α) (see Proposition 5.7). The term tight means that there is some x which satisfies the equality.

Figure 6.3: f_{n,p}(κ*) for different values of n and p, with κ* = n^(1/p).

6.3 C_α

To see how C_α behaves for different values of α, Pavlikov and Uryasev used the same examples as in the previous subsection, but compared C_α to the standard L_p norms

||x||_p = ( Σ_{i=1}^n |x_i|^p )^{1/p},   (6.6)

where p ≥
1. Hence, using the same numerical examples, the comparisons are:

1. Let x ∈ R^5 be the same example vector as in Section 6.1, calculate ⟪x⟫_α for α ∈ [0, 1] and the corresponding ||x||_p and ||x||_{p*}, with p = 1/(1−α) and optimal p* = ln(n)/ln(n(1−α)). This is shown in Figure 6.4.

2. Compare the unit disks for C_α and L_p, i.e. the sets U_α = {x = (x_1, x_2) | ⟪x⟫_α ≤ 1} and U_p = {x = (x_1, x_2) | ||x||_p ≤ 1}, for several values of α ∈ [0, 1) (including α = 1 − 1/√2) and corresponding p(α) = 1/(1−α). This comparison is shown in Figure 6.5.

Figure 6.4: Reproduced from [17, p. 10], C_α and L_p norms of x for different values of α and p(α).

Here, optimal means that for p = p*, ||x||_p best approximates ⟪x⟫_α.

Figure 6.5: C_α and L_p for different values of α and p(α).

Again, there is a close relationship between C_α and L_1/L_∞. As is depicted in Figure 6.5, and as can be shown from Equation 5.10 and Equation 6.6, ⟪x⟫_0 = ||x||_1 and ⟪x⟫_{(n−1)/n} = ||x||_∞. Letting x ∈ R^2 with |x_1|, |x_2| ≤
10 and producing surface plots of ⟪x⟫_{α*} and ||x||_p for p = 2 and α* = 1 − 1/√2 gives the plots shown in Figure 6.6. Additional surface plots for varying values of α and p* are displayed in Appendix C.3.

Figure 6.6: Norm surface plots (C_α and L_p) of x for p = 2 and α* = 1 − 1/√2.

Comparing the projections of a circle onto the unit ball in R^2 using the L_2 norm and the C_{α*} norm, with α* = 1 − 1/√2, is shown in Figure 6.7. Further comparisons for different α are shown in Appendix C.4.

Figure 6.7: Projection of a circle onto the unit ball in x ∈ R^2 using the L_2 and C_{α*} norm, with α* = 1 − 1/√2.

Chapter 7 Model Recovery Using Atomic Norms
Many real world problems require solving an ill-posed inverse problem, in which the number of measurements is smaller than the dimension of the model to be estimated. But if the structure of the model is favourable, the original model can be recovered by the use of atomic norms, to be more precise, by minimizing the atomic norm, i.e. solving the problem [11, p. 811]

x̂ = arg min_x ||x||_A s.t. y = Φx,   (7.1)

where ||·||_A is the atomic norm. The candidate vector x* can be formed from a set of atoms A, i.e. x* = Σ_{i=1}^k c_i a_i where a_i ∈ A, c_i ≥ 0, and a linear measurement map Φ : R^p → R^n is available. Also, the measurement y = Φx* is known. The goal is to reconstruct x* given y. The following sections will discuss how atomic norms can be derived from a set of atoms and which conditions need to be satisfied to allow for recovery.

A model can be considered simple if it can be expressed as a non-negative combination of atoms (i.e. basic building blocks of the model). More precisely, let x ∈ R^p be formed as [11, p. 806]

x = Σ_{i=1}^k c_i a_i,   (7.2)

for a_i ∈ A, c_i ≥
0, where A is the set of atoms.

The atomic norm of a set of atoms A is then derived by forming the convex hull of A, i.e. conv(A). Figure 7.1 displays the relation between different sets of atoms and their corresponding atomic norms in R^2.

Figure 7.1: Atoms, their convex hull, and relation to the L_1 and C_α norms in R^2.

Choosing the atoms as the unit vectors of R^2 and forming the convex hull gives the unit ball of the L_1 norm. Hence, for A_{L_1} = {±e_i}_{i=1,2}, the atomic norm is the L_1 norm (see left side of Figure 7.1). If we extend the set of atoms to also include the points 1/(2(1−α)) [±1, ±1]^T for 0 < α < 1/2, i.e. A = {±e_i}_{i=1,2} ∪ {1/(2(1−α)) [±1, ±1]^T}, 0 < α < 1/2, then the atomic norm of A is the C_α norm in R^2, with 0 < α < 1/2 (see right side of Figure 7.1 and Conjecture 8.1).

A formal relation between conv(A) and the atomic norm induced by A can be derived from different results of convex analysis:

Definition 7.1 ([20, p. 128] Gauge of a set). Let A be a closed convex set containing the origin. The function defined by

γ_A(x) := inf{λ > 0 : x ∈ λ conv(A)}   (7.3)

is called the gauge of A. If there is no λ such that x ∈ λ conv(A), then γ_A(x) = +∞.

Proposition 7.1 ([9, p. 10]). Assume that the centroid of conv(A) is at the origin, which can be achieved by appropriate recentering. Then the gauge function can be rewritten as

γ_A(x) = inf{ Σ_{a∈A} c_a : x = Σ_{a∈A} c_a a, c_a ≥ 0 ∀ a ∈ A }.   (7.4)

Furthermore, if A is centrally symmetric about the origin (i.e. a ∈ A if and only if −a ∈ A), then the gauge γ_A is a norm, which is called the atomic norm induced by A [11, p. 810]. In this case, it will be denoted by ||·||_A. The support function of A is given below.

Definition 7.2 ([20, p. 134], [11, p. 810] Support Function). Let A be a non-empty set in R^n. The function defined by

||x||*_A := sup{⟨x, a⟩ : a ∈ A}   (7.5)

is called the support function of A. ⟨x, a⟩ denotes the dot product x^T a.
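For a finite atom set, Equation 7.4 is a small linear program in the coefficients c_a and Equation 7.5 is a maximum of inner products, so both can be evaluated directly. The following Python sketch (using scipy, not part of the thesis's MATLAB/CVX code) assumes the atoms are given explicitly as vectors; the cross-polytope atoms {±e_1, ±e_2} in the example are the L_1 atoms discussed above.

```python
import numpy as np
from scipy.optimize import linprog

def atomic_norm(x, atoms):
    # Gauge of Equation 7.4: min sum_a c_a  s.t.  x = sum_a c_a * a, c_a >= 0.
    A = np.column_stack(atoms)                  # each column is one atom
    res = linprog(c=np.ones(A.shape[1]), A_eq=A, b_eq=np.asarray(x, float),
                  bounds=(0, None))
    return res.fun if res.success else np.inf   # +inf if x is in no lambda*conv(A)

def support_function(x, atoms):
    # Equation 7.5: sup over atoms of <x, a>.
    return max(float(np.dot(x, a)) for a in atoms)

# Atoms {+-e1, +-e2} in R^2: the induced atomic norm is the L1 norm,
# and the support function is the L_infinity norm (its dual).
atoms = [np.array(a, float) for a in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
```

Adding the scaled corner points 1/(2(1−α)) [±1, ±1]^T to `atoms` would, per the discussion above, turn the induced norm into the C_α norm in R^2.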
If ||·||_A is a norm, the support function ||·||*_A is the dual norm of the atomic norm. This definition shows that the unit ball of ||·||_A is equal to conv(A) [11, p. 810]. In addition to the above concepts, some background on cones is also necessary for the following sections:

Definition 7.3 ([20, p. 21] Convex Cone). The set K is a cone if ∀ t > 0, k ∈ K ⇒ tk ∈ K. Furthermore, the cone is convex if the set K is convex.

Definition 7.4 ([11, p. 814] Polar Cone). The polar K* of a cone K is the cone

K* := {x ∈ R^p : ⟨x, k⟩ ≤ 0 ∀ k ∈ K}.   (7.6)

To provide a better understanding of cones and polar cones, examples (taken from [1, p. 35]) are shown in Figure 7.2.

Figure 7.2: [1, p. 35] Examples of cones K and polar cones K*.

Definition 7.5 ([11, p. 814] Tangent Cone). For some non-zero x ∈ R^p, the tangent cone at x with respect to the scaled unit ball ||x||_A conv(A) is

T_A(x) := cone{z − x : ||z||_A ≤ ||x||_A}.   (7.7)

Definition 7.6 ([11, p. 814] Normal Cone). The normal cone N_A(x) at x with respect to the scaled unit ball ||x||_A conv(A) is the set of all directions that form obtuse angles with every descent direction of the atomic norm ||·||_A at the point x, i.e.

N_A(x) := {s : ⟨s, z − x⟩ ≤ 0 ∀ z s.t. ||z||_A ≤ ||x||_A}.   (7.8)

Examples of tangent and normal cones for a general convex set C (again taken from [1, p. 49]) are shown in Figure 7.3 to provide a better understanding of these concepts.

Figure 7.3: [1, p. 49] Examples of tangent and normal cones with respect to a set C.

The tangent cone is equal to the set of descent directions of the atomic norm ||·||_A at point x, i.e. the set of all directions d such that the directional derivative is negative [11, p. 814]. The normal cone is equal to the set of all normals of hyperplanes, given by normal vectors s, that support the scaled unit ball ||x||_A conv(A) at x. Additionally, the tangent cone T_A(x) and normal cone N_A(x) are polar cones of each other.
And finally, the normal cone N_A(x) is the conic hull of the subdifferential of the atomic norm at x [11, p. 814].

This section states the conditions that are necessary to recover a vector x̂ exactly (when the measurements y ∈ R^n are noise free) or robustly (when the measurements are noisy). The concepts presented in Section 7.1 are used to derive the number of measurements n needed to ensure exact (or robust) recovery. Recall Problem 7.1, which states x̂ = arg min_x ||x||_A s.t. y = Φx. The dual problem of 7.1 is [11, p. 811]

max_z y^T z s.t. ||Φ^T z||*_A ≤ 1.   (7.9)

Now suppose that the measurements y are noisy, i.e. y is formed as y = Φx* + ω, where ω is the noise term. If an upper bound on the noise term is known, i.e. ||ω|| ≤ δ, the constraint in Problem 7.1 can be relaxed to give [11, p. 811]

x̂ = arg min_x ||x||_A s.t. ||y − Φx|| ≤ δ.   (7.10)

In the noise free case, the solution x̂ of Problem 7.1 is considered an exact recovery if x̂ = x*. If the error ||x̂ − x*|| is small in Problem 7.10, then the recovery is considered robust. The conditions for exact and robust recovery will be given below. Let Ker(Φ) denote the kernel or nullspace of the linear mapping Φ. Then the exact recovery condition is stated in Proposition 7.2 below.

Proposition 7.2 ([11, p. 815] Exact Recovery Condition). x̂ = x* is the unique optimal solution of Problem 7.1 if and only if Ker(Φ) ∩ T_A(x*) = {0}.

Given that the measurements of y are noisy, it is possible to give a condition for when x* can be well approximated.

Proposition 7.3 ([11, p. 815] Proximity of Robust Recovery). Suppose that there are n noisy measurements y = Φx* + ω where ||ω|| ≤ δ and Φ : R^p → R^n. Let x̂ denote an optimal solution of Problem 7.10. Further suppose that ||Φz|| ≥ ε||z|| holds for all z ∈ T_A(x*). Then ||x̂ − x*|| ≤ 2δ/ε.

The proofs of Proposition 7.2 and Proposition 7.3 are given in [11, p. 815].
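Problem 7.1 can be made concrete for the simplest atomic norm: with the L_1 atoms {±e_i}, it is the classical basis pursuit problem, which becomes a linear program via the split x = u − v with u, v ≥ 0. The following Python sketch (scipy-based, not the thesis's CVX code) solves it for a hypothetical sparse model; the sizes and the sparse vector are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def recover_l1(Phi, y):
    # Problem 7.1 with ||.||_A the L1 norm: min ||x||_1  s.t.  y = Phi x.
    # LP form: variables [u; v] >= 0 with x = u - v, objective sum(u) + sum(v).
    n, p = Phi.shape
    res = linprog(c=np.ones(2 * p),
                  A_eq=np.hstack([Phi, -Phi]), b_eq=np.asarray(y, float),
                  bounds=(0, None))
    return res.x[:p] - res.x[p:]

# Hypothetical instance: a 2-sparse model in R^50 from n = 25 Gaussian measurements.
rng = np.random.default_rng(0)
p, n = 50, 25
x_true = np.zeros(p); x_true[[3, 17]] = [1.0, -2.0]
Phi = rng.normal(size=(n, p)) / np.sqrt(n)
x_hat = recover_l1(Phi, Phi @ x_true)   # with enough measurements, x_hat = x_true
```

By Proposition 7.2, recovery here succeeds exactly when Ker(Φ) misses the tangent cone of x_true with respect to the L_1 atoms.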
Hence, the smaller the tangent cone at x* with respect to conv(A), the easier it is to satisfy the empty intersection condition of Proposition 7.2 and to recover x̂ [11, p. 816]. By Proposition 7.2, Ker(Φ) must miss T_A(x*) for an exact recovery. Gordon ([16]) derived an expression for the probability that a uniformly distributed subspace of fixed dimension misses a cone, and his findings form the basis of the analysis of Chandrasekaran et al. ([11]). An important part in the analysis is the Gaussian width of a set.

Definition 7.7 ([11, p. 817] Gaussian Width). The
Gaussian width of a set S ⊂ R^p is defined as

w(S) := E_g[ sup_{z∈S} g^T z ],   (7.11)

where g ∼ N(0, I) is a vector of independent zero-mean unit-variance Gaussians.

Gordon expressed the likelihood that a random subspace misses a cone K purely in terms of the dimension of the subspace and the Gaussian width w(K ∩ S^{p−1}), where S^{p−1} ⊂ R^p is the unit sphere [11, p. 817]. To introduce the following results, the expected length of a k-dimensional Gaussian random vector (denoted λ_k) is needed. By integration and induction, it can be shown that λ_k is tightly bounded as k/√(k+1) ≤ λ_k ≤ √k. With this notation, a bound on these quantities can be given.

Theorem 7.1 ([16, p. 86]). Let Ω be a closed subset of S^{p−1} and let Φ : R^p → R^n be a random map with i.i.d. zero-mean Gaussian entries having variance one. Then

E[ min_{z∈Ω} ||Φz|| ] ≥ λ_n − w(Ω).   (7.12)

Theorem 7.1 then leads to the required number of measurements to give an exact or robust recovery with a given probability. Specifically, if the measurement map Φ : R^p → R^n consists of i.i.d. zero-mean Gaussian entries having variance 1/n, then the required number of measurements is given in Corollary 7.1, the proof of which is given in [11, p. 818f.].

Corollary 7.1 ([11, p. 818]). Let Φ : R^p → R^n be a random map with i.i.d. zero-mean Gaussian entries having variance 1/n. Further let Ω = T_A(x*) ∩ S^{p−1} denote the spherical part of the tangent cone T_A(x*).

1. Suppose that there are measurements y = Φx* to solve Problem 7.1. Then x* is the unique optimum of Problem 7.1 with probability at least 1 − exp(−[λ_n − w(Ω)]²/2) provided

n ≥ w(Ω)² + 1.   (7.13)
2. Suppose that there are noisy measurements y = Φx* + ω, with the noise bounded as ||ω|| ≤ δ, to solve Problem 7.10. Letting x̂ denote the optimal solution of Problem 7.10, then ||x* − x̂|| ≤ 2δ/ε with probability at least 1 − exp(−[λ_n − w(Ω) − √n ε]²/2) provided

n ≥ (w(Ω)² + 3/2) / (1 − ε)².   (7.14)

Hence, to apply Corollary 7.1 for finding n (the number of measurements needed to ensure recovery), one must calculate the Gaussian width of Ω = T_A(x*) ∩ S^{p−1}. However, Gaussian widths are not easy to compute [11, p. 819]. Chandrasekaran et al. stated various well-known properties and derived new properties of Gaussian widths that can be used to calculate bounds on Gaussian widths in a variety of cases [11, p. 819ff.]. The most important of these properties within the scope of this dissertation are reproduced in the next section.

7.3 Properties of Gaussian Widths

This section states properties of Gaussian widths that might be useful for calculating the Gaussian width of T_A(x*) ∩ S^{p−1}, where A are the atoms of the CVaR norm.

Proposition 7.4 ([11, p. 821]). Let K be any non-empty convex cone in R^p and let g ∼ N(0, I) be a random Gaussian vector. Then

w(K ∩ S^{p−1}) ≤ E_g[ dist(g, K*) ],   (7.15)

where dist denotes the Euclidean distance between a point and a set.

Since Corollary 7.1 requires w(Ω)², Jensen's inequality is often useful to apply Proposition 7.4 [11, p. 822]. Jensen's inequality states that if E[ξ] exists for a random variable ξ and if f(x) is a convex function, then [10, p. 88] f(E[ξ]) ≤ E[f(ξ)]. Because g is a random vector, dist(g, K*) is a random variable. Also, f(x) = x² is a convex function. Hence, [11, p. 822]

E_g[ dist(g, K*) ]² ≤ E_g[ dist(g, K*)² ].   (7.16)

By combining Equation 7.15 and Equation 7.16, Chandrasekaran et al. derived the lemma below.

Lemma 7.1 ([11, p. 822]). Let K be any non-empty convex cone in R^p. Then

w(K ∩ S^{p−1})² + w(K* ∩ S^{p−1})² ≤ p.
(7.17)

As a bound on the Gaussian width of T_A(x*) ∩ S^{p−1} could not be proven within the scope of this dissertation, the author can only make assumptions on which properties might be useful in a proof. For a more extensive list of properties see [11, p. 819ff.].

Chapter 8 Model Recovery Using the CVaR Norm
To use the CVaR norm for model recovery in the framework presented by Chandrasekaran et al., some fundamental properties of the CVaR norm need to be derived. To recover x̂, the set of atoms A of the CVaR norm needs to be determined and a bound on the Gaussian width of the intersection of T_A(x̂) with the unit sphere S^{p−1} needs to be established. The bound on the Gaussian width is needed to determine how many measurements n are required to ensure recovery with a high probability. To the best knowledge of the author, no research with this particular focus has been published. Hence, all results in this chapter are original. Unfortunately, due to the limited scope of this thesis, only partial results are available. This being said, the following thoughts can be the basis for further research in this area.

In this section, the atoms of the CVaR norm C_α for α_{p−2} < α < α_{p−1} will be conjectured (the set of atoms will be called A_{p−1}, see Subsection 8.1.1). It will be proposed and proven that A_{p−1} is a subset of the extreme points of the unit ball of C_α for α_{p−2} < α < α_{p−1}, but due to the limited time of this thesis it cannot be proven that A_{p−1} is the exhaustive set of extreme points. It will also be shown in Subsection 8.1.2 that a subset of the extreme points of the unit ball of C_α for α_0 < α < α_1 (called A_1) is similar to A_{p−1}. But since some of the points of A_1 are different, the unit ball of C_α for α_0 < α < α_1 looks different (the respective unit balls of C_α in R^3 are shown in Figure 8.1). Finally, an experiment will be performed to numerically determine the extreme points of the unit ball of C_α for α_{p−2} < α < α_{p−1} in R^3, and it will be shown that the set of these extreme points is equal to A_{p−1}.

The atoms of the CVaR norm C_α for α_{p−2} < α < α_{p−1} are conjectured below.

Conjecture 8.1.
Suppose that x ∈ R^p and α_{p−2} < α < α_{p−1}, i.e., (p−2)/p < α < (p−1)/p, and let the set of atoms A_{p−1} be such that

A_{p−1} := {±e_i}_{i=1}^p ∪ { 1/(p(1−α)) b },

where e_i is the unit vector with 1 as its i-th component and zeros elsewhere, and {b} is the set of all vectors in R^p that have either +1 or −1 as their components. Then the atomic norm induced by A_{p−1} is equivalent to the CVaR norm ⟪x⟫_α for (p−2)/p < α < (p−1)/p.

Proposition 8.1. The set A_{p−1} defined in Conjecture 8.1 is a subset of the extreme points of the unit ball of C_α for α_{p−2} < α < α_{p−1}, i.e., (p−2)/p < α < (p−1)/p.

Proof. To prove Proposition 8.1, it needs to be shown that the points of A_{p−1} lie on the unit ball of ⟪x⟫_α for (p−2)/p < α < (p−1)/p. To show this, an explicit expression for ⟪x⟫_α will be derived first. By Equation 5.11 and Equation 5.10,

⟪x⟫_α = λ⟪x⟫_{α_{p−2}} + (1−λ)⟪x⟫_{α_{p−1}} = λ Σ_{i=p−1}^p |x_(i)| + (1−λ)|x_(p)| = |x_(p)| + [p(1−α) − 1]|x_(p−1)|,   (8.1)

where |x_(p)| is the largest of the absolute values of the components of x and |x_(p−1)| is the second largest.

Now, there are two types of vectors in A_{p−1}: the unit vectors ±e_i and the scaled b vectors. For both these types of vectors,

⟪±e_i⟫_α = 1 + [p(1−α) − 1] × 0 = 1, and
⟪ 1/(p(1−α)) b ⟫_α = 1/(p(1−α)) (1 + [p(1−α) − 1] × 1) = 1.

Hence all points in A_{p−1} lie on the unit ball of C_α for (p−2)/p < α < (p−1)/p.

Let the set of points A_1 = {±e_i}_{i=1}^p ∪ { 1/(p(1−α)) b }, with 0 < α < 1/p. Then the points in A_1 lie on the unit ball of C_α for 0 < α < 1/p, and there is a close connection between A_1 and A_{p−1}. To show this, consider the explicit expression for ⟪x⟫_α for 0 < α < 1/p, which is ⟪x⟫_α = Σ_{i=1}^p |x_(i)| − pα|x_(1)|. Then

⟪±e_i⟫_α = 1 − pα × 0 = 1, and
⟪ 1/(p(1−α)) b ⟫_α = p/(p(1−α)) − pα/(p(1−α)) = 1.

Hence, both sets contain the unit vectors ±e_i and the scaled binary vectors 1/(p(1−α)) b. However, the scaling factor is different for the two sets whenever p >
2, as for A_{p−1}, (p−2)/p < α < (p−1)/p, while for A_1, 0 < α < 1/p. To show that the unit balls look different for these two ranges of α, consider x_1 = 1/(p(1−α)) [1, 1, . . . , 1]^T and x_2 = 1/(p(1−α)) [1, 1, . . . , −1, . . . , 1]^T, i.e., x_1 ∈ R^p consists of all ones and x_2 ∈ R^p consists of all ones except a −1 as its i-th component, both scaled by 1/(p(1−α)). Then the vectors y = (x_1 + x_2)/2 = 1/(p(1−α)) [1, . . . , 0, . . . , 1]^T, x_1, and x_2, together with α_a satisfying 0 < α_a < 1/p and α_b satisfying (p−2)/p < α_b < (p−1)/p, have the norms

⟪x_1⟫_α = 1, for α = α_a, α_b,
⟪x_2⟫_α = 1, for α = α_a, α_b,
⟪y⟫_{α_a} = (p−1)/(p(1−α_a)) < 1, and
⟪y⟫_{α_b} = 1.

Hence the point y lies on an edge of the unit ball of C_α for (p−2)/p < α < (p−1)/p, but lies inside the unit ball of C_α for 0 < α < 1/p. This can also be seen from Figure 8.1.

Just as for A_{p−1}, this is a conjecture that has yet to be proven.

Figure 8.1: [17, p. 13] Unit balls of C_α in R^3 for (p−2)/p < α < (p−1)/p (left) and 0 < α < 1/p (right).

A_{p−1} in R^3

In this subsection, the atoms of C_α for α_{p−2} < α < α_{p−1} in R^3 are determined in numerical experiments to provide more evidence that Conjecture 8.1 is true. To do this, 5,000 random hyperplanes in R^3 are projected onto the unit ball of the CVaR norm. If the conjecture is true, all hyperplanes should be projected onto one of the points in A_{p−1}. Only if there are projections onto other points can Conjecture 8.1 be deemed false [28]. To perform this experiment, a random hyperplane is generated by a zero-mean, unit variance Gaussian vector, i.e., the hyperplane satisfies g^T x =
5, where g ∈ R^3, g ∼ N(0, I), and x ∈ R^3. The projection of the hyperplane onto the unit ball is given by

x_U = (arg min_x ⟪x⟫_α) / (min_x ⟪x⟫_α),

with α ∈ (1/3, 2/3) and the constraint g^T x = 5.

C_α Norm
To find a bound on the measurements n needed to recover x̂ using Problem 7.1 (for exact recovery) or Problem 7.10 (for robust recovery) with the CVaR norm, an expression for the tangent cone or the normal cone of a vector x* with respect to A_{p−1} needs to be found. The derivation of expressions for these cones is beyond the scope of this thesis and could be an area for further research. Here, only an outline of the bounds will be given, assuming expressions for T_{A_{p−1}}(x*) or N_{A_{p−1}}(x*) are available. These bounds are derived using the properties described in Section 7.3.

The probability that a random hyperplane is projected onto an edge or surface of the unit ball is equal to zero. The constant 5 is chosen arbitrarily.

By Corollary 7.1, n needs to satisfy n ≥ w(T_{A_{p−1}}(x*) ∩ S^{p−1})² + 1 in the exact case, or n ≥ (w(T_{A_{p−1}}(x*) ∩ S^{p−1})² + 3/2) / (1 − ε)² in the robust case. Since the Gaussian width is difficult to calculate directly, the Euclidean distance between a cone and the point given by a random Gaussian vector could be used to provide a bound for w(T_{A_{p−1}}(x*) ∩ S^{p−1}). Using Equation 7.15 and Equation 7.16 gives

w(T_{A_{p−1}}(x*) ∩ S^{p−1})² ≤ E_g[ dist(g, N_{A_{p−1}}(x*)) ]² ≤ E_g[ dist(g, N_{A_{p−1}}(x*))² ].   (8.2)

If an expression for N_{A_{p−1}}(x*) is available, Equation 8.2 could be used to determine the minimum number of measurements n needed to recover x̂ as n ≥ E_g[ dist(g, N_{A_{p−1}}(x*))² ] + 1 in the exact case, or n ≥ (E_g[ dist(g, N_{A_{p−1}}(x*))² ] + 3/2) / (1 − ε)² in the robust case, when the square of the Euclidean distance dist(g, N_{A_{p−1}}(x*))² can be calculated or bounded. However, depending on the actual expressions of the tangent and normal cones, other properties of Gaussian widths (e.g. those stated in [11, p. 819ff.]) could be more useful to derive bounds on n.

C_α Norm
This section explores the recovery probabilities of a vector given n random measurements, using CVaR norm minimization. Since Section 8.2 could not provide a bound on the required number of measurements to ensure recovery, this section investigates under which circumstances recovery might be likely. However, the results are not promising. For the following investigation, the goal was to recover two vectors in R^100. The first vector x_1 consists of 1 atom (either a unit vector or a scaled binary vector). The second vector x_2 consists of 3 atoms: one positive unit vector, one negative unit vector, and one scaled binary vector. In both cases, the recovery probability was estimated by minimizing the CVaR norm of a candidate x*, with n ≤
100 random measurements (so that Φ ∈ R^{n×100} is a random map with i.i.d. zero-mean Gaussian entries having variance 1/n) and α = 0.
985 (so that (p−2)/p < α < (p−1)/p, i.e. 0.98 < α < 0.99). For each n, Problem 7.1 was solved 50 times, each time with a new random map Φ. The probability of exact recovery (over the 50 random trials) was plotted against the number of measurements n. This is shown in Figure 8.2.

Figure 8.2: Probability of exact recovery for a vector x ∈ R^100 using the CVaR norm as the atomic norm with n measurements. Left: Recovery probability for x_1 consisting of 1 atom (either a unit vector or a scaled binary vector). Right: Recovery probability for x_2 consisting of 3 atoms.

Figure 8.2 shows that if x_1 consists of a unit vector, at least 90 measurements are necessary to ensure recovery, while if x_1 consists of a scaled binary vector, recovery could be ensured with 50-60 measurements. The second vector x_2 could never be recovered for n <
95, and even for n =
99, the recovery probability was just below 80%. Hence, it seems that if a vector x* which is to be recovered consists of both types of atoms (i.e. unit vectors and scaled binary vectors), exact recovery cannot be guaranteed with high probability when n < p. This means that to recover x*, one would need as many observations as the dimension of the system. The reason for these unfavourable characteristics might be the tangent cone of x* with respect to A_{p−1}.

If x* consists only of one type of atom, i.e., either of unit vectors or of scaled binary vectors, the model recovery using the CVaR norm can be compared against the model recovery using the L_1 norm or L_∞ norm, respectively. Depending on the type of atoms, the C_α norm shows two different characteristics when compared to the respective L_p norm. When x* is a k-sparse vector, the norm of choice for model recovery is the L_1 norm. By Proposition 3.10 of [11, p. 823], to recover a k-sparse vector x* ∈ R^100 using the L_1 norm, about 2k ln(100/k) + 5k/4 + 1 measurements suffice to recover x* with high probability. Hence, for a 1-sparse vector approximately 12 measurements suffice, while for a 3-sparse vector approximately 26 measurements suffice. At the same time, more than 90 measurements are necessary when using the CVaR norm to recover the same 1-sparse or 3-sparse vector x* with the same Φ to ensure comparability (see Figure 8.3).

This assumption can only be confirmed if an expression for T_{A_{p−1}}(x*) can be derived. A k-sparse vector is a vector where k components are not equal to zero.

Figure 8.3: Probability of exact recovery for a vector x ∈ R^100 using the L_1 norm or C_α norm as the atomic norm with n measurements. Left: Recovery probability for a 1-sparse vector. Right: Recovery probability for a 3-sparse vector.

When x* is the sum of k scaled binary vectors, the norm of choice for model recovery is the L_∞ norm. When trying to recover a vector x* that is either 1 scaled binary vector or the sum of 3 scaled binary vectors, the C_α norm is as good as the L_∞ norm, and sometimes the C_α norm is even slightly better.
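The experimental protocol described above (repeated random maps Φ, fraction of exact recoveries as a function of n) can be sketched generically. The following Python code estimates the recovery probability for the L_1 atomic norm via linear programming; the CVaR norm experiments in this section used MATLAB/CVX instead, so this is an illustrative stand-in with made-up sizes and trial counts, not the author's setup.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(Phi, y):
    # min ||x||_1 s.t. Phi x = y, as an LP with x = u - v and u, v >= 0.
    n, p = Phi.shape
    res = linprog(np.ones(2 * p), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None))
    return res.x[:p] - res.x[p:]

def recovery_probability(x_true, n, trials=20, seed=0):
    # Fraction of random Gaussian maps (i.i.d. entries, variance 1/n)
    # for which minimization recovers x_true up to a small tolerance.
    rng = np.random.default_rng(seed)
    p = x_true.size
    hits = 0
    for _ in range(trials):
        Phi = rng.normal(size=(n, p)) / np.sqrt(n)
        hits += np.allclose(l1_min(Phi, Phi @ x_true), x_true, atol=1e-6)
    return hits / trials
```

Sweeping n and plotting `recovery_probability` against it yields recovery curves analogous in shape to those in Figure 8.2, here for the L_1 case.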
Plotting the probability of exact recovery with the same x* to be recovered and the same random measurement maps Φ for 40 ≤ n ≤
80 shows that in certain cases the recovery probability of x* was higher when using the C_α norm (see Figure 8.4).

Figure 8.4: Probability of exact recovery for a vector x ∈ R^100 that is the sum of k scaled binary vectors, using the L_∞ norm or C_α norm as the atomic norm with n measurements. Left: Recovery probability for x as 1 scaled binary vector. Right: Recovery probability for x as the sum of 3 scaled binary vectors.

8.4 Concluding Remarks on Model Recovery Using the CVaR Norm

Despite the incomplete proofs, this chapter could show some interesting properties of the CVaR norm regarding model recovery. It seems that the CVaR norm is not suitable to define its own type of signal to be recovered (i.e. a signal which consists of the atoms A_{p−1}), but the CVaR norm could be an improvement over the L_∞ norm for model recovery. Since the unit balls of C_α differed for different choices of α, it was suggested to take C_α with (p−2)/p < α < (p−1)/p as the atomic norm for recovering a vector x* ∈ R^p. Then the set of atoms A_{p−1} (see Conjecture 8.1) can be interpreted as the union of two sets of atoms of better known norms, namely the atoms of the L_1 norm and the atoms of the L_∞ norm, scaled by 1/(p(1−α)). The parameter α was chosen in the range ((p−2)/p, (p−1)/p) for these investigations; however, when choosing 0 < α < 1/p, the results might be different. This could be an area for further research.

Unfortunately, a bound on the number of random measurements n could not be established, as it was not possible to derive expressions for the tangent or normal cones with respect to A_{p−1} in the scope of this thesis. As a remedy, numerical experiments were performed to gain insight into exact recovery probabilities using the CVaR norm. The numerical experiments in Section 8.3 suggest that it is not possible to recover an arbitrary x* with a high probability when n < p, i.e. when the number of observations is smaller than the dimension of the model.
Hence, it would not make sense to use the CVaR norm for the recovery of a signal consisting of the atoms of A_{p-1}. It was also shown that the CVaR norm is not suitable to recover a k-sparse vector. However, the CVaR norm showed a slight improvement over the L_∞ norm in the experiments when trying to recover signals x* that are formed as the sum of k scaled binary vectors. The reason for this is probably that the tangent cone with respect to A_{p-1} at x* is smaller than the tangent cone with respect to the atoms of the L_∞ norm. This would need to be confirmed in further research, as it was not possible to derive an expression for T_{A_{p-1}}(x*) in the scope of this thesis. The practical implications of this also need to be considered, as the gains of a smaller tangent cone might be offset by the greater effort required to calculate the CVaR norm compared to the L_∞ norm. Again, it should be stressed that the numerical experiments were done by choosing α with (p-2)/p < α < (p-1)/p. Choosing a different α gives a different unit ball and therefore different characteristics for the model recovery problem. This could all be evaluated in further research. The proof of Conjecture 8.1 still needs to be completed. A real-world occurrence of this type of signal (or model) could not be identified during this thesis.

Chapter 9

Conclusion
This thesis covered a wide range of theory on CVaR, both as a risk measure and as a vector norm. It was shown how CVaR is defined for a univariate loss distribution and how this definition can be extended to define the CVaR of a portfolio of assets, i.e. for multivariate loss distributions. The CVaR concept was then abstracted to define a new family of vector norms in R^n, which were then analysed in detail. In the last part of the thesis, model recovery problems were introduced and it was shown how the new CVaR norm could be used in the context of model recovery problems.

Chapter 2 started by introducing Value-at-Risk, and showed how Conditional Value-at-Risk can be derived from VaR in the case of a continuous random variable. Then, the notion of a coherent risk measure was introduced and it was explained why VaR fails to be coherent, whereas CVaR is. After this intuitive introduction, CVaR was properly defined and analysed in Section 2.3. CVaR can be calculated as the expectation of the generalized α-tail distribution. Alternatively, CVaR can be calculated as a weighted average of VaR and CVaR+ by the Convex Combination Formula (see Equation 2.15). Another possibility to calculate CVaR is to use Acerbi's Integral Formula (presented in Section 2.4), for which a novel proof for continuous loss distributions was given in Subsection 2.4.1.

Chapter 3 then extended the ideas developed in Chapter 2 to multivariate loss distributions which arise in portfolio selection. To introduce portfolio optimization problems, Section 3.1 presented the first model that was developed to minimize portfolio risk, i.e. the Markowitz Model (see Problem 3.3). It was also shown that it is always favourable to diversify a portfolio in order to reduce risk. The optimal risk/return combinations that can be achieved in a portfolio were drawn to explain the efficient frontier.
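The calculation routes for CVaR summarized above can be illustrated on a discrete loss sample. The short Python sketch below uses the minimization formula of Rockafellar and Uryasev, CVaR_α = min_c { c + E[(L - c)^+] / (1 - α) }; it is an illustrative sketch and not part of the thesis code (the function var_cvar and the toy sample are our own assumptions).

```python
import numpy as np

def var_cvar(losses, alpha):
    """Empirical VaR and CVaR via the Rockafellar-Uryasev formula:
    CVaR_alpha = min_c  c + E[(L - c)^+] / (1 - alpha).
    For a discrete sample, a minimiser c can be found among the sample points."""
    L = np.asarray(losses, dtype=float)
    objective = lambda c: c + np.mean(np.maximum(L - c, 0.0)) / (1.0 - alpha)
    c_star = min(L, key=objective)   # the optimum is attained at a breakpoint
    return c_star, objective(c_star)

losses = np.arange(1.0, 101.0)       # a toy loss sample: 1, 2, ..., 100
var95, cvar95 = var_cvar(losses, 0.95)
# the worst 5% of losses are 96..100, whose mean (the CVaR) is 98
```

The minimising threshold c also recovers a value of VaR at level α, which is how the joint VaR/CVaR characterization of Chapter 2 is exploited computationally in Chapter 3.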
Motivated by some shortcomings of the Markowitz Model, the Rockafellar and Uryasev Model was presented in Section 3.2 to demonstrate how a portfolio can be optimized with regard to minimizing the portfolio's tail risk. The model and the associated linear optimization programme developed in [29] were analysed in detail, before establishing a connection between the Markowitz Model and the Rockafellar and Uryasev Model. Section 3.3 concluded the chapter by providing two numerical examples. The first example showed that in certain cases Mean-Variance and CVaR optimization indeed give the same optimal portfolio, while the second example showed that for skewed loss distributions CVaR optimization is preferable to Mean-Variance optimization.

For situations in which a portfolio has already been formed, but for which the investor wishes to hedge risks, a procedure was presented in Chapter 4. Since the example was a trader's portfolio consisting of stock options, the financial background on options was presented in Section 4.1, while Section 4.2 showed how a risk manager can estimate the daily asset volatilities to properly manage the risk on a daily basis. The trader's portfolio was described in Section 4.3 and the hedging procedure was outlined in detail in Section 4.4. The original contribution of Section 4.4 was the explicit formulation of the linear programme to minimize the CVaR of the portfolio.

Next, the focus shifted away from financial applications of CVaR. The fairly new concept of CVaR norms was introduced in Chapter 5. The first one, the Scaled CVaR norm, was presented in Section 5.1, with its definition and alternative characterization given by Pavlikov and Uryasev in [25]. A novel contribution was an alternative proof for the equivalence of the two characterizations. Next, the Non-Scaled CVaR norm (or simply CVaR norm) was presented in Section 5.2, by showing how it can be derived from the Scaled CVaR norm.
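The Rockafellar and Uryasev linear programme summarized above (minimize c + (1/(q(1-α))) Σ z_k over weights w, threshold c and auxiliary variables z, with z_k ≥ loss_k(w) - c) can be sketched in Python with scipy's linprog standing in for a modelling language. This is an illustrative sketch, not the thesis' Matlab implementation; the scenario loss matrix below is invented so that asset B loses 1 more than asset A in every scenario, which forces all weight onto asset A.

```python
import numpy as np
from scipy.optimize import linprog

# scenario losses per asset (rows = scenarios, columns = assets)
L = np.array([[-2.0, -1.0], [1.0, 2.0], [-1.0, 0.0], [3.0, 4.0], [-2.0, -1.0]])
q, m = L.shape
alpha = 0.6

# decision variables: [w_1, ..., w_m, c, z_1, ..., z_q]
cost = np.concatenate([np.zeros(m), [1.0], np.full(q, 1.0 / (q * (1 - alpha)))])
# z_k >= L_k . w - c  rewritten as  L_k . w - c - z_k <= 0
A_ub = np.hstack([L, -np.ones((q, 1)), -np.eye(q)])
b_ub = np.zeros(q)
A_eq = np.concatenate([np.ones(m), [0.0], np.zeros(q)]).reshape(1, -1)
b_eq = [1.0]                                   # weights sum to one
bounds = [(0, None)] * m + [(None, None)] + [(0, None)] * q
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
w = res.x[:m]   # CVaR-optimal weights; here all weight goes to asset A
```

The optimal objective value is the minimal portfolio CVaR at level α; adding an expected-return constraint row to A_ub turns this into the full model of Section 3.2.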
It was also shown how the CVaR norm can be interpreted as the optimal value of the knapsack problem. To provide a better understanding of these new norms, Section 5.3 stated some of the quite different properties that the two CVaR norms have. A new property of the Scaled CVaR norm, i.e. piecewise convexity, was proposed and proven, which was again an original contribution of this thesis. Finally, the computational efficiencies of the different characterizations of the CVaR norms were investigated in Section 5.4. This comparison of computing times was another original contribution.

After introducing the Scaled CVaR norm and the CVaR norm, comparisons to the more familiar family of L_p norms were drawn in Chapter 6. The main goal of this chapter was to show how C^S_α and C_α behave in comparison to L^S_p and L_p for different combinations of α and p. Also, in Section 6.2 it was analysed how to choose α in relation to p so that C_α most closely approximates the L_p norm.

A possible application of the CVaR norm was investigated for model recovery problems. The theoretical background for model recovery problems was presented in Chapter 7. The aim of these problems is to recover models or signals of dimension p with n < p random measurements. Atomic norms and important concepts from convex geometry, such as tangent and normal cones, were introduced in Section 7.1. The recovery conditions (which are based on atomic norms and convex geometry) were presented in Section 7.2. For these conditions, the Gaussian width of a set plays a crucial role, but it is generally difficult to determine the Gaussian width of arbitrary sets. Therefore, Section 7.3 presented selected properties of Gaussian widths, which might be useful in calculating bounds on Gaussian widths relating to the CVaR norm.

The final chapter, Chapter 8, contained completely original work. The goal of this chapter was to show how the CVaR norm could be used for model recovery problems.
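The comparison between C_α and the L_p family rests on the componentwise evaluation of the CVaR norm: at the grid points α_j = j/n it is the sum of the n - j largest |x_i|, linearly interpolated in between, so that α = 0 gives the L_1 norm and α = (n-1)/n gives the L_∞ norm. The following few lines of Python are an illustrative re-implementation (the thesis' Matlab version is Appendix A.4); the function name cvar_norm is our own.

```python
import numpy as np

def cvar_norm(x, alpha):
    """Non-scaled CVaR norm <<x>>_alpha for 0 <= alpha < 1, componentwise:
    at alpha_j = j/n it is the sum of the n - j largest |x_i|,
    with linear interpolation in alpha between grid points."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))   # ascending
    n = a.size
    def at_grid(j):               # <<x>>_{j/n} = sum of the n - j largest |x_i|
        return a[j:].sum()
    t = alpha * n
    j, j1 = int(np.floor(t)), int(np.ceil(t))
    if j == j1:
        return at_grid(j)
    lam = (j1 - t) / (j1 - j)     # interpolation weight between grid points
    return lam * at_grid(j) + (1 - lam) * at_grid(j1)

x = [3.0, -1.0, 2.0]
# alpha = 0 gives the L1 norm (6), alpha = 2/3 gives the L-infinity norm (3)
```

Sweeping α from 0 towards (n-1)/n thus interpolates between the L_1 and L_∞ unit balls, which is the geometric picture behind the L_p approximation question of Section 6.2.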
Due to the limited scope of this thesis, only partial results could be presented, so that this chapter might form a basis for further research in this area. Section 8.1 gave a conjecture on the set of atoms relating to the CVaR norm for (p-2)/p < α < (p-1)/p (Conjecture 8.1), which was partially proven. A comparison of unit balls of the C_α norm for (p-2)/p < α < (p-1)/p and 0 < α < (p-2)/p was given, and a numerical experiment was performed in R^4 to provide evidence for Conjecture 8.1. The final section, Section 8.3, then performed numerical experiments to show the recovery rate for different x* using the CVaR norm as the atomic norm. From these experiments, it appears that the CVaR norm is not suitable to recover a signal type of its own, as recovery could not be guaranteed with high probability for n < p. For other types of x* (i.e. k-sparse vectors and vectors that are the sum of k binary vectors), model recovery using the CVaR norm was compared to using the L_1 norm and the L_∞ norm, respectively. While the CVaR norm performed considerably worse than the L_1 norm for recovering k-sparse vectors, the CVaR norm was marginally better than the L_∞ norm for recovering vectors that are the sum of k binary vectors. As these experiments were carried out with a particular choice of α, a different α might yield different results, as the unit balls of the CVaR norm are quite different depending on α. Hence, it might be promising to conduct further research in this area.

Bibliography

[1] V. Acary, O. Bonnefon, and B. Brogliato. Nonsmooth Modeling and Simulation for Switched Circuits. Lecture Notes in Electrical Engineering. Springer Netherlands, 2011.
[2] C. Acerbi and D. Tasche. On the coherence of expected shortfall.
Journal of Banking & Finance, 26(7):1487-1503, 2002.
[3] P. Albrecht, M. Huggenberger, and A. Pekelis. Tail risk hedging and regime switching. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1945303, June 2015. Accessed: 29 July 2015.
[4] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203-228, 1999.
[5] O. Bardou, N. Frikha, and G. Pagès. CVaR hedging using quantization-based stochastic approximation algorithm. https://hal.archives-ouvertes.fr/hal-00547776, December 2010. Accessed: 15 July 2015.
[6] A. Ben-Tal and A. Nemirovski. Robust convex optimization. Mathematics of Operations Research, 23(4):769-805, 1998.
[7] D. Bertsimas, D. Pachamanova, and M. Sim. Robust linear optimization under general norms. Operations Research Letters, 32(6):510-516, 2004.
[8] Z. Bodie, A. Kane, and A. J. Marcus. Investments. McGraw-Hill Education, tenth edition, 2014.
[9] F. F. Bonsall. A general atomic decomposition theorem and Banach's closed range theorem. The Quarterly Journal of Mathematics, 42(1):9-14, 1991.
[10] A. A. Borovkov. Probability Theory. Universitext. Springer London, 2013.
[11] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.
[12] R. Chatterjee. Practical Methods of Financial Engineering and Risk Management: Tools for Modern Financial Professionals. Quantitative Finance Series. Apress, 2014.
[13] M. Choudhry. An Introduction to Value-at-Risk. John Wiley & Sons, third edition, 2006.
[14] G. Cornuejols and R. Tütüncü. Optimization Methods in Finance. Cambridge University Press, 2007.
[15] E. Fragnière. Financial risk management, lecture notes, week 1, January 2015.
[16] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in R^n. In J. Lindenstrauss and V. Milman, editors, Geometric Aspects of Functional Analysis, volume 1317 of Lecture Notes in Mathematics, pages 84-106. Springer Berlin Heidelberg, 1988.
[17] J.-Y. Gotoh and S. Uryasev. Two pairs of families of polyhedral norms versus l_p-norms: Proximity and applications in optimization. Technical Report, University of Florida, 2015.
[18] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95-110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.
[19] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
[20] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Grundlehren Text Editions. Springer Berlin Heidelberg, 2001.
[21] J. C. Hull. Options, Futures, and Other Derivatives. Pearson Education Limited, eighth edition, 2012.
[22] H.-M. Kaltenbach. A Concise Guide to Statistics. SpringerBriefs in Statistics. Springer Berlin Heidelberg, 2012.
[23] H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77-91, 1952.
[24] H. Mausser and D. Rosen. Beyond VaR: from measuring risk to managing risk. In Proceedings of the IEEE/IAFE 1999 Conference on Computational Intelligence for Financial Engineering (CIFEr), pages 163-178, 1999.
[25] K. Pavlikov and S. Uryasev. CVaR norm and applications in optimization. Optimization Letters, 8(7):1999-2020, 2014.
[26] E. Prugovečki. Chapter I: Basic ideas of Hilbert space theory. In volume 92 of Pure and Applied Mathematics, pages 11-56. Elsevier, 1981.
[27] P. Richtárik. Optimization methods in finance, lecture notes, 2015.
[28] P. Richtárik. Personal discussion on 18 August, 2015.
[29] R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21-41, 2000.
[30] R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking & Finance, 26(7):1443-1471, 2002.
[31] N. Topaloglou, H. Vladimirou, and S. A. Zenios. CVaR models with selective hedging for international asset allocation. Journal of Banking & Finance, 26(7):1535-1561, 2002.
[32] W. L. Winston and J. B. Goldberg. Operations Research: Applications and Algorithms. Thomson Brooks/Cole, fourth edition, 2004.
[33] G. Wolf. Financial risk management, lecture notes, week 2, January 2015.
[34] W. Xue, L. Ma, and H. Shen. Optimal inventory and hedging decisions with CVaR consideration. International Journal of Production Economics, 162:70-82, 2015.
[35] K. Yang, J. Huang, Y. Wu, X. Wang, and M. Chiang. Distributed robust optimization (DRO), part I: framework and example. Optimization and Engineering, 15(1):35-67, 2014.

Appendix A
Matlab Code
A.1 List of Matlab Code Developed During this Dissertation

1. CVaR_Norm_Component.m: Calculate the CVaR norm of x ∈ R^n at a given α using Definition 5.2 (see Appendix A.4). Used for: CVaR norm calculations.
2. CVaR_Norm_Optimization.m: Calculate the CVaR norm of x ∈ R^n at a given α using Proposition 5.2 (see Appendix A.5). Used for: CVaR norm calculations.
3. Scaled_CVaR_Norm_Component.m: Calculate the Scaled CVaR norm of x ∈ R^n at a given α using Definition 5.1 (see Appendix A.2). Used for: Scaled CVaR norm calculations.
4. Scaled_CVaR_Norm_Optimization.m: Calculate the Scaled CVaR norm of x ∈ R^n at a given α using Proposition 5.1 (see Appendix A.3). Used for: Scaled CVaR norm calculations.
5. Experiment01_CVaR_Norms_Computing_Times.m: Compare computing times of codes 1-4. Used for: Table 5.1, Table 5.2, Appendix B.6.
6. Experiment03_CVaR_Norm_on_2D_grid.m: Draw surface plots of C_α and L_p of x ∈ R^2 for different α and p. Used for: Figure 6.6, Appendix C.3.
7. Experiment05_CVaR_Lp_Norm_as_functions_of_alpha_p.m: Calculate C^S_α, C_α and corresponding L_p, L^S_p for α ∈ [0, 1]. Used for: Figure 6.1, Figure 6.4.
8. Experiment06_Projecting_Points_onto_unit_ball.m: Project a circle in R^2 onto the unit ball, using L_1 norm and C_α norm minimization for different α. Used for: Figure 6.7, Appendix C.4.
9. Experiment07_UL_ratio_for_Lp_approximation_by_CVaR_norm.m: Calculate and draw the proximity ratio of C_α and L_p for different p. Used for: Figure 6.3.
10. Experiment10_MVO_CVaR_Optimization_Normal_Dist.m: Compute Mean-Variance and CVaR optimal portfolios for normally distributed losses. Used for: Table 3.3.
11. Experiment11_MVO_CVaR_Optimization_Skewed_Dist.m: Compute Mean-Variance and CVaR optimal portfolios for skewed loss distributions, draw histogram of simulated portfolio losses, give risk metrics of optimal portfolios. Used for: Table 3.6, Table 3.5, Appendix C.2.
12. Experiment12_Hedging.m: Perform the hedging procedure described in Section 4.4, draw option payoff profiles before/after hedging, draw loss distribution before/after hedging, give risk metrics of portfolio before/after hedge. Used for: Figure 4.4, Figure 4.6, Figure 4.5, Figure 4.7, Table 4.2, Appendix B.4, Appendix B.5.
13. Experiment13_VaRCVaR_pdf_cdf.m: Draw pdf and cdf of a normal random variable to explain VaR and CVaR. Used for: Figure 2.1.
14. Experiment14_MVO_Efficient_Frontier.m: Calculate the Mean-Variance optimal portfolio for different required expected returns R and draw the efficient frontier. Used for: Figure 3.1.
15. Experiment15_Find_CVaR_Graphically.m: Draw φ_α(c) (Equation 3.8) for different c. Used for: Figure 3.2.
16. Experiment16a_Scaled_CVaR_own_examples.m: Draw unit balls of C^S_α for different values of α. Used for: Figure 5.1.
17. Experiment16b_CVaR_own_examples.m: Draw unit balls of C_α for different values of α. Used for: Figure 5.2.
18. Experiment17_Show_Piecewise_Convexity_CSalpha.m: Draw C^S_α of 4 different x versus α. Used for: Figure 5.3.
19. Experiment20_CVaR_Model_Recovery.m: Test model recovery of different x* using the CVaR norm. Used for: Figure 8.2.
20. Experiment20a_L1_Model_Recovery.m: Compare recovery probability of different x* using the CVaR norm versus the L_1 norm. Used for: Figure 8.3.
21. Experiment20b_Linfty_Model_Recovery.m: Compare recovery probability of different x* using the CVaR norm versus the L_∞ norm. Used for: Figure 8.4.
22. Experiment21_CVaR_Atoms_R4.m: Project random hyperplanes onto the unit ball of C_α in R^4. Used for: Appendix B.7.
A.2 Scaled CVaR Calculation based on Definition 5.1

% Author:
% Jakob Kisiala, June 2015
% Computes the Scaled CVaR norm of a vector at a given alpha, using the
% componentwise definition
% INPUT:
%   x     = n-by-1 vector
%   alpha = scalar between 0 and 1
% OUTPUT:
%   C_S_alpha = <<x>>^S_{alpha}
function C_S_alpha = Scaled_CVaR_Norm_Component(x, alpha)
C_S_alpha = 0;
% check if alpha is admissible
if (alpha < 0 || alpha > 1)
    display('Please put in an alpha such that 0 <= alpha <= 1 - Scaled CVaR could not be calculated');
    return
end
% check if x is a vector
size_x = size(x);
dim_x = length(size_x);
if (dim_x > 2) % x has more than 2 dimensions
    display('Please only input vectors x - Scaled CVaR could not be calculated');
    return
end
if (size_x(1) > 1 && size_x(2) > 1) % x is a matrix
    display('Please only input vectors x - Scaled CVaR could not be calculated');
    return
end
n = length(x);
% check four cases:
% 0: alpha = 0
% 1: alpha > (n-1)/n
% 2: alpha equal to some alpha_j
% 3: alpha between alpha_j and alpha_{j+1}

% case 0: alpha = 0
if (alpha == 0)
    C_S_alpha = sum(abs(x)) / n;
    return
end
% for the remaining three cases additional vectors are needed:
alpha_j_vector = ([0:n-1]') / n;
% case 1: alpha > (n-1)/n
if (alpha > alpha_j_vector(n))
    C_S_alpha = max(abs(x));
    return
end
% sort vector x by magnitude of components
x_abs_sorted = sort(abs(x));
epsilon = 1e-10; % numerical tolerance for matching alpha to a grid point
temp_vector = alpha_j_vector - alpha;
% case 2: alpha equal to some alpha_j
if (any(abs(temp_vector) < epsilon))
    C_S_alpha = calculate_Norm_for_alpha_j(x_abs_sorted, alpha);
    return
end
% case 3: alpha between alpha_j and alpha_{j+1}
% find alpha_j
temp_index = temp_vector < 0;
alpha_j = max(alpha_j_vector(temp_index));
% find alpha_{j+1}
temp_index = temp_vector > 0;
alpha_jPlus1 = min(alpha_j_vector(temp_index));
mu = ((alpha_jPlus1 - alpha) * (1 - alpha_j)) / ((alpha_jPlus1 - alpha_j) * (1 - alpha));
C_aj = calculate_Norm_for_alpha_j(x_abs_sorted, alpha_j);
C_ajPlus1 = calculate_Norm_for_alpha_j(x_abs_sorted, alpha_jPlus1);
C_S_alpha = mu * C_aj + (1 - mu) * C_ajPlus1;

    % nested function to calculate C^S_{alpha} at a grid point alpha_j
    function C_S_alpha1 = calculate_Norm_for_alpha_j(vector, alpha_j)
        j = find(abs(alpha_j_vector - alpha_j) < epsilon) - 1;
        C_S_alpha1 = (1 / (n - j)) * sum(vector(j+1:n));
    end
end

A.3 Scaled CVaR Calculation based on Proposition 5.1

% Author:
% Jakob Kisiala, June 2015
% Computes the Scaled CVaR norm of a vector at a given alpha, using
% CVaR optimization
% INPUT:
%   x     = n-by-1 vector
%   alpha = scalar between 0 and 1
% OUTPUT:
%   C_S_alpha = <<x>>^S_{alpha}
function C_S_alpha = Scaled_CVaR_Norm_Optimization(x, alpha)
C_S_alpha = 0;
% check if alpha is admissible
if (alpha < 0 || alpha > 1)
    display('Please put in an alpha such that 0 <= alpha <= 1 - Scaled CVaR could not be calculated');
    return
end
% check if x is a vector
size_x = size(x);
dim_x = length(size_x);
if (dim_x > 2) % x has more than 2 dimensions
    display('Please only input vectors x - Scaled CVaR could not be calculated');
    return
end
if (size_x(1) > 1 && size_x(2) > 1) % x is a matrix
    display('Please only input vectors x - Scaled CVaR could not be calculated');
    return
end
x_abs = abs(x);
% special case: alpha = 1
if (alpha == 1)
    C_S_alpha = max(x_abs);
    return
end
% use CVaR optimization to calculate the norm
n = length(x);
e = ones(n, 1);
cvx_begin
    cvx_quiet(true) % suppresses cvx's output
    variables z(n) c
    minimize(c + (1 / (n * (1 - alpha))) * (e' * z))
    subject to
        z >= x_abs - c;
        z >= 0;
cvx_end
C_S_alpha = cvx_optval;
end
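The optimization-based routines above require a CVX installation. As a CVX-free cross-check of the equivalence of Definition 5.1 and Proposition 5.1, the following Python sketch evaluates the Scaled CVaR norm both componentwise and by minimizing the piecewise-linear objective c + Σ(|x_i| - c)^+ / (n(1 - α)) over its breakpoints; both functions are illustrative re-implementations, not the thesis' Matlab code.

```python
import numpy as np

def scaled_cvar_norm_component(x, alpha):
    """Scaled CVaR norm <<x>>^S_alpha via the componentwise definition."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))   # ascending
    n = a.size
    if alpha > (n - 1) / n:
        return a[-1]                    # the L-infinity case
    def at_grid(j):                     # value at the grid point alpha_j = j/n
        return a[j:].sum() / (n - j)
    t = alpha * n
    j, j1 = int(np.floor(t)), int(np.ceil(t))
    if j == j1:
        return at_grid(j)
    aj, aj1 = j / n, j1 / n
    mu = ((aj1 - alpha) * (1 - aj)) / ((aj1 - aj) * (1 - alpha))
    return mu * at_grid(j) + (1 - mu) * at_grid(j1)

def scaled_cvar_norm_opt(x, alpha):
    """Same norm via min_c c + sum((|x_i| - c)^+) / (n (1 - alpha));
    the piecewise-linear objective attains its minimum at a breakpoint."""
    a = np.abs(np.asarray(x, dtype=float))
    n = a.size
    f = lambda c: c + np.maximum(a - c, 0.0).sum() / (n * (1 - alpha))
    return min(f(c) for c in np.concatenate([[0.0], a]))

x = [3.0, -1.0, 2.0]
```

Scanning the breakpoints replaces the CVX linear programme here only because the objective is one-dimensional and piecewise linear in c; the Matlab routines solve the same problem as an LP.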
A.4 CVaR Calculation based on Definition 5.2

% Author:
% Jakob Kisiala, June 2015
% Computes the (non-scaled) CVaR norm of a vector at a given alpha, using
% the componentwise definition
% INPUT:
%   x     = n-by-1 vector
%   alpha = scalar between 0 and 1
% OUTPUT:
%   C_alpha = <<x>>_{alpha}
function C_alpha = CVaR_Norm_Component(x, alpha)
C_alpha = 0;
% check if alpha is admissible
if (alpha < 0 || alpha >= 1)
    display('Please put in an alpha such that 0 <= alpha < 1 - CVaR could not be calculated');
    return
end
% check if x is a vector
size_x = size(x);
dim_x = length(size_x);
if (dim_x > 2) % x has more than 2 dimensions
    display('Please only input vectors x - CVaR could not be calculated');
    return
end
if (size_x(1) > 1 && size_x(2) > 1) % x is a matrix
    display('Please only input vectors x - CVaR could not be calculated');
    return
end
% check four cases:
% 0: alpha = 0
% 1: alpha > (n-1)/n
% 2: alpha equal to some alpha_j
% 3: alpha between alpha_j and alpha_{j+1}

% case 0: alpha = 0
if (alpha == 0)
    C_alpha = sum(abs(x));
    return
end
% for the remaining three cases additional quantities are needed:
n = length(x);
alpha_times_n = alpha * n;
% case 1: alpha > (n-1)/n
if (alpha_times_n > n - 1)
    C_alpha = n * (1 - alpha) * max(abs(x));
    return
end
% x vector, in absolute values sorted in ascending order
x_abs_sorted = sort(abs(x));
epsilon = 1e-10; % numerical tolerance for matching alpha to a grid point
% case 2: alpha equal to some alpha_j
if (mod(alpha_times_n, 1) < epsilon)
    C_alpha = calculate_Norm_for_alpha_j(x_abs_sorted, round(alpha_times_n));
    return
end
% case 3: alpha between alpha_j and alpha_{j+1}
% find alpha_j
j = floor(alpha_times_n);
alpha_j = j / n;
% find alpha_{j+1}
jPlus1 = ceil(alpha_times_n);
alpha_jPlus1 = jPlus1 / n;
lambda = (alpha_jPlus1 - alpha) / (alpha_jPlus1 - alpha_j);
C_aj = calculate_Norm_for_alpha_j(x_abs_sorted, j);
C_ajPlus1 = calculate_Norm_for_alpha_j(x_abs_sorted, jPlus1);
C_alpha = lambda * C_aj + (1 - lambda) * C_ajPlus1;

    % nested function to calculate <<x>>_{alpha_j} for alpha_j = j/n
    function C_alpha1 = calculate_Norm_for_alpha_j(vector, j)
        C_alpha1 = sum(vector(j+1:n));
    end
end

A.5 CVaR Calculation based on Proposition 5.2

% Author:
% Jakob Kisiala, June 2015
% Computes the (non-scaled) CVaR norm of a vector at a given alpha, using
% CVaR optimization
% INPUT:
%   x     = n-by-1 vector
%   alpha = scalar between 0 and 1
% OUTPUT:
%   C_alpha = <<x>>_{alpha}
function C_alpha = CVaR_Norm_Optimization(x, alpha)
C_alpha = 0;
% check if alpha is admissible
if (alpha < 0 || alpha >= 1)
    display('Please put in an alpha such that 0 <= alpha < 1 - CVaR could not be calculated');
    return
end
% check if x is a vector
size_x = size(x);
dim_x = length(size_x);
if (dim_x > 2) % x has more than 2 dimensions
    display('Please only input vectors x - CVaR could not be calculated');
    return
end
if (size_x(1) > 1 && size_x(2) > 1) % x is a matrix
    display('Please only input vectors x - CVaR could not be calculated');
    return
end
x_abs = abs(x);
% use CVaR optimization to calculate the norm
n = length(x);
e = ones(n, 1);
cvx_begin
    cvx_quiet(true) % suppresses cvx's output
    variables z(n) c
    minimize(n * (1 - alpha) * c + e' * z)
    subject to
        z >= x_abs - c;
        z >= 0;
cvx_end
C_alpha = cvx_optval;
end

Appendix B

Extended Tables
B.1 Option Prices on NASDAQ:YHOO on 22 July 2015, 9:00 a.m. New York Time
Underlying  Option  Strike  Price     Underlying  Option  Strike  Price
Yahoo  Call  31.5  7.050    Yahoo  Put  31.5  0.170
Yahoo  Call  34.0  4.625    Yahoo  Put  34.0  0.020
Yahoo  Call  35.0  3.650    Yahoo  Put  35.0  0.025
Yahoo  Call  35.5  3.125    Yahoo  Put  35.5  0.030
Yahoo  Call  36.0  2.520    Yahoo  Put  36.0  0.040
Yahoo  Call  36.5  2.305    Yahoo  Put  36.5  0.045
Yahoo  Call  37.0  1.790    Yahoo  Put  37.0  0.060
Yahoo  Call  37.5  1.330    Yahoo  Put  37.5  0.080
Yahoo  Call  38.0  0.905    Yahoo  Put  38.0  0.130
Yahoo  Call  38.5  0.575    Yahoo  Put  38.5  0.285
Yahoo  Call  39.0  0.305    Yahoo  Put  39.0  0.480
Yahoo  Call  39.5  0.155    Yahoo  Put  39.5  0.880
Yahoo  Call  40.0  0.085    Yahoo  Put  40.0  1.260
Yahoo  Call  40.5  0.060    Yahoo  Put  40.5  1.740
Yahoo  Call  41.0  0.040    Yahoo  Put  41.0  2.195
Yahoo  Call  41.5  0.025    Yahoo  Put  41.5  2.715
Yahoo  Call  42.0  0.030    Yahoo  Put  42.0  3.225
Yahoo  Call  42.5  0.035    Yahoo  Put  42.5  3.725
Yahoo  Call  43.0  0.015    Yahoo  Put  43.0  4.225
Yahoo  Call  43.5  0.065    Yahoo  Put  43.5  4.650
Yahoo  Call  44.0  0.025    Yahoo  Put  44.0  5.275
Yahoo  Call  44.5  0.170    Yahoo  Put  44.5  5.675
Yahoo  Call  45.0  0.015    Yahoo  Put  45.0  6.150
Yahoo  Call  46.5  0.010    Yahoo  Put  46.5  7.700
Yahoo  Call  49.5  0.010    Yahoo  Put  49.5  10.500
Yahoo  Call  50.0  0.010    Yahoo  Put  50.0  11.025
Yahoo  Call  50.5  0.010    Yahoo  Put  50.5  11.525
B.2 Option Prices on NASDAQ:GOOGL on 22 July 2015, 9:00 a.m. New York Time
Underlying  Option  Strike  Price      Underlying  Option  Strike  Price
Google  Call  510.0  194.200    Google  Put  510.0  0.030
Google  Call  535.0  169.400    Google  Put  535.0  0.155
Google  Call  545.0  158.950    Google  Put  545.0  0.180
Google  Call  550.0  153.950    Google  Put  550.0  0.030
Google  Call  560.0  144.200    Google  Put  560.0  0.055
Google  Call  565.0  139.200    Google  Put  565.0  0.130
Google  Call  570.0  134.200    Google  Put  570.0  0.130
Google  Call  580.0  124.200    Google  Put  580.0  0.205
Google  Call  590.0  114.200    Google  Put  590.0  0.180
Google  Call  597.5  106.500    Google  Put  597.5  0.155
Google  Call  600.0  104.250    Google  Put  600.0  0.030
Google  Call  615.0  88.950     Google  Put  615.0  0.155
Google  Call  620.0  83.950     Google  Put  620.0  0.155
Google  Call  630.0  74.000     Google  Put  630.0  0.155
Google  Call  650.0  54.350     Google  Put  650.0  0.150
Google  Call  652.5  52.100     Google  Put  652.5  0.275
Google  Call  655.0  49.500     Google  Put  655.0  0.275
Google  Call  657.5  46.850     Google  Put  657.5  0.275
Google  Call  660.0  44.550     Google  Put  660.0  0.300
Google  Call  665.0  39.550     Google  Put  665.0  0.425
Google  Call  667.5  36.900     Google  Put  667.5  0.525
Google  Call  670.0  34.650     Google  Put  670.0  0.600
Google  Call  675.0  29.950     Google  Put  675.0  0.800
Google  Call  677.5  27.600     Google  Put  677.5  0.950
Google  Call  680.0  25.400     Google  Put  680.0  1.150
Google  Call  682.5  23.150     Google  Put  682.5  1.375
Google  Call  685.0  20.900     Google  Put  685.0  1.700
Google  Call  687.5  18.650     Google  Put  687.5  2.075
Google  Call  690.0  16.800     Google  Put  690.0  2.600
Google  Call  692.5  14.750     Google  Put  692.5  3.175
Google  Call  695.0  12.850     Google  Put  695.0  3.850
Google  Call  697.5  11.350     Google  Put  697.5  4.700
Google  Call  700.0  9.900      Google  Put  700.0  5.600
Google  Call  702.5  8.450      Google  Put  702.5  6.750
Google  Call  705.0  7.250      Google  Put  705.0  8.100
Google  Call  710.0  5.050      Google  Put  710.0  10.950
Google  Call  712.5  4.250      Google  Put  712.5  12.550
Google  Call  715.0  3.450      Google  Put  715.0  14.250
Google  Call  717.5  2.875      Google  Put  717.5  16.100
Google  Call  720.0  2.425      Google  Put  720.0  18.200
Google  Call  725.0  1.675      Google  Put  725.0  22.550
Google  Call  730.0  1.175      Google  Put  730.0  27.200
Google  Call  735.0  0.775      Google  Put  735.0  31.350
B.3 Trader's positions on 22 July 2015, 9:00 a.m. New York Time before hedging
Underlying  Option  Strike  Position  Cost of Position (USD)
Yahoo  Call  31.5  35  24,675
Yahoo  Call  34.0  40  18,500
Yahoo  Call  35.0  25  9,125
Yahoo  Call  35.5  30  9,375
Yahoo  Call  36.0  45  11,340
Yahoo  Call  37.0  35  6,265
Yahoo  Call  38.0  40  3,620
Yahoo  Call  38.5  50  2,875
Yahoo  Call  39.0  -50  -1,525
Yahoo  Call  40.0  10  85
Yahoo  Call  40.5  -10  -60
Yahoo  Call  41.5  50  125
Yahoo  Call  42.0  -1,100  -3,300
Yahoo  Call  42.5  -50  -175
Yahoo  Call  43.0  -40  -60
Yahoo  Call  43.5  -40  -260
Yahoo  Call  44.5  -35  -595
Yahoo  Call  45.0  -45  -68
Yahoo  Put  31.5  -10  -170
Yahoo  Put  37.5  -1,050  -8,400
Yahoo  Put  38.0  6  78
Yahoo  Put  39.0  50  2,400
Yahoo  Put  39.5  49  4,312
Yahoo  Put  40.0  50  6,300
Yahoo  Put  41.5  50  13,575
Yahoo  Put  42.0  -50  -16,125
Yahoo  Put  42.5  -50  -18,625
Yahoo  Put  43.0  -50  -21,125
Yahoo  Put  45.0  50  30,750
Yahoo  Put  49.5  50  52,500
Yahoo  Put  50.0  50  55,125
Yahoo  Put  50.5  50  57,625
Google  Call  730.0  -100  -11,750
Google  Put  665.0  -100  -4,250
Total: 222,163

B.4 Trader's positions in Yahoo Options on 22 July 2015, 9:00 a.m. New York Time after hedging

Underlying  Strike  Call Position  Cost of Call Position (USD)  Put Position  Cost of Put Position (USD)  Net Cost of Position (USD)
Yahoo  31.5  85  59,925  -60  -1,020  58,905
Yahoo  34  90  41,625  -50  -100  41,525
Yahoo  35  75  27,375  -50  -125  27,250
Yahoo  35.5  80  25,000  -50  -150  24,850
Yahoo  36  95  23,940  -50  -200  23,740
Yahoo  36.5  -50  -11,525  -50  -225  -11,750
Yahoo  37  85  15,215  -50  -300  14,915
Yahoo  37.5  50  6,650  -1100  -8,800  -2,150
Yahoo  38  90  8,145  -44  -572  7,573
Yahoo  38.5  49  2,818  -50  -1,425  1,393
Yahoo  39  -100  -3,050  100  4,800  1,750
Yahoo  39.5  50  775  49  4,312  5,087
Yahoo  40  60  510  100  12,600  13,110
Yahoo  40.5  40  240  50  8,700  8,940
Yahoo  41  50  200  50  10,975  11,175
Yahoo  41.5  100  250  100  27,150  27,400
Yahoo  42  -1150  -3,450  -100  -32,250  -35,700
Yahoo  42.5  -100  -350  -100  -37,250  -37,600
Yahoo  43  -90  -135  -100  -42,250  -42,385
Yahoo  43.5  -90  -585  50  23,250  22,665
Yahoo  44  -50  -125  -50  -26,375  -26,500
Yahoo  44.5  -85  -1,445  50  28,375  26,930
Yahoo  45  -95  -143  100  61,500  61,358
Yahoo  46.5  -50  -50  50  38,500  38,450
Yahoo  49.5  -50  -50  100  105,000  104,950
Yahoo  50  -50  -50  100  110,250  110,200
Yahoo  50.5  -50  -50  100  115,250  115,200
Total: 591,280

B.5 Trader's positions in Google Options on 22 July 2015, 9:00 a.m. New York Time after hedging

Underlying  Strike  Call Position  Cost of Call Position (USD)  Put Position  Cost of Put Position (USD)  Net Cost of Position (USD)
Google  510  -5  -97,100  -5  -15  -97,115
Google  535  -5  -84,700  -5  -78  -84,778
Google  545  5  79,475  -5  -90  79,385
Google  550  5  76,975  -5  -15  76,960
Google  560  -5  -72,100  -5  -28  -72,128
Google  565  -5  -69,600  -5  -65  -69,665
Google  570  -5  -67,100  -5  -65  -67,165
Google  580  -5  -62,100  -5  -103  -62,203
Google  590  -5  -57,100  -5  -90  -57,190
Google  597.5  5  53,250  -5  -78  53,173
Google  600  -5  -52,125  -5  -15  -52,140
Google  615  5  44,475  -5  -78  44,398
Google  620  5  41,975  -5  -78  41,898
Google  630  5  37,000  -5  -78  36,923
Google  650  -5  -27,175  5  75  -27,100
Google  652.5  -5  -26,050  -5  -138  -26,188
Google  655  -5  -24,750  -5  -138  -24,888
Google  657.5  5  23,425  5  138  23,563
Google  660  1  4,455  5  150  4,605
Google  665  5  19,775  -95  -4,038  15,738
Google  667.5  5  18,450  5  263  18,713
Google  670  5  17,325  5  300  17,625
Google  675  5  14,975  5  400  15,375
Google  677.5  5  13,800  5  475  14,275
Google  680  5  12,700  5  575  13,275
Google  682.5  4  9,260  5  688  9,948
Google  685  -5  -10,450  5  850  -9,600
Google  687.5  5  9,325  -5  -1,038  8,288
Google  690  -5  -8,400  5  1,300  -7,100
Google  692.5  5  7,375  -5  -1,588  5,788
Google  695  5  6,425  -5  -1,925  4,500
Google  697.5  5  5,675  -5  -2,350  3,325
Google  700  -5  -4,950  5  2,800  -2,150
Google  702.5  -5  -4,225  5  3,375  -850
Google  705  4  2,900  -4  -3,240  -340
Google  710  5  2,525  -5  -5,475  -2,950
Google  712.5  -5  -2,125  5  6,275  4,150
Google  715  -5  -1,725  5  7,125  5,400
Google  717.5  -5  -1,438  5  8,050  6,613
Google  720  -3  -728  5  9,100  8,373
Google  725  5  838  5  11,275  12,113
Google  730  -105  -12,338  -5  -13,600  -25,938
Google  735  -5  -388  5  15,675  15,288
Total                            -149,800

B.6 Computation times of Scaled and Non-Scaled CVaR norm in ms

                      Component-wise                           Optimization
α      n        ⟪x⟫^S_α (Def. 5.1)  ⟪x⟫_α (Def. 5.2)   ⟪x⟫^S_α (Prop. 5.1)  ⟪x⟫_α (Prop. 5.2)
0      2             0.62                0.50               220.44               197.76
       3             0.11                0.03               211.03               179.15
       10            0.11                0.03               181.00               173.62
       100           0.12                0.03               196.84               194.83
       1000          0.21                0.04               202.81               199.38
       10000         1.05                0.05               455.95               435.50
       100000        4.94                0.27              3766.36              3497.11
0.1    2             0.18                0.12               216.77               188.06
       3             0.19                0.12               189.60               182.71
       10            0.12                0.08               199.62               186.78
       100           0.14                0.10               229.93               226.96
       1000          0.19                0.14               244.86               236.01
       10000         1.00                0.94               625.06               599.35
       100000        5.25                5.03              6175.45              5843.76
0.25   2             0.20                0.12               181.25               175.68
       3             0.18                0.12               181.29               184.35
       10            0.19                0.13               265.65               242.76
       100           0.14                0.10               214.34               217.70
       1000          0.19                0.14               229.73               271.94
       10000         1.06                0.98               600.00               584.77
       100000        5.61                5.02              5772.24              5277.92
0.5    2             0.13                0.08               178.59               174.96
       3             0.18                0.12               180.96               179.34
       10            0.13                0.08               184.33               181.49
       100           0.15                0.10               217.66               213.11
       1000          0.19                0.14               323.36               239.72
       10000         1.00                0.92               571.45               551.93
       100000        5.64                5.00              5516.37              5128.19
0.7    2             0.05                0.04               179.90               176.37
       3             0.05                0.03               184.08               188.00
       10            0.13                0.08               187.39               189.06
       100           0.14                0.10               250.00               267.46
       1000          0.19                0.15               252.11               241.46
       10000         0.97                0.92               624.84               612.85
       100000        5.57                5.06              6201.42              5965.34
0.9    2             0.05                0.04               177.20               178.02
       3             0.05                0.04               182.70               183.60
       10            0.12                0.08               177.95               180.54
       100           0.14                0.10               231.81               231.11
       1000          0.19                0.14               289.31               249.22
       10000         0.98                0.91               749.50               713.43
       100000        5.26                5.02              8122.91              7767.68

B.7 Ratio of Projections of Random Hyperplanes onto the C_α Unit Ball in R^4 over 5,000 Trials

[Table B.7 lists the projection ratios for the unit vectors ±e_i and the sixteen sign patterns (1/2) × [±1, ±1, ±1, ±1]^T; its numerical entries are not recoverable from this copy.]
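Table B.6 above times two routes to the Scaled CVaR norm: component-wise from its definition and via an optimization formulation. The sketch below contrasts the two routes under the standard characterization of the scaled norm as the CVaR of the uniform distribution on |x_1|, ..., |x_n|. The function names are illustrative, and the optimization route evaluates the Rockafellar–Uryasev objective only at its breakpoints rather than calling a general-purpose solver, so it does not reproduce the timing gap in the table.

```python
def scaled_cvar_norm_sort(x, alpha):
    """Component-wise route: average of the worst (1 - alpha) tail of
    the uniform distribution on the absolute components of x."""
    y = sorted((abs(v) for v in x), reverse=True)
    n = len(y)
    m = (1.0 - alpha) * n          # tail size measured in atoms
    k = int(m)                     # number of whole atoms in the tail
    total = sum(y[:k])
    if k < n and m > k:            # fractional part of the next atom
        total += (m - k) * y[k]
    return total / m

def scaled_cvar_norm_opt(x, alpha):
    """Optimization route: Rockafellar-Uryasev formula
    min_c  c + (1 / ((1 - alpha) n)) * sum_i max(|x_i| - c, 0).
    The objective is piecewise linear and convex, so its minimum is
    attained at one of the breakpoints |x_i| (or at c = 0)."""
    y = [abs(v) for v in x]
    denom = (1.0 - alpha) * len(y)

    def f(c):
        return c + sum(max(v - c, 0.0) for v in y) / denom

    return min(f(c) for c in y + [0.0])

# both characterizations agree on the same input
x = [3.0, -1.0, 2.0, -5.0]
for alpha in (0.0, 0.25, 0.5, 0.75):
    assert abs(scaled_cvar_norm_sort(x, alpha)
               - scaled_cvar_norm_opt(x, alpha)) < 1e-9
```

Under the same characterization, the Non-Scaled norm ⟪x⟫_α can be recovered as (1 − α)n · ⟪x⟫^S_α for α up to (n − 1)/n.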
Appendix C  Extended Diagrams
C.1 Monte Carlo simulated loss distributions of single assets (Scenario 2 of Section 3.3)
C.2 Monte Carlo simulated loss distributions of optimal portfolios (Scenario 2 of Section 3.3)

C.3 C_α and L_p* norm surface plots of x ∈ R^n for different α and p*

C.4 Projection of a circle onto the unit ball in R^2 using L_p and C_α norms
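Figure C.4 traces unit balls by scaling points of a circle radially onto the unit sphere of each norm, x ↦ x/‖x‖. A minimal sketch of that construction, assuming the sorted-tail form of the Non-Scaled C_α norm (sum of the ⌊m⌋ largest absolute components plus a fractional remainder, with m = (1 − α)n, reducing to the max norm for m ≤ 1):

```python
import math

def cvar_norm(x, alpha):
    """Non-scaled CVaR norm C_alpha under the assumed sorted-tail form."""
    y = sorted((abs(v) for v in x), reverse=True)
    m = (1.0 - alpha) * len(y)
    if m <= 1.0:
        return y[0]                # reduces to the L_infinity norm
    k = int(m)
    total = sum(y[:k])
    if k < len(y):
        total += (m - k) * y[k]    # fractional part of the next component
    return total

def project_circle(alpha, num_points=8):
    """Map points of the unit circle onto the C_alpha unit sphere in R^2
    by radial scaling x -> x / C_alpha(x)."""
    pts = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        x = (math.cos(t), math.sin(t))
        c = cvar_norm(x, alpha)
        pts.append((x[0] / c, x[1] / c))
    return pts

# by positive homogeneity, every projected point has unit C_alpha norm
for alpha in (0.0, 0.25, 0.5):
    for p in project_circle(alpha):
        assert abs(cvar_norm(p, alpha) - 1.0) < 1e-9
```

In R^2 this interpolates between the L_1 ball at α = 0 (`cvar_norm` returns |x_1| + |x_2|) and the L_∞ ball for α ≥ 1/2, which is the behaviour the C.4 diagrams illustrate.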