A Novel Approach to Quantification of Model Risk for Practitioners
Zuzana Krajčovičová* a, Pedro Pablo Pérez Velasco b, Carlos Vázquez c

a Department of Mathematics, University of A Coruña, Spain
b Model Risk Division, Banco Santander, Spain
c Department of Mathematics, University of A Coruña, Spain
Abstract
Models continue to increase their already broad use across industry, as well as their sophistication. Worldwide regulation obliges financial institutions to manage and address model risk with the same severity as any other type of risk, e.g. [8], which moreover defines model risk as the potential for adverse consequences from decisions based on incorrect and misused model outputs and reports. Model risk quantification is essential not only for meeting these requirements but for an institution's basic internal operations. It is, however, a complex task, as any comprehensive quantification methodology should at least consider the data used for building the model, its mathematical foundations, the IT infrastructure, overall performance and (most importantly) usage. Besides, the current number of models and of different mathematical modelling techniques is overwhelming. Our proposal is to define the quantification of model risk as the calculation of the norm of an appropriate function that belongs to a Banach space, defined over a weighted Riemannian manifold endowed with the Fisher–Rao metric. The aim of the present contribution is twofold: to introduce a sufficiently general and sound mathematical framework covering the aforementioned points, and to illustrate how a practitioner may identify the relevant abstract concepts and put them to work.
Keywords: model risk, uncertainty, Riemannian manifold, geodesics, exponential map, Fisher–Rao information metric
1. Introduction
Models are simplifying mappings of reality that serve a specific purpose, applying mathematical, financial and economic theories to the available data. They deliberately focus on specific aspects of reality and degrade or ignore the rest. Understanding the capabilities and limitations of the underlying assumptions is key when dealing with a model and its outputs. According to [8], model risk is defined as

"[. . . ] the potential for adverse consequences from decisions based on incorrect or misused model outputs and reports. Model risk can lead to financial loss, poor business and strategic decision making, or damage to a bank's reputation."
The Fed then identifies the two main sources of model risk (inappropriate usage and fundamental errors). Further, it states that model risk should be managed and addressed with the same severity as any other type of risk, and that banks should identify the sources of model risk and assess their magnitude. The Fed also emphasizes that expert modelling, robust model validation and a properly justified approach are necessary elements of model risk moderation, though they are not sufficient and should not be used as an excuse for not improving models.

In spite of the rising awareness of model risk and the understanding of its significant impact, there are no globally defined industry and market standards on its exact definition, management and quantification, even though proper model risk management is required by regulators.

Within the finance literature, some authors have defined model risk as the uncertainty about the risk factor distribution ([10]), the misspecified underlying model ([7]), the deviation of a model from a 'true' dynamic process ([4]), the discrepancy relative to a benchmark model ([12]), and the inaccuracy in risk forecasting that arises from estimation error and the use of an incorrect model ([3]). Model risk has been classified previously across asset classes: see [16] for interest rate and credit products, [6] for portfolio applications, [19] for asset-backed securities, and [3] in relation to measuring market risk.

* Corresponding Author
Email address: [email protected] (Z. Krajčovičová)
Preprint submitted to Elsevier

Quantification, as an essential part of model risk management, is required for consistent management and effective communication of model weaknesses and limitations to decision makers and users, and to assess model risk in the context of the overall position of the organization. The quantification of model risk should consider the uncertainty stemming from the selection of the mathematical techniques (e.g. focusing on fitting a normal distribution, hence leaving aside other distribution families), the calibration methodology (e.g. different optimization algorithms may derive different parameter values), and from the limitations of the sample data (e.g. a sparse or incomplete database).

Model risk quantification poses many challenges that come from the high diversity of models, the wide range of techniques, and the different uses of models, among others. Some model outputs drive decisions; other model outputs provide one source of management information; some outputs are further used as inputs to other models. Additionally, model outputs may be completely overridden by expert judgement. Not to mention that in order to quantify model risk one needs another model, which is again prone to model risk.

The most relevant areas of analysis for the quantification of model risk are: data and calibration, model foundations, model performance, IT infrastructure, model use, controls and governance, and model sensitivity.
The model may be fundamentally wrong due to errors in the theoretical foundation and conceptual design that emerge from incorrect logic or assumptions, model misspecification or the omission of variables. Data quality issues, inadequate sample sizes and outdated data contribute to model performance issues such as instability, inaccuracy or bias in model forecasts. Model risk also arises from inadequate controls over model use. Flawed test procedures or failure to perform consistent and comprehensive user acceptance tests can lead to material model risk. To name just a few.

The focus of this paper is on developing a novel approach for quantifying model risk within the framework of differential geometry ([17]) and information theory ([1]). In this work we introduce a measure of model risk on a statistical manifold where models are represented by probability distribution functions. Differences between models are determined by the geodesic distance under the Fisher–Rao metric. This metric allows us to utilize the intrinsic structure of the manifold of densities and to respect the geometry of the space we are working on, i.e. it accounts for the non-linearities of the underlying space.

The rest of this paper is structured as follows. In Section 2, we summarize basic facts about Riemannian geometry and introduce the terminology used throughout the paper. Modeling process steps and a general description of our proposed method for quantification of model risk are presented in Section 3, which is followed by a detailed discussion of the main quantification steps. Sections 4 to 6 describe the construction of the neighbourhood containing material variations of the model, and the definition and construction of the weight function. The model risk measure is then defined and explained in Section 7. Section 8 provides some final conclusions and directions for future work, and finally, the Appendix contains the proofs of the main results.
2. Background on Riemannian Geometry
This section introduces the necessary notation for the rest of the paper. Details can be found in standard references such as [1] or [17].

Let $\mathcal{M}$ be a compact and connected manifold without boundary, equipped with a Riemannian metric $\langle\cdot,\cdot\rangle$ and a Riemannian connection $\nabla$, and let $T_p\mathcal{M}$ be the tangent space at $p \in \mathcal{M}$. The distance $d$ between $p, q \in \mathcal{M}$ is given by
$$d(p,q) := \inf_{\gamma} \int_a^b \|\gamma'(t)\|\, dt,$$
where $\gamma$ ranges over all differentiable paths $\gamma : [a,b] \to \mathcal{M}$ satisfying $\gamma(a) = p$ and $\gamma(b) = q$, and $\|\gamma'\| = \langle \gamma', \gamma' \rangle^{1/2}$. $(\mathcal{M}, d)$ is a metric space.

The Riemannian metric associates to each point $p \in \mathcal{M}$ an inner product $\langle\cdot,\cdot\rangle_p$ on $T_p\mathcal{M}$. One natural metric on the Riemannian manifold $\mathcal{M}$ is the Fisher–Rao information metric ([18])
$$I_{ij}(p) = g_{ij}(p) = E\left[\frac{\partial \log p}{\partial x_i}\, \frac{\partial \log p}{\partial x_j}\right] = \int p\, \frac{\partial \log p}{\partial x_i}\, \frac{\partial \log p}{\partial x_j}\, dx. \qquad (1)$$
The determinant $\det I(p)$ represents the amount of information a sample point conveys with respect to the problem of estimating the parameter $x$, and so $I(p)$ can be used to determine dissimilarities between distributions.

Under a square-root representation, the Fisher–Rao metric becomes the standard $L^2$ metric and the space of probability density functions becomes the positive orthant of the unit hypersphere in $L^2$ ([14]). The square-root mapping is defined as a continuous mapping $\phi : \mathcal{M} \to \Psi$, where $\Psi$ is the space containing the positive square roots of all possible density functions. Using this mapping, we define the square-root transform of probability density functions as $\phi(p) = \psi = \sqrt{p}$:
$$\Psi = \Big\{ \psi(x) \;\Big|\; \int_X |\psi(s)|^2\, ds = 1,\ \psi(s) \ge 0\ \forall s \Big\},$$
where $X$ is the sample space. In this case, the associated natural Hilbert space $H$, equipped with a symmetric inner product $g_{ij}$, induces a spherical geometry, i.e. the integral of $(\sqrt{p})^2$ is equal to unity ([14]). If the density function is parametrized by the set of parameters $\theta = (\theta_1, \ldots, \theta_n)$, then for each value of $\theta$ we have a corresponding point on the unit sphere $S$ in $H$. In this setting the geodesics are available in closed form and can hence be computed quickly and exactly. For any two tangent vectors $v_1, v_2 \in T_\psi\Psi$, the Fisher–Rao metric is given by
$$\langle v_1, v_2 \rangle = \int v_1(s)\, v_2(s)\, ds = \left\langle \frac{\partial \sqrt{p(\cdot;\theta)}}{\partial \theta_i}, \frac{\partial \sqrt{p(\cdot;\theta)}}{\partial \theta_j} \right\rangle = \frac{1}{4}\, g_{ij}. \qquad (2)$$
The geodesic in the direction $v$ on the sphere, and the distance between two points $\psi_1, \psi_2$ belonging to the sphere, are given by
$$\gamma(t) = \cos(t\|v\|)\, \psi_1 + \sin(t\|v\|)\, \frac{v}{\|v\|}, \qquad d(\psi_1, \psi_2) = \cos^{-1}\big( \langle \psi_1, \psi_2 \rangle \big).$$
Since the compactness of $\mathcal{M}$ implies geodesic completeness ([5]), there exists for every $p \in \mathcal{M}$ and $v \in T_p\mathcal{M}$ a unique geodesic $\gamma : \mathbb{R} \to \mathcal{M}$ satisfying $\gamma(0) = p$ and $\gamma'(0) = v$. Moreover, the Hopf–Rinow theorem ([11]) ensures that any two points $p, q \in \mathcal{M}$ can be joined by a minimal geodesic of length equal to the distance between the points, $d(p,q)$. Through the geodesic $\gamma$, one can define the exponential map $\exp_p : T_p\mathcal{M} \to \mathcal{M}$ by $\exp_p(tv) := \gamma(t)$ for all $t \in \mathbb{R}$ and $v \in T_p\mathcal{M}$. The exponential map for the square-root transformation ([13]) has the form
$$\exp_{\psi_i}(tv) := \cos\big(\|tv\|_{\psi_i}\big)\, \psi_i + \sin\big(\|tv\|_{\psi_i}\big)\, \frac{tv}{\|tv\|_{\psi_i}},$$
where $v \in T_{\psi_i}(\Psi)$. The inverse exponential map from $\psi_i$ to $\psi_j$ is given by
$$\exp^{-1}_{\psi_i}(\psi_j) := \frac{d(\psi_i, \psi_j)}{\sin\big(d(\psi_i, \psi_j)\big)} \Big( \psi_j - \cos\big(d(\psi_i, \psi_j)\big)\, \psi_i \Big). \qquad (3)$$
An open set $U \subset \mathcal{M}$ is said to be a normal neighbourhood of $p \in U$ if $\exp_p$ is a diffeomorphism from a neighbourhood $V$ of the origin of $T_p\mathcal{M}$ onto $U$, with $V$ such that $tv \in V$ for $0 \le t \le 1$ whenever $v \in V$.
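Because the geodesics and distances above are available in closed form under the square-root representation, they are easy to check numerically. The following sketch is illustrative only (the grid and the two Gaussian densities are our own choices, not taken from the paper): it discretizes two densities, maps them to square-root form, and evaluates $d(\psi_1, \psi_2) = \cos^{-1}(\langle \psi_1, \psi_2 \rangle)$.

```python
import numpy as np

# Discretize densities on a uniform grid and work in the square-root
# representation, where each density becomes a unit vector in L^2.

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sqrt_density(pdf_vals, dx):
    psi = np.sqrt(pdf_vals)
    return psi / np.sqrt(np.sum(psi ** 2) * dx)  # enforce unit L^2 norm

def geodesic_distance(psi1, psi2, dx):
    # arc length on the unit hypersphere: d = arccos(<psi1, psi2>)
    inner = np.sum(psi1 * psi2) * dx
    return np.arccos(np.clip(inner, -1.0, 1.0))

x = np.linspace(-60.0, 60.0, 20001)
dx = x[1] - x[0]
psi1 = sqrt_density(normal_pdf(x, 2.0, 10.0), dx)
psi2 = sqrt_density(normal_pdf(x, 0.0, 10.0), dx)

print(geodesic_distance(psi1, psi2, dx))  # ≈ 0.0999
```

For two normals with equal standard deviation the inner product of the square-root densities is the Bhattacharyya coefficient $\exp(-(\mu_1-\mu_2)^2/(8\sigma^2))$, so the printed value can be verified against $\cos^{-1}(e^{-0.005}) \approx 0.0999$.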
3. Modeling Process Steps and Quantification of Model Risk
There are different types and aspects of model risk that tend to overlap, co-occur, or co-vary. In this context, we propose four rough model creation steps: Data, Calibration, Model Selection and Testing, and Implementation and Usage. These steps may occur in an iterative fashion, but they result in a general linear flow that ends with institutional use (implementation and maintenance) to direct decision making (often encoded into an IT system). Limitations in any of these areas can impair reliance on model results.
1. Data refers to the definition of the purpose for modeling, the specification of the modeling scope, human and financial resources, and the specification of data and other prior knowledge, together with their interpretation and preparation. The data may be obtained from both internal and external sources, and are further prepared by cleaning and reshaping. Model risk may arise from data deficiencies in terms of both quality and availability, including, among others, errors in data definition, insufficient samples, inaccurate proxies, sensitivity to expert judgments, or misinterpretation.
2. Calibration includes the selection of the types of variables and the nature of their treatment, the tuning of free parameters, and the links between system components and processes. Estimation uncertainty may occur due to simplifications, approximations, flawed assumptions, inappropriate calibration, wrong selection of subsets, errors in statistical estimation or in market benchmarks, computational or algorithmic limitations, or the use of unobservable parameters.
3. Model Selection and Testing involves the choice of the estimation performance criteria and techniques, and the identification of model structure and parameters, which is generally an iterative process with the underlying aim of balancing sensitivity to system variables against complexity of representation. Further, it is related to conditional verification, which includes testing the sensitivity to changes in the data and to possible deviations from the initial assumptions. In this step, model risk stems from, e.g., inadequate and incorrect modeling assumptions, an outdated model due to parameter decalibration, model instability, or model misspecification.
4. Implementation and Usage refers to the deployment of the model into production, followed by regular maintenance and monitoring. Sources of model risk in this step include using the model for unintended purposes, lack of recalibration, IT failures, lack of communication between modelers and users, and lack of understanding of model limitations.

Quantification of model risk, from a best-practice perspective, should be quick and reliable, without refitting or building models, without reference to particular structures and methodologies, and with prioritized analysis (getting immediate assurance on shifts that are immaterial). Differential geometry and information theory offer a basis for such an approach. In this framework, a model is represented by a particular probability distribution $p : X \to \mathbb{R}_+$ that belongs to the set of probability measures $\mathcal{M}$, the so-called statistical manifold, available for modelling. The manifold $\mathcal{M}$ can be further equipped with an information-theoretic geometric structure that, among other things, allows us to quantify variations and dissimilarities between probability distribution functions (models).

The set of possible probability measures may be further parametrized in a canonical way by a parameter space $\Theta$, $\mathcal{M} = \{ p(x; \theta) : \theta \in \Theta \}$. This set forms a smooth Riemannian manifold $\mathcal{M}$. Every distribution is a point in this space, and the collection of points created by varying the parameters of the model, $p \in \mathcal{M}$, gives rise to a hypersurface (a parametric family of distributions) in which similar distributions are mapped to nearby points. The natural Riemannian metric is the Fisher–Rao metric ([18]), which is the unique intrinsic metric on the statistical manifold; it is the only metric that is invariant under re-parametrization [1].

Let us consider a given model $p_0$ which can be uniquely parametrized using the vector $\theta = (\theta_1, \ldots, \theta_n)$ over the sample space $X$ and which can be described by the probability distribution $p_0 = p(x; \theta)$. This probability distribution belongs to a set (family) of distributions $\mathcal{M} = \{ p(x; \theta) : \theta \in \Theta \subset \mathbb{R}^n \}$ that forms a model manifold. We assume that for each $x \in X$ the function $\theta \mapsto p(x; \theta)$ is $C^\infty$. Thus, $\mathcal{M}$ forms a differentiable manifold and we can identify models in the family with points on this manifold. Choosing a particular model is the same as fixing a parameter setting $\theta_0 \in \Theta$.

Example
To help fix ideas, we introduce a simple illustrative example and develop it further throughout the paper. Let $X$ denote a vector of profit and loss, P&L, over a two-year time horizon (520 days) that is used to calculate the Value at Risk (VaR). VaR is derived from the distribution of P&L as the quantile loss at the portfolio level and is defined by
$$P(X \le \mathrm{VaR}) = 1 - \beta,$$
where $\beta$ is the confidence level. Assume that the given model considers $X$ to be normally distributed, $p_0 = N(\mu_0, \sigma_0)$ with parameters $\mu_0 = 2$, $\sigma_0 = 10$ once calibrated. This model belongs to the family of normal distributions that forms a differentiable manifold $\mathcal{M} = \{ p(x; \mu, \sigma) : \mu \in \mathbb{R},\ \sigma > 0 \}$, where $\mu$ is the mean and $\sigma$ the standard deviation. Every point $p \in \mathcal{M}$ corresponds to a normal distribution $p(x; \theta)$ with $\theta = (\mu, \sigma)$.

In our univariate normally distributed case, parametrized by a 2-dimensional space $\theta = (\mu, \sigma)$, the Riemannian matrix defined by (1) is given by
$$I = [I_{ij}(\mu_0, \sigma_0)] = \begin{pmatrix} 1/\sigma_0^2 & 0 \\ 0 & 2/\sigma_0^2 \end{pmatrix} = \begin{pmatrix} 0.01 & 0 \\ 0 & 0.02 \end{pmatrix}. \qquad \Box$$

We define the model risk for a given model $p_0$ at the scale of an open neighbourhood around $p_0$ that contains alternative models that are not too far away, in a sense quantified by their relevance to (missing) properties and limitations of the model. The model risk is then measured with respect to all models inside this neighbourhood as the norm of an appropriate function of the output differences over a weighted Riemannian manifold endowed with the Fisher–Rao metric and the Levi–Civita connection. The analysis consists of five steps:

1. Embedding the model manifold into one that considers missing properties in the given model $p_0$.
2. Choosing a proper neighbourhood around the given model.
3. Choosing an appropriate weight function, which assigns relative relevance to the different models inside the neighbourhood.
4. Calculating the measure of model risk with respect to all models inside the neighbourhood, through the corresponding norm.
5. Interpreting the measure with respect to the specific use of the model risk quantification.

Each step addresses different limitations of the model and the uncertainty in the various areas related to the model. In the following sections we further develop these steps and describe the intuition behind them.
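The matrix above can be reproduced numerically from definition (1). The sketch below is an illustration of that computation (not part of the paper): it integrates the products of the score functions of the normal family at $(\mu_0, \sigma_0) = (2, 10)$ and recovers $\mathrm{diag}(1/\sigma_0^2,\ 2/\sigma_0^2)$.

```python
import numpy as np

# Evaluate the Fisher-Rao matrix (1) for the normal family at
# (mu, sigma) = (2, 10) by numerical integration of the score products,
# and compare with the closed form diag(1/sigma^2, 2/sigma^2).

mu, sigma = 2.0, 10.0
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# scores: partial derivatives of log p(x; mu, sigma)
d_mu = (x - mu) / sigma ** 2
d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3

I = np.array([
    [np.sum(p * d_mu * d_mu), np.sum(p * d_mu * d_sigma)],
    [np.sum(p * d_sigma * d_mu), np.sum(p * d_sigma * d_sigma)],
]) * dx

print(np.round(I, 4))  # ≈ [[0.01, 0], [0, 0.02]]
```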
4. Neighbourhood Around the Model
Recall that the given model $p_0$ belongs to an $n$-dimensional manifold $\mathcal{M}$ where each dimension represents a different piece of information inherent in $p_0$. To consider missing properties (or properties not appropriately modelled, for which there is no consensus, or that cannot be adequately calibrated, among many others), the uncertainty surrounding the data and the calibration, additional information about the limitations of the model, or wrong underlying assumptions, we may need to adjoin new dimensions to $\mathcal{M}$, and thus consider a higher-dimensional space within which $\mathcal{M}$ is embedded.

We define the proper neighbourhood around $p_0$ with the help of the tangent space $T_{p_0}\mathcal{M}$ at the point $p_0$. $T_{p_0}\mathcal{M}$ is a vector space that describes a first-order approximation: infinitesimal displacements or deformations on the manifold at the position of the point $p_0$.

From a practical point of view, not all perturbations are relevant; thus, taking into account the materiality with respect to the intended purpose of the model, its usage, business and market, we consider only a small subset of the tangent bundle. Let $\bar{U}$ be the open set around $p_0$ of some normal neighbourhood $V$ such that
$$\bar{U} := \{ tv \in V \subset T_{p_0}\mathcal{M} : 0 < t \le \alpha(v),\ v \in S(p_0, 1),\ \text{and normal coordinates are defined} \},$$
where $S(p_0, 1) = \{ v \in T_{p_0}\mathcal{M} : \|v\| = 1 \}$ is the unit sphere in $T_{p_0}\mathcal{M}$.

The neighbourhood $\bar{U}$ includes the directions of all relevant perturbations of the model $p_0$ up to a certain level $\alpha(v)$. The level $\alpha(v)$ depends on the tangent vectors, since the degree of our uncertainty about $p_0$ might not be constant across the canonical parameter space; for instance, we could assume more uncertainty in the tails of the distribution $p_0$ than in its body. We can interpret $\alpha(v)$ as a means to control uncertainty regarding the choice of the model $p_0$, and it is chosen appropriately based on the usage of the model. The level $\alpha(v)$ may also depend on the uncertainty surrounding the model in areas such as data, calibration, model selection, model performance, model sensitivity and scenario analysis, and, most importantly, the usage of the model.

(Recall that the Levi–Civita connection parallel-transports tangent vectors defined at one point to another and is compatible with the geometry induced by the Riemannian metric ([1]); additionally, for this choice of connection, the shortest paths are geodesics.)

Since $\bar{U}$ is a subset of the normal neighbourhood around $p_0$, the exponential map is well defined and we can construct a corresponding set of models close enough to $p_0$:
$$U := \exp_{p_0}(\bar{U}) = \{ p \in \mathcal{M} : d(p_0, p) \le \alpha(v) \}.$$
From now on, we shall require the boundary $\partial U = \{ \alpha(v)\, v \mid v \in S(p_0, 1) \}$ to be continuous and piecewise regular. Moreover, $U$ shall be a star-shaped set with respect to $p_0$, defined as follows:

Definition 1. A compact subset $U$ of a Riemannian manifold $\mathcal{M}$ is called star-shaped with respect to $p_0 \in U$ if for all $p \in U$, $p \ne p_0$, there exists a minimizing geodesic $\gamma$ with $\gamma(0) = p_0$ and $\gamma(T_p) = p$ such that $\gamma(t) \in U$ for all $t \in [0, T_p]$, where $T_p > 0$.

One advantage of the exponential map in this setting is that we can avoid calibration of the different alternative models inside $U$. For each unit vector $v \in \bar{U}$ there exists a unique geodesic connecting the points on the boundary of $U$ with the point $p_0$. This geodesic is given by $\gamma(t) = \exp_{p_0}(tv)$ for $t \in [0, \alpha(v)]$.

Example
Demonstrating that the model is suitable for the intended purpose is a critical part of the analysis of model risk. We want to evaluate the impact of relaxing the assumption of symmetry for the underlying P&L distribution, i.e. the impact of not including the skew in the model. Hence, we embed the model manifold $\mathcal{M}$ into the larger manifold of skew-normal distributions, $\bar{\mathcal{M}} = \{ p(x; \mu, \sigma, s) : \mu \in \mathbb{R},\ \sigma > 0,\ s \in \mathbb{R} \}$, where $s$ is the shape parameter ([2]). Note that for $s = 0$ we re-obtain the initial normal distribution, $N(\mu, \sigma)$. The skew-normal distribution family takes the skewness property into account.

After considering various time windows, data sequences, fittings and estimates, we determine the neighbourhood of our model to be the geodesic connecting the base model, $p_0 = N(2, 10) = SN(2, 10, 0)$, and the skew-normal distribution $p_1 = SN(\mu_1, \sigma_1, s_1)$, with parameters $\mu_1 = 1.{\ldots}$, $\sigma_1 = 9.{\ldots}$, and $s_1 = 2$. The geodesic distance between these two distributions is $d_{01} := d(\sqrt{p_0}, \sqrt{p_1}) = 0.{\ldots}$. To form the neighbourhood, we first construct the perturbation tangent vector associated with the direction of the boundary point $\sqrt{p_1}$, using the inverse exponential map defined by (3):
$$v_{p_1} = \exp^{-1}_{\sqrt{p_0}}\big( \sqrt{p_1} \big) = \frac{d_{01}}{\sin(d_{01})} \Big( \sqrt{p_1} - \cos(d_{01})\, \sqrt{p_0} \Big).$$
This provides a class of variations of the initial model by moving away from it in the direction $v_{p_1}$, which determines the whole neighbourhood $U$ given by
$$U = \big\{ \gamma(t) = \exp_{\sqrt{p_0}}(t\, v_{p_1}) : t \in [0, 1] \big\}$$
with boundaries $\partial_0 U = \{ p_0 \}$ and $\partial_1 U = \{ p_1 \}$. Thus, by varying $t$ from $0$ to $1$, one traces the geodesic path from $p_0$ to $p_1$, and we obtain the set of all distributions in the direction of $p_1$. The neighbourhood $U$ around $p_0$ includes all distributions on the geodesic $\gamma$ for $t \in [0, 1]$. $\Box$
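The construction of the tangent vector and of the geodesic neighbourhood can be sketched numerically. In the snippet below, two normal densities stand in for the calibrated pair $(p_0, p_1)$ (the paper's skew-normal boundary parameters are not reproduced here, so the endpoints are illustrative assumptions); we build $v_{p_1}$ via the inverse exponential map (3) and check that the exponential map returns exactly to the boundary point at $t = 1$.

```python
import numpy as np

# Build the tangent vector v = exp^{-1}_{psi0}(psi1) between two discretized
# square-root densities (eq. (3)) and verify the round trip
# exp_{psi0}(v) ~= psi1. Endpoint densities are illustrative stand-ins.

x = np.linspace(-80.0, 80.0, 40001)
dx = x[1] - x[0]

def sqrt_density(mu, sigma):
    p = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    psi = np.sqrt(p)
    return psi / np.sqrt(np.sum(psi ** 2) * dx)   # unit L^2 norm

def inner(a, b):
    return np.sum(a * b) * dx

psi0, psi1 = sqrt_density(2.0, 10.0), sqrt_density(1.0, 9.0)
d01 = np.arccos(np.clip(inner(psi0, psi1), -1.0, 1.0))

# inverse exponential map (3): tangent vector pointing from psi0 to psi1
v = (d01 / np.sin(d01)) * (psi1 - np.cos(d01) * psi0)

# exponential map: exp_{psi0}(t v) traces the geodesic; t = 1 recovers psi1
def exp_map(psi, v, t):
    nrm = np.sqrt(inner(t * v, t * v))
    return np.cos(nrm) * psi + np.sin(nrm) * (t * v) / nrm

print(np.max(np.abs(exp_map(psi0, v, 1.0) - psi1)))  # ~ 0
```

By construction $v$ is orthogonal to $\psi_0$ (so it lies in the tangent space) and $\|v\| = d_{01}$, which is why the geodesic reaches $\psi_1$ exactly at $t = 1$ without any recalibration of intermediate models.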
5. Weight Function Definition
Variations of the chosen model are not equally material, and they may take place with different probabilities. By placing a non-linear weight function (kernel) $K$ over the set $U$, we can easily assign relative relevance to each alternative model, and encode the credibility of the underlying assumptions that would make alternative models partially or relatively preferable to the nominal one, $p_0$. The particular choice of the structure of the kernel depends on various factors, such as the usage of the model, the distance from $p_0$, or the sensitivity to different changes.

In what follows we define a general weight function $K$ and show that under certain conditions it is well defined and unique. In general, we consider $K$ to be a non-negative and continuous function that depends on the local geometry of $\mathcal{M}$ by incorporating the Riemannian volume associated with the Fisher–Rao information metric, given by $dv(p) = \sqrt{\det I(\theta)}\, d\theta$. The volume measure is the unique Borel measure on $\mathcal{M}$ ([9]). With respect to a coordinate system, the information density of $p$ represents the amount of information the single model possesses with respect to the parameters. For example, a small $dv(p)$ means that the model contains much uncertainty and requires many observations to learn.

As the underlying factors that influence the perturbations of the given model happen with some likelihood, we treat all models inside $\mathcal{M}$ as random objects. As a consequence, we require $K$ to be a probability density with respect to the Riemannian volume, i.e. $\int_{\mathcal{M}} K\, dv(p) = 1$. Additionally, we acknowledge that the right model does not exist and that the choice of $p_0$ was to some extent a subjective preference.

Definition 2. An admissible weight function $K$ defined on $\mathcal{M}$ satisfies the following properties:

(K1') $K$ is continuous on $\mathcal{M}$;
(K2') $K \ge 0$ for all $p \in \mathcal{M}$;
(K3) $\int_{\mathcal{M}} K\, dv(p) = 1$.

Recall that to compute the $n$-dimensional volume of objects in $\mathcal{M}$, one considers a metric tensor on the tangent space $T_p\mathcal{M}$ at $p \in \mathcal{M}$. In particular, the Fisher–Rao information metric $I$ on $\mathcal{M}$ maps each $p \in \mathcal{M}$ to a volume $dv(p)$, a symmetric and bilinear form that further defines an $n$-dimensional volume measure on any measurable subset $U \subset \mathcal{M}$ by $\mathrm{Vol}(U) := \int_U dv(p)$. A smooth probability density $K$ over $\mathcal{M}$ with respect to the Riemannian measure induces a new probability measure $\zeta$, absolutely continuous with respect to $\mathrm{Vol}$:
$$\zeta(U) = \int_U d\zeta = \int_U K\, dv(p) \qquad (4)$$
for all measurable $U \subset \mathcal{M}$, with $\zeta(\mathcal{M}) = 1$. The pair $(\mathcal{M}, \zeta)$ is then called a weighted manifold, or a Riemannian metric-measure space, and has been shown to be a nontrivial generalization of Riemannian manifolds ([15]).

The weight function $K$ of Definition 2 represents a general characterization of a probability density over the Riemannian manifold $\mathcal{M}$. To tune $K$ for a proper analysis of model risk, we need to impose additional properties connected with the specific uncertainties surrounding the given model, for example the uncertainty surrounding data, calibration or model selection.

From a practitioner's point of view, models that do not belong to the chosen neighbourhood $U$ are not relevant from the perspective of model risk, and so do not add any uncertainty. Therefore, we assume the weight function to be non-negative only over the neighbourhood $U$ and zero elsewhere. Moreover, the translation of changes in the various underlying assumptions, data or calibration into changes in the output and further usage of the model will vary with respect to the direction of the change. Hence, we require $K$ to be continuous along the geodesic curves $\gamma$ uniquely determined by $v \in S(p_0, 1) \subset T_{p_0}\bar{U}$ starting at $p_0$ and ending at the points on $\partial U$. These additional properties are a modification of (K1') and (K2'):

(K1) $K$ is continuous along all geodesics $\gamma$ starting at $p_0$, for all unit vectors on $S(p_0, 1)$;
(K2) $K > 0$ for all $p \in U \setminus \partial U$, $K \ge 0$ for all $p \in \partial U$, and $K = 0$ for all $p \in \mathcal{M} \setminus U$.

A weight function satisfying properties (K1)-(K3) takes into consideration, and is adjusted according to, the different directions of the changes, i.e. it prescribes different sensitivities to different underlying factors.
6. Weight Function Construction
The construction of a weight function on a given Riemannian manifold is technically difficult, since it requires precise knowledge of the intrinsic geometry and structure of the manifold. To determine a weight function $K$ that satisfies all of the required properties, and in order to overcome this difficulty, we introduce a continuous mapping from a manifold endowed with a Euclidean geometry to the model manifold endowed with a Riemannian geometry that preserves the local properties. Euclidean geometry is well understood and intuitive, and thus the construction of a function on this space is considerably easier and more intuitive. In total, we construct three mappings: the exponential map $\exp_{p_0}$, the polar transformation $P$, and a further coordinate transform $\Lambda_\rho$.

Every Riemannian manifold $\mathcal{M}$ is locally diffeomorphic to the Euclidean space $\mathbb{R}^n$, and so in a small neighbourhood of any point the geometry of $\mathcal{M}$ is approximately Euclidean. All inner product spaces of the same dimension are isometric; therefore, all the tangent spaces $T_p\mathcal{M}$ on a Riemannian manifold $\mathcal{M}$ are isometric to the $n$-dimensional Euclidean space $\mathbb{R}^n$ endowed with its canonical inner product. Hence, all Riemannian manifolds have the same infinitesimal structure, not only as manifolds but also as Riemannian manifolds.

The weight function is defined with respect to the neighbourhood $U$ and is continuous on the geodesic curves $\gamma$ connecting $p_0$ to the points on the boundary $\partial U$. All material perturbations, i.e. alternative models inside $U$, are uniquely described by their distances from $p_0$ and by the vectors tangent to the unique geodesics $\gamma$ that pass through them. To maintain these properties, we consider the $n$-dimensional cylinder
$$C^n = [0, 1] \times S^{n-1} = \{ (t, \nu) : t \in [0, 1],\ \nu \in S^{n-1} \} \subset \mathbb{R}^{n+1},$$
where the parameter $t$ stands for the normalized distance along the geodesics, and $S^{n-1}$ denotes the $(n-1)$-dimensional unit sphere in $\mathbb{R}^n$ containing all the unit tangent vectors of $T_{p_0}\mathcal{M}$. The boundaries of $C^n$ are
$$\partial_0 C^n = \{ (0, \nu) : \nu \in S^{n-1} \}, \qquad \partial_1 C^n = \{ (1, \nu) : \nu \in S^{n-1} \},$$
and represent the end points of the geodesics, i.e. $\partial_0 C^n$ will be transformed into $p_0$ and $\partial_1 C^n$ into $\partial U$.

The Riemannian structure on $C^n$ is given by the restriction of the Euclidean metric in $\mathbb{R}^{n+1}$ to $C^n$. Hence, $C^n$ is a compact smooth Riemannian manifold with a canonical measure given by the product measure $dt \times d\nu$. This manifold allows us to construct an appropriate function on $C^n$, and then obtain a weight function satisfying all the required properties (K1)-(K3).

As a first step to obtain a mapping from $C^n$ to $\mathcal{M}$, we consider the exponential map from the tangent space at the point $p_0$ onto the neighbourhood $U$. Since $U$ is compact and, hence, topologically complete, the geodesic $\gamma$ can be defined on the whole real line $\mathbb{R}$ ([11]). Thus, the exponential map is well defined on the whole tangent space $T_{p_0}\mathcal{M}$. Further, since $U$ is a subset of the normal neighbourhood of $p_0$, the exponential map defines a local diffeomorphism from $T_{p_0}\bar{U}$ to $U$. The geodesics $\gamma$ are then given in these coordinates by rays emanating from the origin.

Example
The weight function is constructed with respect to the neighbourhood $U$, which in our example is the geodesic $\gamma$ with boundary points $p_0$ and $p_1$. We parametrize $\gamma$ by $t \in [0, 1]$, and define the one-dimensional cylinder as $C^1 = [0, 1] \times S^0$ with boundaries $\partial_0 C^1 = \{ (0, \nu) \}$ and $\partial_1 C^1 = \{ (1, \nu) \}$, where $S^0 = \{ \nu \in \mathbb{R} : \|\nu\| = \nu = 1 \}$. The left boundary $\partial_0 C^1$ will be contracted to the given model $p_0$ and $\partial_1 C^1$ to $p_1$. The construction of the weight function thus reduces to the construction of a weight function on the interval $[0, 1]$ of the real line. $\Box$

Next, we introduce a polar coordinate transformation on $T_{p_0}\mathcal{M}$. For the sake of distinction, we denote by $S(p_0, 1)$ the $(n-1)$-dimensional unit sphere in $T_{p_0}\mathcal{M}$, and by $S^{n-1}$ the unit sphere in $\mathbb{R}^n$:
$$P : [0, \infty) \times S(p_0, 1) \to T_{p_0}\mathcal{M},\quad P(t, v) = tv; \qquad P^{-1} : T_{p_0}\mathcal{M} \setminus \{0\} \to (0, \infty) \times S(p_0, 1),\quad P^{-1}(v) = \left( \|v\|, \frac{v}{\|v\|} \right). \qquad (5)$$
In order to precisely describe the neighbourhood $U$, we define $\rho(v)$ as the length $d(p_0, p) \ge 0$ of the geodesic $\gamma$ connecting $p_0$ with the boundary point $p \in \partial U$ in the direction $v \in S(p_0, 1)$. The distance $\rho$, considered as a real-valued function on $S(p_0, 1)$, is strictly positive and Lipschitz continuous on $S(p_0, 1)$. We now define the coordinate transformation
$$\Lambda_\rho : (t, v) \mapsto \big( \rho(v)\, t,\ v \big).$$
The mapping $S^{n-1} \to S(p_0, 1)$, $\nu \mapsto v$, is well defined in the sense that there exists a canonical identification between a unit vector $\nu \in S^{n-1} \subset \mathbb{R}^n$ and the element $v \in S(p_0, 1) \subset T_{p_0}\mathcal{M}$. Since the distance $v \mapsto \rho(v)$ is strictly positive and Lipschitz continuous on $S(p_0, 1)$, so is the inverse $v \mapsto 1/\rho(v)$. Therefore, the mapping $\Lambda_\rho$ defines a bi-Lipschitz mapping from $[0, 1] \times S^{n-1}$ onto the subset $[0, \rho(v)] \times S(p_0, 1)$.

Therefore, the composition $\exp_{p_0} \circ P \circ \Lambda_\rho$ defines a mapping from $C^n$ onto $U$ that maps $\partial_0 C^n$ onto the point $\{ p_0 \}$ and the right-hand boundary onto $\partial U$. Moreover, it preserves continuity for any continuous function $h$ defined on $C^n$ that satisfies the following consistency condition:

Definition 3.
A continuous function $h$ defined on a cylinder $C^n$ is called consistent with a continuous function $f$ on $\mathcal{U}$ under the mapping $\exp_{p_0} P \Lambda_\rho$ if $h(t,\nu) = f\big(\exp_{p_0} P \Lambda_\rho (t,\nu)\big)$ for all $(t,\nu) \in C^n$. In this case, $h$ satisfies the following conditions:

(i) $h(0,\nu_1) = h(0,\nu_2)$ for all $\nu_1, \nu_2 \in S^{n-1}$;

(ii) $h(1,\nu_1) = h(1,\nu_2)$ if $\exp_{p_0} P \Lambda_\rho (1,\nu_1) = \exp_{p_0} P \Lambda_\rho (1,\nu_2)$ on $\mathcal{M}$.

The first condition (i) implies that $h$ is constant on the boundary $\partial_0 C^n$. When the function $h$ on $C^n$ is consistent with $f$, the constant value at $\partial_0 C^n$ corresponds exactly with the value $f(p_0)$. The second condition ensures compatibility of the function $h$ with the function $f$ at the points of the boundary $\partial\mathcal{U}$, i.e. if $\exp_{p_0} P \Lambda_\rho$ maps two different points $(1,\nu_1)$ and $(1,\nu_2)$ in $C^n$ onto the same point $p \in \partial\mathcal{U}$, then $h(1,\nu_1) = h(1,\nu_2) = f(p)$.

Lemma 1.
The existence of the weight function $K$ satisfying assumptions (K1)–(K4) is equivalent to assuming the existence of a consistent function $h(t,\nu)$ defined on $C^n$ with codomain $\mathbb{R}$ that satisfies the following properties:

(H1) $h(t,\nu)$ is a continuous function on the compact manifold $C^n$;

(H2) $h(t,\nu) \geq 0$ for $(t,\nu) \in [0,1] \times S^{n-1}$;

(H3) $h(1,\nu) = \kappa(\nu)$ for all $\nu \in S^{n-1}$, where $\kappa$ is some non-negative function of $\nu$;

(H4) $h(0,\nu_1) = h(0,\nu_2) = \mathrm{const.}$ for all $\nu_1, \nu_2 \in S^{n-1}$;

(H5) $\int_{C^n} h(t,\nu)\, d\nu = 1$, where $d\nu = dt \times d\mu$.

See the Appendix for a proof. Using this result, the construction of the weight function becomes easier and more intuitive. One chooses the appropriate function $h$ defined on $C^n$ with respect to the particular model and the uncertainty surrounding it. Then, applying the above transformation, one obtains an appropriate weight function $K$ defined on $\mathcal{U}$ satisfying the properties (K1)–(K4) relevant for model risk analysis. Besides, for a chosen function $h$ the weight function $K$ is unique and well defined.

Theorem 4.
A continuous function $h$ defined on $C^n$ satisfying conditions (H1)–(H5) determines a unique and well-defined weight function $K$ satisfying (K1)–(K4) on $\mathcal{U}$, given by

$$K(p,t) = \frac{1}{\eta_{p_0}(p)}\, t^{1-n}\, \rho(v)^{-1}\, h\Big(\frac{t}{\rho(v)}, v\Big) \qquad (6)$$

where $\eta_{p_0}(p)$ is the volume density with respect to $p_0$, $v$ is the tangent vector, $t \in [0,\rho(v)]$ is a scaling parameter, and $\rho(v)$ is the distance function defined above.

Example

In line with our example, we construct a suitable weight function adjusted to the uncertainty surrounding the VaR model. We have seen in the previous section that the underlying process suggests small deviations from the normal distribution and indicates a negative skew. Thus, to determine the weight function we construct a continuous function $h$ that has its maximum value at the point representing our given model $p_0$ and is monotonically decreasing with the distance from $p_0$. This choice means that we are more interested in how sensitive the model is to small variations around $p_0$. We define $h$ on $[0,1]$ as follows:

$$h(t) = c\,(1 - t), \quad t \in [0,1]$$

where $c$ ensures the assumption (H5) and equals $\Gamma\big(\tfrac{n}{2}\big)\,\pi^{-n/2}$. Note that since we have only one tangent vector $\nu$, $h$ depends only on the parameter $t$. By applying the continuous mapping $\exp_{p_0} P \Lambda_\rho$ we obtain the weight function $K$ along the geodesic $\gamma$:

$$K(p,t) = \frac{1}{\eta_{p_0}(p)}\, d(p_0,p_1)^{-1}\, \Gamma\Big(\frac{1}{2}\Big)\, \pi^{-1/2} \Big(1 - \frac{t}{d(p_0,p_1)}\Big) \qquad \square$$
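The construction just carried out can be mirrored numerically. The sketch below is illustrative only: the geodesic length `d` is a hypothetical stand-in for $d(p_0,p_1)$ and the volume density $\eta_{p_0}$ is taken to be $1$. It evaluates the profile $h(t) = c\,(1-t)$, checks the normalising constant $c = \Gamma(n/2)\,\pi^{-n/2}$ against property (H5) for $n = 2$, and evaluates the resulting weight along the geodesic for the one-dimensional case:

```python
import math

# Normalising constant c = Gamma(n/2) * pi**(-n/2) from the text.
def c_norm(n):
    return math.gamma(n / 2) * math.pi ** (-n / 2)

# Check (H5) for n = 2: the surface measure of S^1 is 2*pi and
# int_0^1 (1 - t) dt = 1/2, so int_{C^2} h dt dmu = c * (1/2) * (2*pi).
h5_integral = c_norm(2) * 0.5 * (2 * math.pi)      # should equal 1

# Weight along the geodesic for n = 1, where c = Gamma(1/2) / sqrt(pi) = 1,
# assuming eta = 1 and a hypothetical geodesic length d:
d = 0.8
def K(t):
    return (1.0 / d) * c_norm(1) * (1.0 - t / d)   # maximal at the model p0
```

The weight decreases linearly from $K(0) = 1/d$ at the given model $p_0$ to $K(d) = 0$ at the boundary model $p_1$, mirroring the choice of $h$ above.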
7. Measure of Model Risk
In this section we shall introduce a mathematical definition of the quantification of model risk, relate it to the concepts introduced so far and study some actual applications.

Recall that we have so far focused on a weighted Riemannian manifold $(\mathcal{M}, I, \zeta)$ with $I$ the Fisher–Rao metric and $\zeta$ as in eq. (4). The model in previous sections was assumed to be some distribution $p_0 \in \mathcal{M}$. More likely, a practitioner would define the model as some mapping $f : \mathcal{M} \to \mathbb{R}$ with $p \mapsto f(p)$, i.e. a model outputs some quantity.

We shall formally introduce the normed space $(\mathcal{F}, \|\cdot\|)$ such that $f \in \mathcal{F}$. Though not strictly necessary at this informal stage, we shall assume completeness, so $(\mathcal{F}, \|\cdot\|)$ is a Banach space.

Definition 5.
With notation as above, let $(\mathcal{F}, \|\cdot\|)$ be a Banach space of measurable functions with respect to $\zeta$. The model risk $Z$ of $f \in \mathcal{F}$ and $p_0$ is given by

$$Z(f, p_0) = \|f - f(p_0)\|. \qquad (7)$$

Note that the measure represents the standard distance. All outcomes are constrained by the assumptions used in the model itself, and so the model risk is related to the changes in the output when relaxing them. The relevant model risk is therefore the difference between two models, rather than a hypothetical difference between a model and the truth of the matter.

The quantification of model risk can itself be thought of as a model with a purpose, such as provisions calculation or comparison of modelling approaches. Possibilities are endless, so we might have started with some $T : \mathcal{F} \to \mathcal{F}$ and set $Z(f, p_0) = \|T \circ f\|$; however, we think eq. (7) is general enough for our present purposes.

In what follows we address four examples of Def. 5. Their suitability very much depends, among other factors, on the purpose of the quantification, as we shall see below.

1. $Z_1(f, p_0)$ for $f \in L^1(\mathcal{M})$ represents the total relative change in the outputs across all relevant models:

$$Z_1(f, p_0) = \|f - f(p_0)\|_1 = \int_{\mathcal{M}} \big|f - f(p_0)\big|\, d\zeta$$

2. $Z_2(f, p_0)$ for $f \in L^2(\mathcal{M})$ puts more importance on big changes in the outputs (big gets bigger and small smaller). It would allow one to keep consistency with some calibration processes such as maximum likelihood or least squares algorithms:

$$Z_2(f, p_0) = \|f - f(p_0)\|_2 = \Big(\int_{\mathcal{M}} \big(f - f(p_0)\big)^2\, d\zeta\Big)^{1/2}$$

Another possibility is to use $\big\|\frac{f}{f(p_0)}\big\|$ or $\big\|\frac{f - f(p_0)}{f(p_0)}\big\|$. These functional forms would allow us to obtain a dimensionless number, which might be a desirable property.

3. $Z_\infty(f, p_0)$ for $f \in L^\infty(\mathcal{M})$ finds the relative worst-case error with respect to $p_0$:

$$Z_\infty(f, p_0) = \|f - f(p_0)\|_\infty = \operatorname{ess\,sup}_{\mathcal{M}} \big|f - f(p_0)\big|$$

Further, it can point to the sources of the largest deviances: using $\exp_{p_0}^{-1}$ we can detect the corresponding direction and size of the change in the underlying assumptions.

4. $Z_{s,p}(f, p_0)$ for $f \in W^{s,p}(\mathcal{M})$ is a Sobolev norm that can be of interest in those cases when not only $f$ is relevant but also its rate of change:

$$Z_{s,p}(f, p_0) = \|f - f(p_0)\|_{s,p} = \Big(\sum_{|k| \leq s} \int_{\mathcal{M}} \big|\partial^k\big(f - f(p_0)\big)\big|^p\, d\zeta\Big)^{1/p}$$

Footnotes: The volume of the $(n-1)$-dimensional ball $S(0,1)$ is $\pi^{n/2}/\Gamma\big(\frac{n}{2}+1\big)$; thus we have $c \int_{[0,1] \times S^{n-1}} (1-t)\, dt \times d\mu = 1 \Rightarrow c = \Gamma\big(\frac{n}{2}\big)\,\pi^{-n/2}$. The model output is not always a number, but we can proceed along these lines depending on the usage to be given to the quantification itself; for example, an inter(extra)polation methodology on a volatility surface is a model whose output is another volatility surface, not a number. If we want to quantify the model risk of that particular approach for Bermudans we might consider its impact on their pricing.

A sound methodology for model risk quantification should at least consider the data used for building the model, the model foundation, the IT infrastructure, overall performance, model sensitivity, scenario analysis and, most importantly, usage. Within our framework we address and measure the uncertainty associated with the aforementioned areas and the information contained in the models. The choice of the embedding and of a proper neighbourhood of the given model takes into account the knowledge and the uncertainty of the underlying assumptions, the data and the model foundation.
The weight function that assigns relative relevance to the different models inside the neighbourhood considers the model sensitivity, scenario analysis, the importance of the outcomes in connection with decision making, the business and the intended purpose, and it addresses the uncertainty surrounding the model foundation. Besides, every particular choice of the norm provides different information about the model. Last and most important, the model risk measure considers the usage of the model, represented by the mapping $f$.
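To make the norms above concrete, here is a toy numerical sketch. All inputs are hypothetical: the neighbourhood is collapsed to a single geodesic parametrized by $t \in [0,d]$, the output $f$ is a 99% normal VaR of a volatility that drifts along the geodesic, and the linearly decreasing weight profile of the earlier example is renormalized on the grid:

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal quadrature on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

d = 0.8                          # hypothetical geodesic length d(p0, p1)
t = np.linspace(0.0, d, 2001)

w = 1.0 - t / d                  # linearly decreasing weight profile
w /= integrate(w, t)             # renormalize so the weights integrate to 1

sigma = 0.2 + 0.05 * t           # hypothetical volatilities along the geodesic
f = 2.326 * sigma                # model output: 99% VaR of N(0, sigma)
diff = np.abs(f - f[0])          # |f - f(p0)|

Z1 = integrate(diff * w, t)              # L1: total weighted change
Z2 = integrate(diff**2 * w, t) ** 0.5    # L2: penalizes large deviations
Zinf = diff.max()                        # L-infinity: worst case over U
```

The Sobolev measure $Z_{s,p}$ would additionally integrate finite-difference derivatives of `diff` with the same quadrature; by construction $Z_\infty \geq Z_2 \geq Z_1$ here, since the weight concentrates near the given model while the deviation grows towards the boundary.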
8. Conclusions and Further Research
In this paper we introduce a general framework for the quantification of model risk using differential geometry and information theory. We also provide a rigorous and sound mathematical definition of model risk using Banach spaces over weighted Riemannian manifolds, applicable to most modelling techniques using statistics as a starting point.

Our proposed mathematical definition is to some extent comprehensive in two complementary ways. First, it is capable of coping with relevant aspects of model risk management such as model usage, performance, mathematical foundations, model calibration or data. Second, it has the potential to assess many of the mathematical approaches currently used in financial institutions: credit risk, market risk, derivatives pricing and hedging, operational risk or XVA (valuation adjustments).

It is worth noticing that the approaches in the literature, to the best of our knowledge, are specific in these same two ways: they consider very particular mathematical techniques and are usually very focused on selected aspects of model risk management.

There are many directions for further research, all of which we find to be both of theoretical and of practical interest. We shall finish by naming just a few of them:

Banach spaces are very well known and have been deeply studied in the realms of, for example, functional analysis. On the other hand, weighted Riemannian manifolds are non-trivial extensions of Riemannian manifolds, one of the building blocks of differential geometry. The study of Banach spaces over weighted Riemannian manifolds shall broaden our understanding of the properties of these spaces as well as their application to the quantification of model risk.

Our framework can include data uncertainties by studying perturbations and metrics defined on the sample, which are then transmitted to the weighted Riemannian manifold through the calibration process.

The general methodology can be tailored and made more efficient for specific risks and methodologies.
For example, one may interpret the local volatility model for derivatives pricing as an implicit definition of a certain family of distributions, extending the Black–Scholes stochastic differential equation (which would be a means to define the lognormal family).

Related to the previous paragraph, and despite the fact that there is literature on the topic, the calculation of the Fisher–Rao metric itself deserves further numerical research in order to derive more efficient algorithms.

Footnotes: An example can be a derivatives model used not only for pricing but also for hedging. Or, equivalently, by any possible transformation $T : \mathcal{F} \to \mathcal{F}$.

Appendix

In the Appendix we present the proofs of Lemma 1 and Theorem 4.
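As a computational companion to the constructions used in the proofs below, the polar map $P$, its inverse, and the rescaling $\Lambda_\rho$ can be sketched in a few lines. This is a minimal sketch, assuming the tangent space $T_{p_0}\mathcal{M}$ is identified with $\mathbb{R}^n$; the distance function `rho` below is a hypothetical stand-in for the true boundary distance on $S(p_0,1)$:

```python
import numpy as np

def P(t, v):
    """Polar map P(t, v) = t * v from [0, inf) x S^{n-1} into the tangent space."""
    return t * np.asarray(v, dtype=float)

def P_inv(w):
    """Inverse polar map w -> (||w||, w / ||w||), defined for w != 0."""
    r = float(np.linalg.norm(w))
    return r, np.asarray(w, dtype=float) / r

def Lambda_rho(t, v, rho):
    """Rescaling (t, v) -> (rho(v) * t, v) mapping the cylinder C^n onto C^n_rho."""
    return rho(v) * t, v

rho = lambda v: 1.0 + 0.2 * v[0]          # hypothetical positive, Lipschitz rho

t, v = Lambda_rho(0.5, np.array([1.0, 0.0]), rho)
w = P(t, v)                               # point in the tangent space
r, u = P_inv(w)                           # recover the polar coordinates
```

Composing these with the exponential map of the chosen manifold completes the chain $\exp_{p_0} \circ P \circ \Lambda_\rho$ of the proofs.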
Proof of Lemma 1.
To prove the equivalence we need to show that the function $h$ defined in Lemma 1 preserves the required properties of $K$ under the continuous mapping $\exp_{p_0} P \Lambda_\rho$. First we show that the composition, which consists of three different mappings, is well defined.

As a first step, we define an $n$-dimensional cylinder

$$C^n := [0,1] \times S^{n-1} = \{(t,\nu) : t \in [0,1],\ \nu \in S^{n-1}\} \subset \mathbb{R}^{n+1}$$

where $S^{n-1} := \{\nu \in \mathbb{R}^n : \|\nu\|^2 = \nu_1^2 + \cdots + \nu_n^2 = 1\}$ denotes the $(n-1)$-dimensional unit sphere in $\mathbb{R}^n$. The cylinder $C^n$ is a differentiable submanifold of $\mathbb{R}^{n+1}$ with boundaries

$$\partial_0 C^n := \{(0,\nu) : \nu \in S^{n-1}\}, \qquad \partial_1 C^n := \{(1,\nu) : \nu \in S^{n-1}\}$$

A Riemannian structure on $C^n$ is given by the restriction of the Euclidean metric in $\mathbb{R}^{n+1}$ to $C^n$. Thus, $C^n$ is a compact Riemannian manifold. A canonical measure on $C^n$ is given by the product measure $dt \times d\mu(\nu)$, where $\mu$ denotes the standard surface measure on $S^{n-1}$.

We define $\rho(v)$ as the length $d(p_0,p_1) \geq 0$ of the geodesic $\gamma$ connecting the point $p_0$ with a boundary point $p_1 \in \partial\mathcal{U}$ in the direction $v \in S(p_0,1)$, where $S(p_0,1)$ denotes the $(n-1)$-dimensional unit sphere in the tangent space $T_{p_0}\mathcal{M}$. Note that since $\mathcal{U}$ is a subset of the normal neighbourhood with respect to $p_0$, the exponential map is isometric. From now on we will assume that $\mathcal{U}$ is a compact star-shaped subset of the Riemannian manifold $\mathcal{M}$ and that the distance function $\rho$ is Lipschitz continuous on $S(p_0,1) \subset T_{p_0}\mathcal{M}$. Lipschitz continuity of $\rho(v)$ is equivalent to the assumption of continuity and piecewise regularity of $\partial\mathcal{U}$.

Now we define an $n$-dimensional subset of the cylinder by

$$C^n_\rho := \{(t,v) : t \in [0,\rho(v)],\ v \in S(p_0,1)\}$$

with boundaries

$$\partial_0 C^n_\rho := \{(0,v) : v \in S(p_0,1)\}, \qquad \partial_1 C^n_\rho := \{(\rho(v),v) : v \in S(p_0,1)\}$$

The new set $C^n_\rho$ is compact. In order to map $C^n$ onto $C^n_\rho$ we define the following coordinate transformation:

$$\Lambda_\rho : C^n \to C^n_\rho, \quad (t,v) \mapsto \big(\rho(v)t, v\big).$$
Since the distance function $v \mapsto \rho(v)$ is strictly positive and Lipschitz continuous on $S(p_0,1)$, so is the inverse function $v \mapsto 1/\rho(v)$. Therefore, the mapping $\Lambda_\rho$ defines a bi-Lipschitz mapping from $C^n$ onto $C^n_\rho$. The Jacobian determinant of $\Lambda_\rho$ equals $\rho$ almost everywhere on $C^n$.

Next we consider the polar transformation $P$ defined by equation (5), which is well defined by continuity in $T_{p_0}\mathcal{M}$ and maps $C^n_\rho$ onto $\mathcal{U}' \subset T_{p_0}\mathcal{M}$. Moreover, the transformation $P$ defines a diffeomorphism from $C^n_\rho \setminus \{\partial_0 C^n_\rho, \partial_1 C^n_\rho\}$ onto the open set $\mathcal{U}' \setminus \{0, \partial\mathcal{U}'\}$. Combining $P$ with the exponential map $\exp_{p_0}$, we have

$$\exp_{p_0} P (C^n_\rho) = \mathcal{U}$$

The composition $\exp_{p_0} \circ P$ defines a diffeomorphism from $C^n_\rho \setminus \{\partial_0 C^n_\rho, \partial_1 C^n_\rho\}$ onto $\mathcal{U} \setminus \{p_0, \partial\mathcal{U}\}$. Furthermore, the boundary $\partial_0 C^n_\rho$ is mapped onto $\{p_0\}$ and the boundary $\partial_1 C^n_\rho$ onto the boundary $\partial\mathcal{U}$. Then the points $(t,v) \in C^n_\rho$ induce geodesic polar coordinates on $\mathbb{R}^n$. When $p_0 \in \partial\mathcal{U}$, the boundary $\partial_1 C^n_\rho$ is mapped onto $\partial\mathcal{U} \setminus \{p_0\}$.
We have introduced three mappings:

$$C^n \xrightarrow{\ \Lambda_\rho\ } C^n_\rho \xrightarrow{\ P\ } \mathcal{U}' \xrightarrow{\ \exp_{p_0}\ } \mathcal{U} \subset \mathcal{M}$$

The composition $\exp_{p_0} P \Lambda_\rho$ is a continuous mapping from $C^n$ onto $\mathcal{U}$. Moreover, $\exp_{p_0} P \Lambda_\rho$ maps the boundary $\partial_0 C^n$ of the cylinder $C^n$ onto the point $p_0$ and the boundary $\partial_1 C^n$ onto $\partial\mathcal{U}$.

Now we prove that a consistent function satisfying properties (H1)–(H5) uniquely determines the weight function satisfying (K1)–(K4):

• It is straightforward to see that properties (K1)–(K4) are satisfied by construction. The composition $\exp_{p_0} P \Lambda_\rho$ preserves connectedness and compactness and is a continuous mapping from $C^n$ onto $\mathcal{M}$. Moreover, $\exp_{p_0} P \Lambda_\rho$ maps the left-hand boundary $\partial_0 C^n$ onto the point $p_0$ and the right-hand boundary $\partial_1 C^n$ onto the boundary of $\mathcal{U}$. Hence, the image $f(\exp_{p_0}(\rho(v)tv))$ of a continuous function $f$ on $\mathcal{M}$ is also continuous on the cylinder $C^n$, and every function $g$ defined on $C^n$ satisfying the consistency properties (i)–(ii) of Def. 3 is the image of a continuous function on $\mathcal{M}$ under the pull-back operator $\Lambda_\rho^{-1} P^{-1} \exp_{p_0}^{-1}$.

The composition $\exp_{p_0} P \Lambda_\rho$ applied to a function $h$ that is continuous on $C^n$ and satisfies the consistency conditions (i)–(ii) of Def. 3 will give us a continuous function $K$ on $\mathcal{M}$ that by construction is continuous along the geodesics starting at $p_0$ and ending at the points of the boundary. That means that property (K1) is satisfied. The same argument applies to any non-negative function $h$ on $C^n$. Thus, properties (H1)–(H5) ensure (K1)–(K4) under the composition $\exp_{p_0} P \Lambda_\rho$.

• Further, it remains to prove that the weight function $K$ is indeed a probability density on $\mathcal{M}$ with respect to the measure $dv(p)$, i.e. to show that $\int_{\mathcal{M}} d\zeta = 1$:

$$\int_{\mathcal{M}} d\zeta = \int_{\mathcal{M}} K(p,t)\, dv(p) = \int_{T_{p_0}\mathcal{M}} K(\exp_{p_0}(v), t)\, \eta_{p_0}(v)\, d\xi$$

where $d\xi$ is the standard Lebesgue measure on the Euclidean space $T_{p_0}\mathcal{M}$ and $\eta_{p_0}(v) = \det\big((d\exp_{p_0})_v\big)$ is the Jacobian determinant of the exponential map.
Note that $\eta_{p_0}(v)$ represents the density function, which is a positive and continuously differentiable function on $\mathcal{U}' \subset T_{p_0}\mathcal{M}$, and the zeros of $\eta_{p_0}$ lie at the boundary of $\mathcal{M}$. Further, we have

$$\int_{T_{p_0}\mathcal{M}} K(\exp_{p_0}(v), t)\, \eta_{p_0}(v)\, d\xi = \int_{S(p_0,1)} \int_0^{\rho(v)} t^{n-1}\, K(\exp_{p_0}(tv), t)\, \eta_{p_0}(tv)\, dt\, d\mu(v)$$

where $t^{n-1}$ is the Jacobian determinant of the polar coordinate transformation and $d\mu(v)$ is the standard Riemannian measure on the unit sphere $S(p_0,1)$. The last step is the mapping from $C^n_\rho$ to $C^n$:

$$\int_{S(p_0,1)} \int_0^{\rho(v)} t^{n-1}\, K(\exp_{p_0}(tv))\, \eta_{p_0}(tv)\, dt\, d\mu(v) = \int_{S^{n-1}} \int_0^1 K\big(\exp_{p_0}(\rho(v)tv)\big)\, \big(\rho(v)t\big)^{n-1}\, \rho(v)\, \eta_{p_0}(\rho(v)tv)\, dt\, d\mu(v)$$

where the Jacobian determinant of $\Lambda_\rho$ is $\rho(v)$. Then, using the expression (6) for $K$, the expression above is equal to:

$$\int_{S^{n-1}} \int_0^1 \frac{1}{\eta_{p_0}(\rho(v)tv)}\, \big(\rho(v)t\big)^{1-n}\, \rho(v)^{-1}\, h(t,v)\, \big(\rho(v)t\big)^{n-1}\, \rho(v)\, \eta_{p_0}(\rho(v)tv)\, dt\, d\mu(v) = \int_{S^{n-1}} \int_0^1 h(t,v)\, dt\, d\mu(v) = 1 \qquad \square$$

Proof of Theorem 4.
Note that the composition $\exp_{p_0} P \Lambda_\rho$ induces a change of variables for an integrable function $f$ that yields the following formula:

$$\int_{\mathcal{M}} f(p)\, d\zeta = \int_{\mathcal{U}} f(p)\, d\zeta = \int_{\mathcal{U}'} f(\exp_{p_0}(v))\, \eta_{p_0}(v)\, dv = \int_{S(p_0,1)} \int_0^{\rho(v)} f(\exp_{p_0}(tv))\, t^{n-1}\, \eta_{p_0}(tv)\, dt\, d\mu(v) = \int_{S^{n-1}} \int_0^1 f\big(\exp_{p_0}(\rho(\nu)t\nu)\big)\, \big(\rho(\nu)t\big)^{n-1}\, \rho(\nu)\, \eta_{p_0}(\rho(\nu)t\nu)\, dt\, d\mu(\nu)$$

$\eta_{p_0}$ is a well-defined, non-negative function with zeros at the cut locus of the point $p_0$. Besides, $\eta_{p_0}$ is a continuous and differentiable function on $\mathcal{M}$. The distance function $\rho$ is a well-defined, strictly positive and Lipschitz continuous function on $S(p_0,1)$, and thus so is the inverse $1/\rho(v)$. Therefore, the mapping $\Lambda_\rho$ defines a bi-Lipschitz mapping from $C^n$ to $C^n_\rho$. Moreover, the composition $\exp_{p_0} P$ defines a diffeomorphism from $C^n_\rho \setminus \{\partial_0 C^n_\rho, \partial_1 C^n_\rho\}$ onto $\mathcal{U} \setminus \{p_0, \partial\mathcal{U}\}$. Then, using the fact that the point set $\{p_0\}$ and the boundary of $\mathcal{U}$ are subsets of measure zero, we can conclude that the mapping $\exp_{p_0} P \Lambda_\rho$ is an isomorphism. Then for any $h$ defined on $C^n$ satisfying conditions (i)–(ii) of Def. 3, the associated weight function $K$ is well defined on $\mathcal{U}$. The uniqueness of $K$ follows after specifying a function $h$ that satisfies properties (H1)–(H5). $\square$

Note that when $\mathcal{M}$ is $\mathbb{R}^n$ with the canonical metric, then $\eta_{p_0}(p) = 1$ for all $p \in \mathbb{R}^n$.
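The normalisation argument can be sanity-checked numerically in a flat toy case. This is a sketch only, under explicit assumptions: $\mathcal{M} = \mathbb{R}^2$ with $\eta_{p_0} \equiv 1$, a constant boundary distance $\rho(v) = R$, and the linear profile $h(s) = c\,(1-s)$; the weight of eq. (6) is then integrated in polar coordinates over the disc of radius $R$:

```python
import math
import numpy as np

n, R = 2, 0.7                                     # flat 2-D toy case
c = math.gamma(n / 2) * math.pi ** (-n / 2)       # = 1/pi for n = 2

t = np.linspace(1e-9, R, 20001)                   # radial grid (avoid t = 0)
K = t ** (1 - n) * (1.0 / R) * c * (1.0 - t / R)  # weight along each direction
radial = K * t ** (n - 1)                         # include polar Jacobian t^{n-1}
total = 2 * math.pi * float(np.sum(0.5 * (radial[1:] + radial[:-1]) * np.diff(t)))
# 'total' integrates K over the disc and should be ~ 1.
```

The $t^{1-n}$ factor in the weight exactly cancels the polar Jacobian $t^{n-1}$, which is why the radial integrand stays linear and the quadrature recovers 1 to high accuracy.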
References

[1] Amari, S.-I., O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. R. Rao (1987), "Differential geometry in statistical inference." Lecture Notes–Monograph Series, i–240.
[2] Azzalini, Adelchi (1985), "A class of distributions which includes the normal ones." Scandinavian Journal of Statistics, 171–178.
[3] Boucher, Christophe M., Jón Daníelsson, Patrick S. Kouontchou, and Bertrand B. Maillet (2014), "Risk models-at-risk." Journal of Banking & Finance, 44, 72–92.
[4] Branger, Nicole and Christian Schlag (2004), "Model risk: A conceptual framework for risk measurement and hedging." In EFMA 2004 Basel Meetings Paper.
[5] Chavel, Isaac (2006), Riemannian Geometry: A Modern Introduction, volume 98. Cambridge University Press.
[6] Christodoulakis, George and Stephen Satchell (2008), "The validity of credit risk model validation methods." The Analytics of Risk Model Validation, 27–44.
[7] Cont, Rama (2006), "Model uncertainty and its impact on the pricing of derivative instruments." Mathematical Finance, 16, 519–547.
[8] Federal Reserve (SR 11-7) (2011), "Supervisory guidance on model risk management." Board of Governors of the Federal Reserve System, Office of the Comptroller of the Currency, SR Letter 11-7.
[9] Federer, Herbert (2014), Geometric Measure Theory. Springer.
[10] Gibson, Rajna (2000), Model Risk: Concepts, Calibration and Pricing. Risk Books.
[11] Hopf, H. and W. Rinow (1931), "Über den Begriff der vollständigen differentialgeometrischen Fläche." Comment. Math. Helv., 3, 209–225.
[12] Hull, John and Wulin Suo (2002), "A methodology for assessing model risk and its application to the implied volatility function model." Journal of Financial and Quantitative Analysis, 37, 297–318.
[13] Joshi, S., A. Srivastava, and I. H. Jermyn (2007), "Riemannian analysis of probability density functions with applications in vision." IEEE.
[14] Lang, Serge (2012), Fundamentals of Differential Geometry, volume 191.
Springer Science & Business Media.
[15] Morgan, Frank (2005), "Manifolds with density." Notices of the AMS, 853–858.
[16] Morini, Massimo (2011), Understanding and Managing Model Risk: A Practical Guide for Quants, Traders and Validators. John Wiley & Sons.
[17] Murray, Michael K. and John W. Rice (1993), Differential Geometry and Statistics, volume 48. CRC Press.
[18] Rao, C. Radhakrishna (1945), "Information and accuracy attainable in the estimation of statistical parameters." Bull. Calcutta Math. Soc., 37, 81–91.
[19] Saltelli, A., S. Tarantola, F. Campolongo, and M. Ratto (2004), Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. Chichester, England: John Wiley & Sons.