Chance constrained sets approximation: A probabilistic scaling approach -- EXTENDED VERSION
Martina Mammarella, Victor Mirasierra, Matthias Lorenzen, Teodoro Alamo, Fabrizio Dabbene
CChance constrained sets approximation:A probabilistic scaling approach - EXTENDED VERSION
M. Mammarella (a), V. Mirasierra (b), M. Lorenzen (c), T. Alamo (b), F. Dabbene (a)
(a) CNR-IEIIT, c/o Politecnico di Torino, C.so Duca degli Abruzzi 24, Torino, Italy
(b) Universidad de Sevilla, Escuela Superior de Ingenieros, Camino de los Descubrimientos s/n, Sevilla, Spain
(c) Systemwissenschaften, TTI GmbH, Nobelstr. 15, 70569 Stuttgart, Germany
Abstract
In this paper, a sample-based procedure for obtaining simple and computable approximations of chance-constrained sets is proposed. The procedure allows to control the complexity of the approximating set by defining families of simple approximating sets of given complexity. A probabilistic scaling procedure then allows to rescale these sets so as to obtain the desired probabilistic guarantees. The proposed approach is shown to be applicable to several problems in systems and control, such as the design of Stochastic Model Predictive Control schemes or the solution of probabilistic set membership estimation problems.
In real-world applications, the complexity of the phenomena encountered and the random nature of data make dealing with uncertainty essential. In many cases, uncertainty arises in the modeling phase; in others it is intrinsic to both the system and the operative environment, as for instance wind speed and turbulence in aircraft or wind turbine control [1]. Hence, it is crucial to include the underlying stochastic characteristics of the framework, and possibly accept a violation of the constraints with a certain probability level, in order to improve the coherence between the model and reality. Deriving results in the presence of uncertainty is of major relevance in different areas, including, but not limited to, optimization [2] and robustness analysis [3]. However, in contrast to robust approaches, where the goal is to determine a feasible solution which is optimal in some sense for all possible uncertainty instances, the goal in the stochastic framework is to find a solution that is feasible for almost all possible uncertainty realizations [4,5]. In several applications, including engineering and finance, where uncertainties in price, demand, supply, currency exchange rate, recycle and feed rate, and demographic conditions are common, it is acceptable, up to a certain safety level, to
relax the inherent conservativeness of robust constraints by enforcing probabilistic constraints. More recently, the method has also been used in unmanned autonomous vehicle navigation [6,7] as well as in optimal power flow [8,9]. In the optimization framework, constraints involving stochastic parameters that are required to be satisfied with a pre-specified probability threshold are called chance constraints (CC). In general, dealing with CC implies facing two serious challenges, that of stochasticity and that of nonconvexity [10]. Consequently, while being attractive from a modeling viewpoint, problems involving CC are often computationally intractable, generally shown to be NP-hard, which seriously limits their applicability. However, being able to efficiently solve CC problems remains an important challenge, especially in systems and control, where CC often arise, e.g. in stochastic model predictive control (SMPC) [11,12]. The scientific community has devoted large research efforts to devising computationally efficient approaches to deal with chance constraints. We review such techniques in Section 3, where we highlight three mainstream approaches: i) exact techniques; ii) robust approximations; and iii) sample-based approximations. In this paper, we present what we consider an important step forward in the sample-based approach. We propose a simple and efficient strategy to obtain a probabilistically guaranteed inner approximation of a chance constrained set, with given confidence. In particular, we describe a two-step procedure that involves: i) the preliminary approximation of the chance constrained set by means of a so-called Simple Approximating Set (SAS); ii) a sample-based scaling procedure that allows to properly scale the SAS so as to guarantee the desired probabilistic properties. The proper selection of a low-complexity SAS allows the designer to easily tune the complexity of the approximating set, significantly reducing the sample complexity. We propose several candidate SAS shapes, grouped in two classes: i) sampled-polytopes; and ii) norm-based SAS.

Email addresses: [email protected] (M. Mammarella), [email protected] (V. Mirasierra), [email protected] (M. Lorenzen), [email protected] (T. Alamo), [email protected] (F. Dabbene).
Preprint submitted to Automatica, 19 January 2021.

The probabilistic scaling approach was presented in the conference papers [13,14]. The present work extends these in several directions. First, we perform here a thorough mathematical analysis, providing proofs of all results. Second, the use of norm-based SAS is extended to comprise more general sets. More importantly, we consider here joint chance constraints. This choice is motivated by the fact that enforcing joint chance constraints, which have to be satisfied simultaneously, adheres better to some applications, despite the inherent complexity. Finally, we present here a second application, besides SMPC, related to probabilistic set-membership identification.

The paper is structured as follows. Section 2 provides a general preamble of the problem formulation and of chance constrained optimization, including two motivating examples. An extensive overview of methods for approximating chance constrained sets is reported in Section 3, whereas the probabilistic scaling approach is detailed in Section 4. Sections 5 and 6 are dedicated to the definition of selected candidate SAS, i.e. sampled-polytopes and norm-based SAS, respectively. Last, in Section 7, we validate the proposed approach with a numerical example applying our method to a probabilistic set membership estimation problem.
Main conclusions and future research directions are addressed in Section 8.

Notation. Given an integer N, [N] denotes the set of integers from 1 to N. Given z ∈ R^s and p ∈ [1, ∞), we denote by ‖z‖_p the ℓ_p-norm of z, and by B_p^s := { z ∈ R^s : ‖z‖_p ≤ 1 } the ℓ_p-norm ball of radius one. Given integers k, N, and a parameter p ∈ (0, 1), the binomial distribution function is

B(k; N, p) := Σ_{i=0}^{k} (N choose i) p^i (1 − p)^{N−i}.   (1)

The following notation is borrowed from the field of order statistics [15]. Given a set of N scalars γ_i ∈ R, i ∈ [N], we denote by γ_{1:N} the smallest one, by γ_{2:N} the second smallest one, and so on, until γ_{N:N}, which is equal to the largest one. In this way, given r ≥ 0, γ_{r+1:N} satisfies that no more than r elements of {γ_1, γ_2, ..., γ_N} are strictly smaller than γ_{r+1:N}. The Chebyshev center of a given set X, denoted Cheb(X), is defined as the center of the largest ball inscribed in X, i.e.

Cheb(X) := arg min_{θ_c} max_{θ ∈ X} ‖θ − θ_c‖.

Given an ℓ_p-norm ‖·‖_p, its dual norm ‖·‖_{p*} is defined as

‖c‖_{p*} := sup_{z ∈ B_p^s} c^T z, ∀ c ∈ R^s.

In particular, the couples (p, p*) = (2, 2), (1, ∞), (∞, 1) give rise to dual norms.
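The two pieces of notation above, the binomial distribution function (1) and the order statistic γ_{r+1:N}, both admit direct one-line implementations; the following sketch (function names are ours, not from the paper) may help fix the definitions:

```python
from math import comb

def binom_cdf(k: int, N: int, p: float) -> float:
    """B(k; N, p): probability of at most k successes in N Bernoulli(p) trials."""
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(k + 1))

def order_stat(gammas, r: int) -> float:
    """gamma_{r+1:N}: the (r+1)-th smallest element; at most r elements of the
    input are strictly smaller than the returned value."""
    return sorted(gammas)[r]

print(binom_cdf(5, 10, 0.5))                     # 0.623046875
print(order_stat([4.0, 1.0, 3.0, 5.0, 2.0], 2))  # 3.0 (two elements are smaller)
```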
Consider a robustness problem in which the controller parameters and auxiliary variables are parametrized by means of a decision variable vector θ, which is usually referred to as the design parameter and is restricted to a set Θ ⊆ R^{n_θ}. Furthermore, the uncertainty vector w ∈ R^{n_w} represents one of the admissible uncertainty realizations of a random vector with given probability distribution Pr_W and (possibly unbounded) support W. This paper deals with the special case where the design specifications can be encoded as a set of n_ℓ uncertain linear inequalities

F(w) θ ≤ g(w),   (2)

where

F(w) = [ f_1^T(w); ...; f_{n_ℓ}^T(w) ] ∈ R^{n_ℓ × n_θ},  g(w) = [ g_1(w); ...; g_{n_ℓ}(w) ] ∈ R^{n_ℓ},

are measurable functions of the uncertainty vector w ∈ R^{n_w}. The inequality in (2) is to be interpreted component-wise, i.e.

f_ℓ^T(w) θ ≤ g_ℓ(w), ∀ ℓ ∈ [n_ℓ].

Due to the random nature of the uncertainty vector w, each realization of w corresponds to a different set of linear inequalities. Consequently, each value of w gives rise to a corresponding set

X(w) = { θ ∈ Θ : F(w) θ ≤ g(w) }.   (3)

In every application, one usually accepts a risk of violating the constraints. While this is often done by choosing the set W appropriately, we can find a less conservative solution by choosing the set W to encompass all possible values and characterizing the region of the design space Θ in which the fraction of elements of W that violate the constraints is below a specified level. This concept is rigorously formalized by means of the notion of probability of violation.

Definition 1 (Probability of violation)
Consider a probability measure Pr_W over W and let θ ∈ Θ be given. The probability of violation of θ relative to inequality (2) is defined as

Viol(θ) := Pr_W { F(w) θ ≰ g(w) }.

Given a constraint on the probability of violation, i.e.
Viol(θ) ≤ ε, we denote as (joint) chance constrained set of probability ε (shortly, ε-CCS) the region of the design space for which this probabilistic constraint is satisfied. This is formally stated in the next definition.

Definition 2 (ε-CCS) Given ε ∈ (0, 1), we define the chance constrained set of probability ε as

X_ε := { θ ∈ Θ : Viol(θ) ≤ ε }.   (5)

Note that the ε-CCS represents the region of the design space Θ for which this probabilistic constraint is satisfied, and it is equivalently defined as

X_ε := { θ ∈ Θ : Pr_W { F(w) θ ≤ g(w) } ≥ 1 − ε }.   (6)

Remark 1 (Joint vs. individual CCs)
The constraint θ ∈ X_ε, with X_ε defined in (6), describes a joint chance constraint. That is, it requires that the joint probability of satisfying the inequality constraint F(w) θ ≤ g(w) is guaranteed to be greater than the probabilistic level 1 − ε. We remark that this constraint is notably harder to impose than individual CCs, i.e. constraints of the form

θ ∈ X_ε^ℓ := { θ ∈ Θ : Pr_W { f_ℓ^T(w) θ ≤ g_ℓ(w) } ≥ 1 − ε_ℓ },  ℓ ∈ [n_ℓ],

with ε_ℓ ∈ (0, 1). A discussion on the differences and implications of joint and individual chance constraints may be found in several papers, see for instance [10,16] and references therein.

Example 1
A simple illustrative example of the ε-CCS is shown in Figure 1. The dotted circle is the region of the design space that satisfies all the constraints (the so-called robust region); the sampled constraints are tangent to the dotted circle at uniformly generated points.

Fig. 1. Red circle = X_ε, dotted circle = unit circle, blue lines = constraint samples.

The outer red circle represents the chance constrained set X_ε for a specific value of ε. That is, the red circle is obtained in such a way that every point in it has a probability of violating a random constraint no larger than ε. Note that in this very simple case, the set X_ε can be computed analytically, and turns out to be a scaled version of the robust set. We observe that the ε-CCS is significantly larger than the robust set.

Hence, while there exist simple examples for which a closed-form computation of X_ε is possible, as the one re-proposed here and first used in [13], we remark that this is not the case in general. Indeed, as pointed out in [10], the computation of the ε-CCS is typically extremely difficult, since the evaluation of the probability Viol(θ) amounts to the computation of a multivariate integral, which is NP-hard [17]. Moreover, the ε-CCS is often nonconvex, except for very special cases. For example, [1,18] show that the solution set of separable chance constraints can be written as the union of cones, which is nonconvex in general.

Example 2 (Example of nonconvex ε-CCS) To illustrate these inherent difficulties, we consider the following three-dimensional example (n_θ = 3) with w = {w_1, w_2}, where the first uncertainty w_1 ∈ R^3 is a three-dimensional normally distributed random vector with zero mean and given covariance matrix Σ, and the second uncertainty w_2 ∈ R^3 is a three-dimensional random vector whose elements are uniformly distributed in the interval [0, 1]. The set of viable design parameters is given by n_ℓ = 4 uncertain linear inequalities of the form F(w) θ ≤ 1, with

F(w) = [ w_1   w_2   (2 w_1 − w_2)   w_1^2 ]^T.   (7)

The square power w_1^2 is to be interpreted element-wise. In this case, to obtain a graphical representation of the set X_ε, we resorted to gridding the set Θ and, for each point θ in the grid, approximating the probability through a Monte Carlo computation. This procedure is clearly unaffordable in higher-dimensional frameworks. In Figure 2 we report the plot of the computed ε-CCS for different values of ε. We observe that the set is indeed nonconvex.

Fig. 2. The ε-CCS for ε = 0.15 (smaller set), an intermediate value of ε, and ε = 0.45 (larger set).

We observe that all sets are nonconvex, but the nonconvexity is more evident for larger values of ε, corresponding to larger levels of accepted violation, while the set X_ε appears “almost convex” for small values of ε. This kind of behaviour is in accordance with a recent result that proves convexity of the ε-CCS for values of ε going to zero, usually referred to as eventual convexity [19].

Finding an optimal θ ∈ X_ε for a given cost function J : R^{n_θ} → R leads to the chance constrained optimization (CCO) problem

min_{θ ∈ X_ε} J(θ),   (8)

where the cost function J(θ) is usually assumed to be convex, often even a quadratic or linear function. We remark that the solution of the CCO problem (8) is in general NP-hard, for the same reasons reported before. We also note that several stochastic optimization problems arising in different application contexts can be formulated as a CCO. Typical examples are for instance the reservoir system design problem proposed in [20], where the problem is to minimize the total building and penalty costs while satisfying demands for all sites and all periods with a given probability, or the cash matching problem [21], where one aims at maximizing the portfolio value at the end of the planning horizon while covering all scheduled payments with a prescribed probability. CCO problems also frequently arise in short-term planning problems in power systems. These optimal power flow (OPF) problems are routinely solved as part of the real-time operation of the power grid. The aim is determining minimum-cost production levels of controllable generators subject to reliably delivering electricity to customers across a large geographical area, see e.g. [8] and references therein.

In the next subsections, we report two control-related problems which served as motivation for our study. To motivate the proposed approach, we consider the Stochastic MPC framework proposed in [12,11].
We are given a discrete-time system

x_{k+1} = A(σ_k) x_k + B(σ_k) u_k + a_σ(σ_k),   (9)

subject to generic uncertainty σ_k ∈ R^{n_σ}, with state x_k ∈ R^{n_x}, control input u_k ∈ R^{n_u}, and the vector-valued function a_σ(σ_k) representing an additive disturbance affecting the system state. The system matrices A(σ_k) and B(σ_k), of appropriate dimensions, are (possibly nonlinear) functions of the uncertainty σ_k at step k. For k = 1, 2, ..., the disturbances σ_k are modeled as realizations of a stochastic process. In particular, the σ_k are assumed to be independent and identically distributed (iid) realizations of zero-mean random variables with support S ⊆ R^{n_σ}. Note that the presence of both additive and multiplicative uncertainty, combined with the nonlinear dependence on the uncertainty, renders the problem particularly arduous. Furthermore, we remark that the system representation in (9) is very general, and encompasses, among others, those in [11,12,22].

Given the model (9) and a realization of the state x_k at time k, state predictions t steps ahead are random variables as well and are denoted x_{t|k}, to differentiate them from the realization x_{t+k}. Similarly, u_{t|k} denotes predicted inputs that are computed based on the realization of the state x_k. Contrary to [11,12,22], where the system dynamics were subject to individual state and input chance constraints, here we take a more challenging route and consider joint state and input chance constraints of the form

Pr_σ { H_x x_{t|k} + H_u u_{t|k} ≤ 1 | x_k } ≥ 1 − ε,   (10)

with t ∈ {1, ..., T−1}, ε ∈ (0, 1), H_x ∈ R^{n_ℓ × n_x}, H_u ∈ R^{n_ℓ × n_u}. The probability Pr_σ is measured with respect to the sequence σ = {σ_t}_{t>k}.
Hence, equation (10) states that the probability of violating the linear constraint H_x x + H_u u ≤ 1 must be no larger than ε. The objective is to derive an asymptotically stabilizing control law for the system (9) such that, in closed loop, the constraint (10) is satisfied. Following the approach in [12], a stochastic MPC algorithm is considered to solve the constrained control problem. The approach is based on repeatedly solving a stochastic optimal control problem over a finite moving horizon, but implementing only the first control action. The design parameter θ is then given by the control sequence u_k = (u_{0|k}, u_{1|k}, ..., u_{T−1|k}), and the prototype optimal control problem to be solved at each sampling time k is defined by the cost function

J_T(x_k, u_k) = E { Σ_{t=0}^{T−1} ( x_{t|k}^T Q x_{t|k} + u_{t|k}^T R u_{t|k} ) + x_{T|k}^T P x_{T|k} | x_k },   (11)

with Q ∈ R^{n_x × n_x}, Q ⪰ 0, R ∈ R^{n_u × n_u}, R ≻ 0, and appropriately chosen P ≻ 0, subject to the system dynamics (9) and constraints (10).

The online solution of the stochastic MPC problem remains a challenging task, but several special cases, which can be evaluated exactly, as well as methods to approximate the general solution, have been proposed in the literature. The approach followed in this work was first proposed in [11,12], where an offline sampling scheme was introduced. Therein, with a prestabilizing input parameterization

u_{t|k} = K x_{t|k} + v_{t|k},   (12)

with suitably chosen control gain K ∈ R^{n_u × n_x} and new design parameters v_{t|k} ∈ R^{n_u}, equation (9) is solved explicitly for the predicted states x_{1|k}, ..., x_{T|k} and predicted inputs u_{0|k}, ..., u_{T−1|k}. (The case where one wants to impose hard input constraints can also be formulated in a similar framework, see e.g. [11].) In this case, the expected value of the finite-horizon cost (11) can be evaluated offline, leading to a quadratic cost function of the form

J_T(x_k, v_k) = [ x_k^T  v_k^T  1 ] S̃ [ x_k; v_k; 1 ]   (13)

in the deterministic variables v_k = (v_{0|k}, v_{1|k}, ..., v_{T−1|k}) and x_k. Focusing now on the constraint definition, we notice that, by introducing the uncertainty sequence σ_k = {σ_t}_{t = k, ..., k+T−1}, we can rewrite the joint chance constraint defined by equation (10) as

X_ε^smpc = { [x_k; v_k] ∈ R^{n_x + n_u T} : Pr_{σ_k} { [f_ℓ^x(σ_k); f_ℓ^v(σ_k)]^T [x_k; v_k] ≤ 1, ℓ ∈ [n_ℓ] } ≥ 1 − ε },   (14)

with f_ℓ^x : R^{n_σ} → R^{n_x}, f_ℓ^v : R^{n_σ} → R^{n_u T} being known functions of the sequence of random variables σ_k. We remark that, in the context of this paper, neither the detailed derivation of the cost matrix S̃ in (13) nor that of f_ℓ^v, f_ℓ^x is relevant for the reader, who can refer to [12, Appendix A] for details.
Note that, by defining θ = [x_k^T, v_k^T]^T, (14) is given in the form of (5). As discussed in [11], obtaining a good and simple enough approximation of the set X_ε^smpc is extremely important for the online implementation of SMPC schemes. In particular, if we are able to replace the set X_ε^smpc by a suitable inner approximation, we can guarantee probabilistic constraint satisfaction of the ensuing SMPC scheme. On the other hand, we would like this inner approximation to be simple enough, so as to render the online computations fast.

Suppose that there exists θ̄ ∈ Θ such that

| y − θ̄^T φ(x) | ≤ ρ, ∀ (x, y) ∈ W ⊆ R^{n_x} × R,

where φ : R^{n_x} → R^{n_θ} is a (possibly nonlinear) regressor function and ρ > 0. The deterministic set membership estimation problem is that of characterizing the set of parameters θ that satisfy the constraint | y − θ^T φ(x) | ≤ ρ for all possible values of (x, y) ∈ W. In the literature, this set is usually referred to as the feasible parameter set, that is,

FPS := { θ ∈ Θ : | y − θ^T φ(x) | ≤ ρ, ∀ (x, y) ∈ W }.   (15)

If, for given w = (x, y), we define the set X(w) = { θ ∈ Θ : | y − θ^T φ(x) | ≤ ρ }, then the feasible parameter set FPS can be rewritten as
FPS = { θ ∈ Θ : θ ∈ X(w), ∀ w ∈ W }.

The deterministic set membership problem suffers from the following limitations in real applications: i) due to the possible nonlinearity of φ(·), checking whether a given θ ∈ Θ satisfies the constraint θ ∈ X(w) for every w ∈ W is often a difficult problem; ii) in many situations, only samples of W are available: thus, the robust constraint cannot be checked, and only outer bounds of FPS can be computed; and iii) because of outliers and the possibly non-finite support of W, the set FPS is often empty (especially for small values of ρ). If a probability distribution is defined on W, the probabilistic set membership estimation problem is that of characterizing the set of parameters θ that satisfy

Pr_W { | y − θ^T φ(x) | ≤ ρ } ≥ 1 − ε,

for a given probability parameter ε ∈ (0, 1). We denote by FPS_ε the set of parameters that satisfy the previous probabilistic constraint, that is,
FPS_ε = { θ ∈ Θ : Pr_W { θ ∈ X(w) } ≥ 1 − ε }.

It is immediate to notice that this problem fits in the formulation proposed in this section: it suffices to define

F(w) = [ φ^T(x); −φ^T(x) ],  g(w) = [ ρ + y; ρ − y ].

Motivated by the discussion above, we are ready to formulate the main problem studied in this paper.
Problem 1 (ε-CCS approximation) Given the set of linear inequalities (2) and a violation parameter ε, find an inner approximation of the set X_ε. The approximation should be: i) simple enough; ii) easily computable.

A solution to this problem is provided in the paper. In particular, regarding i), we present a solution in which the approximating set is represented by a few linear inequalities. Regarding ii), we propose a computationally efficient procedure for its construction (see Algorithm 1). Before presenting our approach, in the next section we provide a brief overview of the different methods presented in the literature to construct approximations of the ε-CCS.

The construction of computationally efficient approximations of the ε-CCS is a long-standing problem. In particular, the reader is referred to the recent work [10], which provides a rather complete discussion on the topic and covers the most recent results. The authors distinguish three different approaches, which we very briefly revisit here.

In some very special cases, the ε-CCS is convex and hence the CCO problem admits a unique solution. This is the case, for instance, of individual chance constraints with w being Gaussian [25]. Other important examples of convexity of the set X_ε involve log-concave distributions [1,26]. General sufficient conditions for the convexity of chance constraints may be found in [27,28,29,19]. However, all these cases are very specific and hardly extend to the joint chance constraints considered in this work.

A second class of approaches consists in finding deterministic conditions that allow to construct a set X which is a guaranteed inner convex approximation of the probabilistic set X_ε. The classical solution consists in the application of Chebyshev-like inequalities, see e.g. [30,31]. More recent techniques, which have proved particularly promising, involve robust optimization [3], as the convex approximations introduced in [32].
A particularly interesting convex relaxation involves the so-called Conditional Value at Risk (CVaR), see [33] and references therein. Finally, we point out some recent techniques based on polynomial moment relaxations [34,35]. Nonetheless, it should be remarked that these techniques usually suffer from conservatism and computational complexity issues, especially in the case of joint chance constraints.

In recent years, a novel approach to approximate chance constraints, based on random sampling of the uncertain parameters, has gained popularity, see e.g. [4,5] and references therein. Sampling-based techniques are characterized by the use of a finite number N of iid samples of the uncertainty {w^(1), w^(2), ..., w^(N)}, drawn according to the probability distribution Pr_W. To each sample w^(i), i ∈ [N], we can associate the following sampled set

X(w^(i)) = { θ ∈ Θ : F(w^(i)) θ ≤ g(w^(i)) },   (16)

sometimes referred to as a scenario, since it represents an observed instance of our probabilistic constraint. Then, the scenario approach considers the CCO problem (8) and approximates its solution through the following scenario problem

θ*_sc = arg min_θ J(θ)  subject to  θ ∈ X(w^(i)), i ∈ [N].   (17)

We note that, if the function J(θ) is convex, problem (17) becomes a linearly constrained convex program, for which very efficient solution approaches exist. A fundamental result [36,37,38,39] provides a probabilistic certification of constraint satisfaction for the solution of the scenario problem. In particular, it is shown that, under some mild assumptions (non-degenerate problem), we have

Pr_{W^N} { Viol(θ*_sc) > ε } ≤ B(n_θ − 1; N, ε),   (18)

where the probability in (18) is measured with respect to the samples {w^(1), w^(2), ..., w^(N)}. Moreover, the bound in (18) is shown to be tight. Indeed, for the class of so-called fully-supported problems, the bound holds with equality, i.e. the Binomial distribution B(n_θ − 1; N, ε) represents the exact probability distribution of the violation probability [37].

A few observations are at hand regarding the scenario approach and its relationship with Problem 1. First, if we define the sampled constraints set as

X_N := ∩_{i=1}^{N} X(w^(i)),   (19)

we see that the scenario approach consists in approximating the constraint θ ∈ X_ε in (8) with its sampled version θ ∈ X_N. On the other hand, it should be remarked that the scenario approach cannot be used to derive any guarantee on the relationship existing between X_N and X_ε. Indeed, the nice probabilistic property in (18) holds only for the optimum of the scenario program θ*_sc. This is a fundamental point, since the scenario results build on the so-called support constraints, which are defined for the optimum point θ*_sc only. On the contrary, in our case we are interested in establishing a direct relation (in probabilistic terms) between the set X_N and the ε-CCS X_ε. This is indeed possible, but requires results based on Statistical Learning Theory [40], summarized in the following lemma.

Lemma 1 (Learning Theory bound)
Given probabilistic levels δ ∈ (0, 1) and ε ∈ (0, 0.5), if the number of samples N is chosen so that N ≥ N_LT, with

N_LT := (4.1/ε) ( ln(21.64/δ) + 4.39 n_θ log₂(8 e n_ℓ / ε) ),   (20)

then Pr_{W^N} { X_N ⊆ X_ε } ≥ 1 − δ.

The lemma, whose proof is reported in Appendix A.1, is a direct consequence of the results on the VC-dimension of so-called (α, k)-Boolean functions, given in [41].
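As a rough numerical illustration, the sample bound of Lemma 1 can be evaluated directly. The sketch below assumes a VC-type bound of the form N_LT = (4.1/ε)(ln(21.64/δ) + 4.39 n_θ log₂(8 e n_ℓ / ε)), with constants taken from the related sample-complexity literature, and the problem sizes in the example are our own placeholders:

```python
import math

def n_lt(eps: float, delta: float, n_theta: int, n_ell: int) -> int:
    """Smallest integer N satisfying the assumed learning-theory bound."""
    bound = (4.1 / eps) * (math.log(21.64 / delta)
                           + 4.39 * n_theta * math.log2(8 * math.e * n_ell / eps))
    return math.ceil(bound)

# Hypothetical problem sizes: note that the bound grows only logarithmically
# in n_ell and 1/delta, but linearly in n_theta and roughly as 1/eps.
print(n_lt(0.05, 1e-6, 25, 14))
```

The qualitative behaviour is the important point: halving ε roughly doubles the required sample size, while tightening the confidence δ by orders of magnitude costs comparatively little.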
Remark 2 (Sample-based SMPC)
The learning-theory-based approach discussed in this section has been applied in [11] to derive an offline probabilistic inner approximation of the chance constrained set X_ε^smpc defined in (14), considering individual chance constraints. In particular, the bound (20) is a direct extension to the case of joint chance constraints of the result proved in [11]. Note that, since we are considering multiple constraints at the same time (as in (2)), the number of constraints n_ℓ enters into the sample size bound. To explain how the SMPC design in [11] extends to the joint chance constraints framework, we briefly recall it. First, we extract offline (i.e. when designing the SMPC control) N iid samples σ_k^(i) of the uncertainty σ_k, and we consider the sampled sets

X^smpc(σ_k^(i)) = { [x_k; v_k] : [f_ℓ^x(σ_k^(i)); f_ℓ^v(σ_k^(i))]^T [x_k; v_k] ≤ 1, ℓ ∈ [n_ℓ] },

and X_N^smpc := ∩_{i=1}^{N} X^smpc(σ_k^(i)). Then, applying Lemma 1 with n_θ = n_x + n_u T, we conclude that, if we extract N ≥ N_LT^smpc samples, it is guaranteed that, with probability at least 1 − δ, the sampled approximation X_N^smpc is a subset of the original chance constrained set X_ε^smpc. Exploiting these results, the SMPC problem can be approximated conservatively by the linearly constrained quadratic program

min_{v_k} J_T(x_k, v_k)  subject to  (x_k, v_k) ∈ X_N^smpc.   (21)

Hence, the result reduces the original stochastic optimization program to an efficiently solvable quadratic program. This represents an undiscussed advantage, which has been demonstrated for instance in [12]. On the other hand, it turns out that the ensuing number of linear constraints, equal to n_ℓ · N_LT^smpc, may still be too large.
For instance, even for a moderately sized MPC problem with n_x = 5 states, n_u = 2 inputs, prediction horizon T = 10, simple interval constraints on states and inputs (i.e. n_ℓ = 2n_x + 2n_u = 14), and for a reasonable choice of the probabilistic parameters ε and δ, we get N_LT^smpc above 114,000, which in turn corresponds to more than a million linear inequalities. For this reason, in [11] a post-processing step was proposed to remove redundant constraints. While it is indeed true that all the cumbersome computations may be performed offline, it is still the case that, in applications with stringent requirements on the solution time, the final number of inequalities may easily become unbearable.

Remark 2 motivates the approach presented in the next section, which builds upon the results presented in [13]. We show how the probabilistic scaling approach directly leads to approximations of user-chosen complexity, which can be directly used in applications instead of creating the need for a post-processing step to reduce the complexity of the sampled set.
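To see concretely how the sampled set of Remark 2 is handled, note that membership in X_N^smpc amounts to checking a stack of n_ℓ · N linear inequalities, one block per uncertainty sample. A small sketch with randomly generated stand-in data (not the actual SMPC matrices of [11,12]):

```python
import numpy as np

rng = np.random.default_rng(0)
n_theta, n_ell, N = 3, 4, 1000   # stand-in sizes, not from the paper

# One (n_ell x n_theta) constraint block F(w^(i)) and right-hand side g(w^(i))
# per uncertainty sample; X_N is the intersection of all N sampled sets.
F = rng.normal(size=(N, n_ell, n_theta))
g = np.ones((N, n_ell))

def in_sampled_set(theta: np.ndarray) -> bool:
    """theta lies in X_N iff all n_ell * N linear inequalities hold."""
    return bool(np.all(np.einsum('ilk,k->il', F, theta) <= g))

print(F.shape[0] * F.shape[1])            # 4000 inequalities in total
print(in_sampled_set(np.zeros(n_theta)))  # True: F @ 0 = 0 <= 1
```

Even in this toy setting the constraint count is n_ℓ · N = 4000; with the sample sizes of Remark 2 it runs into the millions, which is exactly the complexity issue the probabilistic scaling approach is designed to avoid.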
We propose a novel sample-based approach, alternative to the randomized procedures proposed so far, which allows to maintain the nice probabilistic features of these techniques, while at the same time providing the designer with a way of tuning the complexity of the approximation. The main idea behind this approach consists of first obtaining a simple initial approximation of the shape of the probabilistic set X_ε by exploiting scalable simple approximating sets (Scalable SAS) of the form

S(γ) = θ_c ⊕ γS.   (22)

These sets are described by a center point θ_c and a low-complexity shape set S. The center θ_c and the shape S constitute the design parameters of the proposed approach. By appropriately selecting the shape S, the designer can control the complexity of the approximating set. Note that we do not ask this initial set to have any guarantee of a probabilistic nature. What we ask is that this set is able to “capture” somehow the shape of the set X_ε. Recipes for a possible procedure for constructing this initial set are provided in Section 5. The set S constitutes the starting point of a scaling procedure, which allows to derive a probabilistically guaranteed approximation of the ε-CCS, as detailed in the next section. In particular, we show how an optimal scaling factor γ can be derived so that the set (22) is guaranteed to be an inner approximation of X_ε with the desired confidence level δ. We refer to the set S(γ) as a Scalable SAS.

In this section, we address the problem of how to scale the set S(γ) around its center θ_c to guarantee, with confidence level δ ∈ (0, 1), that S(γ) ⊆ X_ε. Within this sample-based procedure, we assume that N_γ iid samples {w^(1), ..., w^(N_γ)} are drawn from Pr_W and, based on these, we show how to obtain a scalar γ̄ > 0 such that

Pr_{W^{N_γ}} { S(γ̄) ⊆ X_ε } ≥ 1 − δ.

To this end, we first define the scaling factor associated to a given realization of the uncertainty.
Definition 3 (Scaling factor)
Given a Scalable SAS S(γ), with given center θ_c and shape S ⊂ Θ, and a realization w ∈ W, we define the scaling factor of S(γ) relative to w as

γ(w) ≐ { 0 if θ_c ∉ X(w);  max_{S(γ) ⊆ X(w)} γ otherwise, }

with X(w) defined as in (16). That is, γ(w) represents the maximal scaling that can be applied to S(γ) = θ_c ⊕ γS around the center θ_c so that S(γ) ⊆ X(w). The following theorem states how to obtain, by means of sampling, a scaling factor γ̄ that guarantees, with high probability, that S(γ̄) ⊆ X_ε.

Theorem 1 (Probabilistic scaling)
Given a candidate Scalable SAS S(γ), with θ_c ∈ X_ε, accuracy parameter ε ∈ (0, 1), confidence level δ ∈ (0, 1), and a discarding integer parameter r ≥ 0, let N_γ be chosen such that

B(r; N_γ, ε) ≤ δ.    (23)

Draw N_γ iid samples {w^(1), w^(2), ..., w^(N_γ)} from the distribution Pr_W, compute the corresponding scaling factors

γ_i ≐ γ(w^(i)),    (24)

for i ∈ [N_γ] according to Definition 3, and let γ̄ = γ_{1+r:N_γ}. Then, with probability no smaller than 1 − δ,

S(γ̄) = θ_c ⊕ γ̄S ⊆ X_ε.

Proof:
If γ̄ = 0, then we have S(γ̄) ≡ θ_c ∈ X_ε. Hence, consider γ̄ > 0. From Property 1 in Appendix A.2, we have that γ̄ satisfies, with probability no smaller than 1 − δ,

Pr_W { S(γ̄) ⊄ X(w) } ≤ ε.

Equivalently, Pr_W { S(γ̄) ⊆ X(w) } > 1 − ε. This can be rewritten as

Pr_W { F(w)ᵀθ ≤ g(w), ∀θ ∈ S(γ̄) } > 1 − ε,

and it implies that the probability of violation in θ_c ⊕ γ̄S is no larger than ε, with probability no smaller than 1 − δ.

In the light of the theorem above, from now on we will assume that the Scalable SAS is such that θ_c ∈ X_ε. The above result leads to the following simple algorithm, in which we summarise the main steps for constructing the scaled set, and we provide an explicit way of determining the discarding parameter r.

Algorithm 1
Probabilistic SAS Scaling

1. Given a candidate Scalable SAS S(γ) and probability levels ε and δ, choose

   N_γ ≥ (7.47/ε) ln(1/δ) and r = ⌊εN_γ/2⌋.    (25)

2. Draw N_γ samples of the uncertainty w^(1), ..., w^(N_γ).
3. for i = 1 to N_γ do
4.   Solve the optimization problem

     γ_i ≐ max_{S(γ) ⊆ X(w^(i))} γ.    (26)

5. end for
6. Return γ̄ = γ_{1+r:N_γ}, the (1+r)-th smallest value of the γ_i.

A few comments are in order regarding the algorithm above. In step 4, for each uncertainty sample w^(i) one has to solve an optimization problem, which amounts to finding the largest value of γ such that S(γ) is contained in the set X(w^(i)) defined in (16). If the SAS is chosen accurately, we can show that this problem is convex and computationally very efficient: this is discussed in Section 5. Then, in step 6, one has to reorder the set {γ_1, γ_2, ..., γ_{N_γ}} so that the first element is the smallest one, the second element is the second smallest one, and so on, and then return the (1+r)-th element of the reordered sequence. The following corollary applies to Algorithm 1.

Corollary 1
Given a candidate SAS set of the form S(γ) = θ_c ⊕ γS, assume that θ_c ∈ X_ε. Then, Algorithm 1 guarantees that S(γ̄) ⊆ X_ε with probability at least 1 − δ.

Proof: The result is a direct consequence of Theorem 1, which guarantees that, for given r ≥ 0, Pr { S(γ̄) ⊆ X_ε } ≥ 1 − δ if the scaling is performed on a number of samples satisfying (23). From [42, Corollary 1], it follows that, in order to satisfy (23), it suffices to take N_γ such that

N_γ ≥ (1/ε) ( r + ln(1/δ) + √(2 r ln(1/δ)) ).    (27)

Since r = ⌊εN_γ/2⌋, we have that r ≤ εN_γ/2. Thus, inequality (27) is satisfied if

N_γ ≥ (1/ε) ( εN_γ/2 + ln(1/δ) + √(εN_γ ln(1/δ)) ) = N_γ/2 + (1/ε) ln(1/δ) + √((N_γ/ε) ln(1/δ)).

Letting ∇ ≐ √N_γ and α ≐ √((1/ε) ln(1/δ)), the above inequality rewrites as ∇² − 2α∇ − 2α² ≥ 0, which has unique positive solution ∇ ≥ (1 + √3)α. In turn, this rewrites as N_γ ≥ (1 + √3)² (1/ε) ln(1/δ). Formula (25) follows by observing that (1 + √3)² < 7.47.

We now discuss how to construct candidate Scalable SAS that provide good approximations of the ε-CCS. Moreover, in the light of the discussion after Algorithm 1, we also ask these sets to be convex.

First, we note that the most straightforward way to design a candidate SAS is again to resort to a sample-based procedure: we draw a fixed number N_S of "design" uncertainty samples {w̃^(1), ..., w̃^(N_S)}, and construct an initial sampled approximation by introducing the following sampled-polytope SAS

S_{N_S} = ⋂_{j=1}^{N_S} X(w̃^(j)).    (28)

Note that the sampled polytope S_{N_S}, by construction, is given by the intersection of n_ℓ N_S half-spaces. Hence, this approach provides very precise control of the final complexity of the approximation, through the choice of the number of samples N_S. However, it is also clear that a choice for which N_S ≪ N_LT implies that the probabilistic properties of S_{N_S} before scaling will be very poor.
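The steps of Algorithm 1, together with the sample-size rule (25) and the binomial condition (23), can be sketched in a few lines of Python. The sketch below is illustrative: as per-sample scaling factor it uses the closed form developed in Section 6 for a polytopic X(w) = {θ : Fθ ≤ g} and an ℓ∞-ball (box) shape, and the names and the toy uncertainty model in the tests are our own assumptions, not from the paper.

```python
import math
import numpy as np

def sample_size(eps, delta):
    """Sample-size rule (25): N_gamma >= (7.47/eps) ln(1/delta), r = floor(eps*N_gamma/2)."""
    n = math.ceil(7.47 / eps * math.log(1.0 / delta))
    return n, math.floor(eps * n / 2)

def binomial_cdf(r, n, p):
    """B(r; n, p) = sum_{i=0}^{r} C(n,i) p^i (1-p)^(n-i), as in condition (23)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(r + 1))

def gamma_factor(F, g, theta_c, H):
    """Largest gamma with theta_c + gamma*H*B_inf inside {theta : F theta <= g};
    returns 0 if theta_c is infeasible (Definition 3, p = inf, dual norm = l1)."""
    tau = g - F @ theta_c                   # slack of each halfspace at the center
    if np.any(tau < 0):
        return 0.0
    rho = np.abs(F @ H).sum(axis=1)         # ||H^T f_l||_1 for each row f_l
    ratios = np.where(rho > 0, tau / np.maximum(rho, 1e-300), np.inf)
    return float(np.min(ratios))

def probabilistic_scaling(sample_X, theta_c, H, eps, delta, rng):
    """Algorithm 1: returns gamma_{1+r : N_gamma}, the (1+r)-th smallest factor.
    sample_X(rng) must return one random constraint pair (F, g)."""
    n, r = sample_size(eps, delta)
    assert binomial_cdf(r, n, eps) <= delta          # condition (23) of Theorem 1
    gammas = np.sort([gamma_factor(*sample_X(rng), theta_c, H) for _ in range(n)])
    return gammas[r]
```

For ε = 0.05 and δ = 10⁻⁶, `sample_size` returns N_γ = 2065 and r = 51, matching the values used in the examples of the paper.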
However, we emphasize again that this initial geometry neither has nor requires any probabilistic guarantees, which are instead provided by the probabilistic scaling discussed in Section 4.1. It should also be remarked that this is only one possible heuristic. For instance, along this line one could as well draw many samples and then apply a clustering algorithm to reduce them to a desired number of samples. (The design samples are denoted with a tilde to distinguish them from the samples used in the probabilistic scaling procedure.)

We remark that, in order to apply the scaling procedure, we need to define a center around which to scale. To this end, we could compute the so-called Chebyshev center, defined as the center of the largest ball inscribed in S_{N_S}, i.e. θ_c = Cheb(S_{N_S}). We note that computing the Chebyshev center of a given polytope is an easy convex optimization problem, for which efficient algorithms exist, see e.g. [43]. A possible alternative would be the analytic center of S_{N_S}, whose computation is even easier (see [43] for further details). Once the center θ_c has been determined, the scaling procedure can be applied to the set S_{N_S}(γ) ≐ θ_c ⊕ γ{S_{N_S} ⊖ θ_c}. Note that the center needs to be inside X_ε. Aside from that, the choice of θ_c only affects the goodness of the shape, but we can never know a priori whether the analytic center is a better choice than any other center in X_ε.

Example 3 (Sample-based approximations)
To illustrate how the proposed scaling procedure works in practice in the case of a sampled-polytope SAS, we revisit Example 2. To this end, a pre-fixed number N_S of uncertainty samples was drawn, and the set of inequalities

F(w̃^(j)) θ ≤ g(w̃^(j)), j ∈ [N_S],

with F(w), g(w) defined in (7), was constructed, leading to the candidate set S_{N_S}. Then, the corresponding Chebyshev center was computed, and Algorithm 1 was applied with ε = 0.05 and δ = 10⁻⁶, leading to N_γ = 2,065. We note that, in this case, the solution of the optimization problem in (26) may be obtained by bisection on γ. Indeed, for given γ, checking whether S_{N_S}(γ) ⊆ X(w^(i)) amounts to solving some simple linear programs.

Two different situations were considered: a case where the number of inequalities is rather small, N_S = 100, and a case where the complexity of the SAS is higher, i.e. N_S = 1,000. The outcome of the procedure is illustrated in Figure 3. We can observe that, for a small N_S (Fig. 3(a)), the initial approximation is rather large (although it is contained in X_ε, we remark that we do not have any guarantee that this will happen). In this case, the probabilistic scaling returns a scaling factor smaller than one. This means that, in order to obtain a set fulfilling the desired probabilistic guarantees, we need to shrink it around its center. In the second case, with a larger number of sampled inequalities (Fig. 3(b)), the initial set (the red one) is much smaller, and the scaling procedure inflates the set, returning a value of γ̄ greater than one. Note that choosing a larger number of samples for the computation of the initial set does not imply that the final set will be a better approximation of the ε-CCS.

Finally, we compare this approach to the scenario-like ones discussed in Subsection 3.3. To this end, we also draw the approximation obtained by directly applying the Learning Theory bound (20).
Note that in this case, since n_θ = 3 and n_ℓ = 4, we need to take N_LT = 13,011 samples, corresponding to 52,044 linear inequalities. The resulting set is represented in Fig. 3(c). We point out that, with this approximation, i) the set is much more complex, since the number of involved inequalities is much larger, and ii) the set is much smaller, hence providing a much more conservative approximation of the ε-CCS. Consequently, the ensuing chance-constrained optimization problem will be computationally harder, and will lead to a solution with a larger cost, or even to an infeasible problem in cases where the approximating set is too small.

In this section, we propose a procedure in which the shape of the Scalable SAS may be selected a priori. This corresponds to situations where the designer wants to have full control of the final shape in terms of structure and complexity. The main idea is to define so-called norm-based SAS of the form

S_p(γ) ≐ θ_c ⊕ γH B_p^s,    (29)

where B_p^s is the ℓ_p-ball in R^s, H ∈ R^{n_θ × s}, with s ≥ n_θ, is a design matrix (not necessarily square), and γ is the scaling parameter. Note that when the matrix H is square (i.e. s = n_θ) and positive definite, these sets belong to the class of ℓ_p-norm based sets originally introduced in [45]. In particular, in the case of the ℓ_2 norm, the sets are ellipsoids. This particular choice is the one studied in [14]. Here, we extend this approach to a much more general family of sets, which encompasses for instance zonotopes, obtained by letting p = ∞ and s ≥ n_θ. Zonotopes have been widely studied in geometry, and have found several applications in systems and control, in particular for problems of state estimation and robust Model Predictive Control, see e.g. [46].

We recall that the scaling factor γ(w) is defined as 0 if θ_c ∉ X(w), and as the largest value γ for which S_p(γ) ⊆ X(w) otherwise.
The following theorem, whose proof is reported in Appendix A.3, provides a direct and simple way to compute in closed form the scaling factor for a given candidate norm-based SAS.

Theorem 2 (Scaling factor for norm-based SAS)
Given a norm-based SAS S_p(γ) as in (29), and a realization w ∈ W, the scaling factor γ(w) can be computed as

γ(w) = min_{ℓ ∈ [n_ℓ]} γ_ℓ(w),

with γ_ℓ(w), ℓ ∈ [n_ℓ], given by

γ_ℓ(w) = { 0 if τ_ℓ(w) < 0;  ∞ if τ_ℓ(w) ≥ 0 and ρ_ℓ(w) = 0;  τ_ℓ(w)/ρ_ℓ(w) if τ_ℓ(w) ≥ 0 and ρ_ℓ(w) > 0, }    (30)

where τ_ℓ(w) ≐ g_ℓ(w) − f_ℓᵀ(w) θ_c and ρ_ℓ(w) ≐ ||Hᵀ f_ℓ(w)||_{p*}, with ||·||_{p*} being the dual norm of ||·||_p. Note that γ(w) is equal to zero if and only if θ_c is not included in the interior of X(w).

Fig. 3. Scaling procedure applied to a sampled-polytope SAS with N_S = 100 (a) and N_S = 1,000 (b); the initial sets are depicted in red, the scaled ones in green. (c) Approximation obtained by direct application of Lemma 1. Note that, in this latter case, to plot the set without out-of-memory errors, a pruning procedure [44] of the 52,044 linear inequalities was necessary.

Similarly to Section 5, we first draw a fixed number N_S of "design" uncertainty samples {w̃^(1), ..., w̃^(N_S)}, and construct an initial sampled approximation by introducing the sampled-polytope SAS S_{N_S} as defined in (28). Again, we consider the Chebyshev center of S_{N_S}, or its analytic center, as a possible center θ_c for our approach.

Given S_{N_S}, s ≥ n_θ, and p ∈ {1, 2, ∞}, the objective is to compute the largest set θ_c ⊕ H B_p^s included in S_{N_S}. To this end, we assume that we have a function Vol_p(H) that provides a measure of the size of H B_p^s. That is, larger values of Vol_p(H) are obtained for increasing sizes of H B_p^s.

Remark 3 (On the volume function)
The function Vol_p(H) may be seen as a generalization of the classical concept of Lebesgue volume. Indeed, when H is a square positive definite matrix, some possibilities are Vol_p(H) = log det(H), which is directly related to the classical volume definition, or Vol_p(H) = tr H, which for p = 2 becomes the well-known sum of the ellipsoid semiaxes (see [47] and [43, Chapter 8]). These measures can be easily generalized to non-square matrices: it suffices to compute the singular value decomposition. If H = UΣVᵀ, we could use the measures Vol_p(H) = tr Σ or Vol_p(H) = log det(Σ).

For non-square matrices H, specific results for particular values of p are known. For example, we recall that if p = ∞ and H ∈ R^{n_θ × s}, s ≥ n_θ, then θ_c ⊕ H B_∞^s is a zonotope. If we denote as a generator each of the columns of H, the volume of a zonotope can be computed by means of a sum of terms, one for each different way of selecting n_θ generators out of the s generators of H; see [48], [49]. Another possible measure of the size of a zonotope θ_c ⊕ H B_∞^s is the Frobenius norm of H [48].

Given an initial design set S_{N_S}, we elect as our candidate Scalable SAS the largest-"volume" norm-based SAS contained in S_{N_S}. Formally, this corresponds to the following optimization problem:

max_{θ_c, H} Vol_p(H) subject to θ_c ⊕ H B_p^s ⊆ S_{N_S}.

As has been shown, this problem is equivalent to

min_{θ_c, H} −Vol_p(H)
s.t. f_ℓᵀ(w̃^(j)) θ_c + ||Hᵀ f_ℓ(w̃^(j))||_{p*} − g_ℓ(w̃^(j)) ≤ 0, ℓ ∈ [n_ℓ], j ∈ [N_S],

where we have replaced the maximization of Vol_p(H) with the minimization of −Vol_p(H). We notice that the constraints are convex in the decision variables; moreover, the functional to be minimized is convex under particular assumptions, for example when H is assumed to be square and positive definite and Vol_p(H) = log det(H).
For non-square matrices, the constraints remain convex, but the convexity of the functional to be minimized is often lost. In this case, local optimization algorithms should be employed to obtain a possibly sub-optimal solution.

Example 4 (Norm-based SAS)
We revisit again Example 2 to show the use of norm-based SAS. We note that, in this case, the designer can control the approximation outcome by acting upon the number of design samples N_S used for constructing the set S_{N_S}. In Figure 4 we report two different norm-based SAS, with p = 1 and p = ∞ respectively, and for each of them we consider two different values of N_S, namely N_S = 100 and N_S = 1,000. Similarly to what was observed for the sampled-polytopes, we see that for larger N_S the ensuing initial set becomes smaller. Consequently, we have an inflating process for small N_S and a shrinking one for large N_S. However, we observe that in this case the final number of inequalities is independent of N_S, since it depends only on the chosen shape.

6.2.1 Relaxed computation

It is worth remarking that the minimization problem of the previous subsection might be infeasible. In order to guarantee the feasibility of the problem, a soft-constrained optimization problem is proposed. With a relaxed formulation, θ_c is not guaranteed to satisfy all the sampled constraints. However, θ_c ∈ S_{N_S} is not necessary to obtain an ε-CCS approximation (in many practical applications, every element of Θ has a non-zero probability of violation and S_{N_S} is empty with non-zero probability). Moreover, a relaxed formulation is necessary to address problems in which there is no element of Θ with probability of violation equal to zero (or significantly smaller than ε). Not considering the possibility of violations is an issue especially when N_S is large, because the probability of obtaining an empty sampled set S_{N_S} grows with the number of samples N_S.

Given ξ > 0, the relaxed problem is

min_{θ_c, H, τ_1, ..., τ_{N_S}} −Vol_p(H) + ξ Σ_{j=1}^{N_S} max{τ_j, 0}    (31)
s.t. f_ℓᵀ(w̃^(j)) θ_c + ||Hᵀ f_ℓ(w̃^(j))||_{p*} − g_ℓ(w̃^(j)) ≤ τ_j, ℓ ∈ [n_ℓ], j ∈ [N_S].
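For one particular instantiation, the relaxed problem (31) becomes a plain linear program: take p = ∞, restrict H to be diagonal, H = diag(d) with d ≥ 0 (as is done in the numerical example of Section 7), and choose Vol_p(H) = tr H = Σ d_i, one of the options listed in Remark 3. The sketch below (Python with SciPy; the function name and the small test instance are our own illustrative assumptions) fits such an axis-aligned box θ_c ⊕ diag(d) B_∞ to sampled halfspaces, with one slack per sample.

```python
import numpy as np
from scipy.optimize import linprog

def fit_box_sas(F_list, g_list, xi):
    """LP form of the relaxed problem (31) for p = inf, H = diag(d), Vol = sum(d):
        min  -sum(d) + xi * sum(s_j)
        s.t. f_l^T theta_c + |f_l|^T d - g_l <= s_j  (all rows l of sample j),
             d >= 0, s >= 0,
    where F_list[j], g_list[j] describe X(w^(j)) = {theta : F theta <= g}."""
    n_theta = F_list[0].shape[1]
    ns = len(F_list)
    nv = 2 * n_theta + ns                       # variables: [theta_c, d, s]
    c = np.zeros(nv)
    c[n_theta:2 * n_theta] = -1.0               # maximize sum(d)
    c[2 * n_theta:] = xi                        # penalize slacks
    A, b = [], []
    for j, (F, g) in enumerate(zip(F_list, g_list)):
        for f_l, g_l in zip(F, g):
            row = np.zeros(nv)
            row[:n_theta] = f_l                 # f_l^T theta_c
            row[n_theta:2 * n_theta] = np.abs(f_l)   # ||diag(d) f_l||_1
            row[2 * n_theta + j] = -1.0         # ... - s_j <= g_l
            A.append(row)
            b.append(g_l)
    bounds = [(None, None)] * n_theta + [(0, None)] * (n_theta + ns)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=bounds, method="highs")
    assert res.status == 0
    return res.x[:n_theta], res.x[n_theta:2 * n_theta], res.x[2 * n_theta:]
```

With a sufficiently large ξ the slacks are driven to zero whenever the unrelaxed problem is feasible, recovering the hard-constrained fit.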
The parameter ξ serves to provide an appropriate trade-off between satisfaction of the sampled constraints and the size of the obtained region. A possibility would be to choose ξ in such a way that the fraction of violations n_viol/N_S (where n_viol is the number of elements τ_j larger than zero) is smaller than ε/2.

Fig. 4. S_1-SAS with N_S = 100 (a), S_1-SAS with N_S = 1,000 (b), S_∞-SAS with N_S = 100 (c), and S_∞-SAS with N_S = 1,000 (d). The initial set is depicted in red, the final one in green. The sampled design polytope S_{N_S} is represented in black.

We now present a numerical example in which the results of the paper are applied to the probabilistic set membership estimation problem introduced in Subsection 2.3. We consider the universal approximation functions given by Gaussian radial basis function networks (RBFN) [50]. Given the nodes [x_1, x_2, ..., x_M] and the variance parameter c, the corresponding Gaussian radial basis function network is defined as

RBFN(x, θ) = θᵀ ϕ(x),

where θ = [θ_1 ... θ_M]ᵀ represents the weights and

ϕ(x) = [ exp(−||x − x_1||²/c) ... exp(−||x − x_M||²/c) ]ᵀ

is the regressor function. Given δ ∈ (0, 1) and ε ∈ (0, 1), we seek, with confidence 1 − δ, an inner approximation of the probabilistic feasible parameter set FPS_ε, which is the set of parameters θ ∈ R^M that satisfy

Pr_W { |y − θᵀϕ(x)| ≤ ρ } ≥ 1 − ε,    (32)

where x is a random scalar with uniform distribution in [−5, 5] and

y = sin(3x) + σ,

where σ is a random scalar with a normal distribution with mean 5 and variance 1.

We use the procedure detailed in Sections 4, 5 and 6 to obtain an SAS of FPS_ε. We have taken a grid of M = 20 points in the interval [−5, 5] to serve as nodes for the RBFN, and a variance parameter of c = 0.15. We have taken N_S = 350 random samples w = (x, y) to compute the initial geometry, which has been chosen to be an ℓ_∞ norm-based SAS of dimension 20 with a relaxation parameter of ξ = 1 (see (31)). The chosen initial geometry is θ_c ⊕ H B_∞, where H is constrained to be a diagonal matrix. Once the initial geometry is obtained, we scale it around its center by means of probabilistic scaling with Algorithm 1. The number of samples required for the scaling phase to achieve ε = 0.05 and δ = 10⁻⁶ is N_γ = 2065, and the resulting scaling factor is smaller than one. The scaled set θ_c ⊕ γ̄H B_∞ is, with a probability no smaller than 1 − δ, an inner approximation of FPS_ε, which we will refer to as FPS_ε^δ. Since it is a transformation of an ℓ_∞ norm ball by a diagonal matrix H, we can write it as

FPS_ε^δ = { θ : θ⁻ ≤ θ ≤ θ⁺ },

where the extreme values θ⁻, θ⁺ ∈ R^M are represented in Figure 5 [51], along with the central value θ_c ∈ R^M.

Fig. 5. Representation of the extreme values θ⁺ and θ⁻ and the central value θ_c of the FPS_ε^δ.

Once the FPS_ε^δ has been computed, we can use its center θ_c to make the point estimation y ≈ θ_cᵀ ϕ(x). We can also obtain probabilistic upper and lower bounds on y by means of equation (32). That is, every point θ in FPS_ε^δ satisfies, with confidence 1 − δ:

Pr_W { y ≤ θᵀϕ(x) + ρ } ≥ 1 − ε,
Pr_W { y ≥ θᵀϕ(x) − ρ } ≥ 1 − ε.    (33)

We notice that the tightest probabilistic bounds are obtained with θ⁺ for the lower bound and θ⁻ for the upper one. That is, we finally obtain that, with confidence 1 − δ:

Pr_W { y ≤ (θ⁻)ᵀϕ(x) + ρ } ≥ 1 − ε,
Pr_W { y ≥ (θ⁺)ᵀϕ(x) − ρ } ≥ 1 − ε.    (34)

Figure 6 shows the results of both the point estimation and the probabilistic interval estimation.

Fig. 6. Real values of y vs central estimation (blue) and interval prediction bounds (red).

In this paper, we proposed a general approach to construct probabilistically guaranteed inner approximations of the chance-constrained set X_ε. The approach is very general and flexible. First, we remark that the proposed scaling approach is not limited to sets defined by linear inequalities, but extends immediately to more general sets. Indeed, we may consider a generic binary performance function φ : Θ × W → {0, 1} defined as

φ(θ, w) = { 0 if θ meets the design specifications for w; 1 otherwise. }

(Clearly, this formulation encompasses the setup discussed so far, obtained by simply setting φ(θ, w) = 0 if F(w)θ ≤ g(w) and 1 otherwise.) Then, the violation probability can be written as Viol(θ) ≐ Pr_W { φ(θ, w) = 1 }, and we can still define the set X_ε as in (5). Then, given an initial SAS candidate, Algorithm 1 still provides a valid approximation. However, it should be remarked that, even if we choose a "nice" SAS as those previously introduced, the nonconvexity of φ will most probably render step 4 of the algorithm intractable. To further elaborate on this point, let us focus on the case where the design specification may be expressed as a (nonlinear) inequality of the form ψ(θ, w) ≤ 0. Then, step 4 consists in solving the following nonconvex optimization problem

γ_i ≐ arg max γ    (36)
s.t. S(γ) ⊆ X(w^(i)) = { θ ∈ Θ | ψ(θ, w^(i)) ≤ 0 }.
We note that this is in general a possibly hard problem. However, there are cases when this problem is still solvable. For instance, whenever ψ(θ, w) is a convex function of θ for fixed w and the set S is also convex, the above optimization problem may be formulated as a convex program by application of Finsler's lemma. We remark that, in such situations, the approach proposed here is still completely viable, since all the derivations continue to hold.

Second, we remark that the paper opens the way to the design of other families of Scalable SAS. For instance, we are currently working on using the family of sets defined in the form of polynomial superlevel sets (PSS) proposed in [52].

A Appendix
A.1 Proof of Lemma 1
To prove the lemma, we first recall the following definition from [41].
Definition 4 ((α, k)-Boolean Function) The function h : Θ × W → R is an (α, k)-Boolean function if, for fixed w, it can be written as an expression consisting of Boolean operators involving k polynomials p_1(θ), p_2(θ), ..., p_k(θ) in the components θ_i, i ∈ [n_θ], and the degree with respect to θ_i of all these polynomials is no larger than α.

Let us now define the binary functions

h_ℓ(θ, w) ≐ { 0 if f_ℓᵀ(w) θ ≤ g_ℓ(w); 1 otherwise, }   ℓ ∈ [n_ℓ].

Introducing the function h(θ, w) ≐ max_{ℓ ∈ [n_ℓ]} h_ℓ(θ, w), we see that the violation probability can be alternatively written as Viol(θ) ≐ Pr_W { h(θ, w) = 1 }. The proof immediately follows by observing that h(θ, w) is a (1, n_ℓ)-Boolean function, since it can be expressed as a function of n_ℓ Boolean functions, each of them involving a polynomial of degree 1. Indeed, it is proven in [41, Theorem 8] that, if h : Θ × W → R is an (α, k)-Boolean function, then, for ε ∈ (0, 0.5), we have Pr_W { h(θ, w) = 1 } ≤ ε with probability no smaller than 1 − δ if N is chosen such that

N ≥ (4.1/ε) ( ln(21.64/δ) + 4.39 n_θ log₂(8eαk/ε) ).

A.2 Property 1
Property 1
Given ε ∈ (0, 1), δ ∈ (0, 1), and 0 ≤ r ≤ N, let N be such that B(r; N, ε) ≤ δ. Draw N iid sample sets {X^(1), X^(2), ..., X^(N)} from a distribution Pr_X. For i ∈ [N], let γ_i ≐ γ(X^(i)), with γ(·) as in Definition 3, and suppose that γ̄ = γ_{1+r:N} > 0. Then, with probability no smaller than 1 − δ, it holds that Pr_X { θ_c ⊕ γ̄S ⊄ X } ≤ ε.

Proof: It has been proven in [38,39] that if one discards no more than r constraints of a convex problem with N random constraints, then the probability of violating the constraints with the solution obtained from the random convex problem is no larger than ε ∈ (0, 1), with probability no smaller than 1 − δ, where

δ = ( (d + r − 1) choose (d − 1) ) Σ_{i=0}^{d+r−1} (N choose i) ε^i (1 − ε)^{N−i},

and d is the number of decision variables. We apply this result to the following optimization problem:

max_γ γ subject to θ_c ⊕ γS ⊆ X^(i), i ∈ [N].    (A.1)

From Definition 3, we can rewrite this optimization problem as

max_γ γ subject to γ ≤ γ(X^(i)), i ∈ [N].

We first notice that the problem under consideration is convex and has a unique scalar decision variable γ; that is, d = 1. Also, the non-degeneracy and uniqueness assumptions required in the application of the results of [38] and [39] are satisfied. Hence, if we allow r violations in the above maximization problem, we have that, with probability no smaller than 1 − δ, where

δ = Σ_{i=0}^{r} (N choose i) ε^i (1 − ε)^{N−i} = B(r; N, ε),

the solution γ̄ of problem (A.1) satisfies Pr_X { γ̄ > γ(X) } ≤ ε. We conclude from this, and Definition 3, that with probability no smaller than 1 − δ, Pr_X { θ_c ⊕ γ̄S ⊄ X } ≤ ε. Finally, note that the optimization problem under consideration can be solved directly by ordering the values γ_i = γ(X^(i)): discarding the r smallest values yields γ̄ = γ_{1+r:N}.

A.3 Proof of Theorem 2
Note that, by definition, the condition θ_c ⊕ γH B_p^s ⊆ X(w) is equivalent to

max_{z ∈ B_p^s} f_ℓᵀ(w)(θ_c + γHz) − g_ℓ(w) ≤ 0, ℓ ∈ [n_ℓ].

Equivalently, from the dual norm definition,

f_ℓᵀ(w) θ_c + γ ||Hᵀ f_ℓ(w)||_{p*} − g_ℓ(w) ≤ 0, ℓ ∈ [n_ℓ].

Denote by γ_ℓ the scaling factor corresponding to the ℓ-th constraint, i.e. the largest γ_ℓ satisfying

f_ℓᵀ(w) θ_c + γ_ℓ ||Hᵀ f_ℓ(w)||_{p*} − g_ℓ(w) ≤ 0.

With the notation introduced in the theorem, this constraint rewrites as γ_ℓ ρ_ℓ(w) ≤ τ_ℓ(w). The result follows by noting that the corresponding scaling factor γ_ℓ(w) can be computed as

γ_ℓ(w) = max_{γ_ℓ ρ_ℓ(w) ≤ τ_ℓ(w)} γ_ℓ,

and that the value of γ(w) is obtained from the most restrictive one.

References

[1] A. Prékopa,
Stochastic Programming . Springer Science &Business Media, 2013. [2] N. V. Sahinidis, “Optimization under uncertainty: state-of-the-art and opportunities,”
Computers & ChemicalEngineering , vol. 28, no. 6-7, pp. 971–983, 2004.[3] A. Ben-Tal and A. Nemirovski, “Robustconvex optimization,”
Mathematics of Operations Research ,vol. 23, pp. 769–805, 1998.[4] G. Calafiore, F. Dabbene, and R. Tempo, “Researchon probabilistic methods for control system design,”
Automatica , vol. 47, pp. 1279–1293, 2011.[5] R. Tempo, G. Calafiore, and F. Dabbene,
RandomizedAlgorithms for Analysis and Control of Uncertain Systems:with Applications . Springer Science & Business Media, 2012.[6] M. Mammarella, E. Capello, F. Dabbene, and G. Guglieri,“Sample-based SMPC for tracking control of fixed-wingUAV,”
IEEE Control Systems Letters , vol. 2, no. 4, pp. 611–616, 2018.[7] J. Li, W. Zhan, Y. Hu, and M. Tomizuka, “Generic trackingand probabilistic prediction framework and its applicationin autonomous driving,”
IEEE Transactions on IntelligentTransportation Systems , 2019.[8] M. Chamanbaz, F. Dabbene, and C. Lagoa,
Algorithmsfor Optimal AC Power Flow in the Presence of RenewableSources . Wiley Encyclopedia of Electrical and ElectronicsEngineering, 2020, pp. 1–13.[9] M. Chamanbaz, F. Dabbene, and C. M. Lagoa,“Probabilistically robust AC optimal power flow,”
IEEETransactions on Control of Network Systems , vol. 6, no. 3,pp. 1135–1147, 2019.[10] X. Geng and L. Xie, “Data-driven decision making inpower systems with probabilistic guarantees: Theory andapplications of chance-constrained optimization,”
Annual Reviews in Control, vol. 47, pp. 341–363, 2019. [11] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, “Stochastic MPC with offline uncertainty sampling,”
Automatica, vol. 81, no. 1, pp. 176–183, 2017. [12] M. Mammarella, M. Lorenzen, E. Capello, H. Park, F. Dabbene, G. Guglieri, M. Romano, and F. Allgöwer, “An offline-sampling SMPC framework with application to autonomous space maneuvers,”
IEEE Transactions on Control Systems Technology, pp. 1–15, 2018. [13] T. Alamo, V. Mirasierra, F. Dabbene, and M. Lorenzen, “Safe approximations of chance constrained sets by probabilistic scaling,” in . IEEE, 2019, pp. 1380–1385. [14] M. Mammarella, T. Alamo, F. Dabbene, and M. Lorenzen, “Computationally efficient stochastic MPC: a probabilistic scaling approach,” in
Proc. of 4th IEEE Conference onControl Technology and Applications , 2020.[15] M. Ahsanullah, V. Nevzorov, and M. Shakil,
An introductionto Order Statistics . Paris: Atlantis Press, 2013.[16] B. Miller and H. Wagner, “Chance constrained programmingwith joint constraints,”
Operations Research , vol. 13, pp.930–945, 1965.[17] L. Khachiyan, “The problem of calculating the volume ofa polyhedron is enumerably hard,”
Russian Mathematical Surveys, 1989. [18] A. Shapiro, D. Dentcheva, and A. Ruszczyński,
Lectures onstochastic programming: modeling and theory . SIAM, 2014.[19] W. van Ackooij, “Eventual convexity of chance constrainedfeasible sets,”
Optimization , vol. 64, no. 5, pp. 1263–1284,2015.
[20] A. Prékopa, T. Rapcsák, and I. Zsuffa, “Serially linked reservoir system design using stochastic programing,”
Water Resources Research, vol. 14, no. 4, 1978. [21] D. Dentcheva, B. Lai, and A. Ruszczyński, “Dual methods for probabilistic optimization problems,”
Mathematical Methods of Operations Research, vol. 60, no. 2, pp. 331–346, 2004. [22] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, “Constraint-tightening and stability in stochastic model predictive control,”
IEEE Transactions on AutomaticControl , vol. 62, no. 7, pp. 3165–3177, 2017.[23] A. Vicino and G. Zappa, “Sequential approximation offeasible parameter sets for identification with set membershipuncertainty,”
IEEE Transactions on Automatic Control ,vol. 41, no. 6, pp. 774–785, 1996.[24] J. M. Bravo, T. Alamo, and E. F. Camacho, “Boundederror identification of systems with time-varying parameters,”
IEEE Transactions on Automatic Control , vol. 51, no. 7, pp.1144–1150, 2006.[25] S. Kataoka, “A stochastic programmingmodel,”
Econometrica: Journal of the Econometric Society, pp. 181–196, 1963. [26] A. Prékopa, “Logarithmic concave measures with application to stochastic programming,”
ActaScientiarum Mathematicarum , pp. 301–316, 1971.[27] C. M. Lagoa, “On the convexity of probabilisticallyconstrained linear programs,” in
Proceedings of the 38th IEEEConference on Decision and Control (Cat. No.99CH36304) ,vol. 1, 1999, pp. 516–521 vol.1.[28] G. C. Calafiore and L. E. Ghaoui, “On distributionallyrobust chance-constrained linear programs,”
Journal ofOptimization Theory and Applications , vol. 130, no. 1, pp.1–22, 2006.[29] R. Henrion and C. Strugarek, “Convexityof chance constraints with independent random variables,”
Computational Optimization and Applications , vol. 41, no. 2,pp. 263–276, 2008.[30] L. Hewing and M. N. Zeilinger, “Stochastic model predictivecontrol for linear systems using probabilistic reachable sets,”in ,2018, pp. 5182–5188.[31] S. Yan, P. Goulart, and M. Cannon, “Stochastic modelpredictive control with discounted probabilistic constraints,”in . IEEE, 2018,pp. 1003–1008.[32] A. Nemirovski and A. Shapiro, “Convex approximationsof chance constrained programs,”
SIAM Journal on Optimization, vol. 17, no. 4, pp. 969–996, 2006.
[33] W. Chen, M. Sim, J. Sun, and C.-P. Teo, "From CVaR to uncertainty set: Implications in joint chance-constrained optimization," Operations Research, vol. 58, no. 2, pp. 470–485, 2010.
[34] A. Jasour, N. S. Aybat, and C. M. Lagoa, "Semidefinite programming for chance constrained optimization over semialgebraic sets," SIAM Journal on Optimization, vol. 25, no. 3, pp. 1411–1440, 2015.
[35] J. B. Lasserre, "Representation of chance-constraints with strong asymptotic guarantees," IEEE Control Systems Letters, vol. 1, no. 1, pp. 50–55, 2017.
[36] G. Calafiore and M. Campi, "The scenario approach to robust control design," IEEE Transactions on Automatic Control, vol. 51, no. 5, pp. 742–753, 2006.
[37] M. Campi and S. Garatti, "The exact feasibility of randomized solutions of robust convex programs," SIAM Journal on Optimization, vol. 19, pp. 1211–1230, 2008.
[38] G. Calafiore, "Random convex programs," SIAM Journal on Optimization, vol. 20, pp. 3427–3464, 2010.
[39] M. Campi and S. Garatti, "A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality," Journal of Optimization Theory and Applications, vol. 148, pp. 257–280, 2011.
[40] V. Vapnik, Statistical Learning Theory. New York: John Wiley and Sons, 1998.
[41] T. Alamo, R. Tempo, and E. F. Camacho, "Randomized strategies for probabilistic solutions of uncertain feasibility and optimization problems," IEEE Transactions on Automatic Control, vol. 54, no. 11, pp. 2545–2559, 2009.
[42] T. Alamo, R. Tempo, A. Luque, and D. Ramirez, "Randomized methods for design of uncertain systems: Sample complexity and sequential algorithms," Automatica, vol. 52, pp. 160–172, 2015.
[43] S. Boyd and L. Vandenberghe,
Convex Optimization. Cambridge University Press, 2004.
[44] M. Herceg, M. Kvasnica, C. N. Jones, and M. Morari, "Multi-parametric toolbox 3.0," in Proceedings of the European Control Conference, IEEE, 2013, pp. 502–510.
[45] F. Dabbene, C. Lagoa, and P. Shcherbakov, "On the complexity of randomized approximations of nonconvex sets," IEEE, 2010, pp. 1564–1569.
[46] V. T. H. Le, C. Stoica, T. Alamo, E. F. Camacho, and D. Dumur, Zonotopes: From Guaranteed State-estimation to Control. Wiley, 2013.
[47] F. Dabbene, D. Henrion, C. Lagoa, and P. Shcherbakov, "Randomized approximations of the image set of nonlinear mappings with applications to filtering,"
IFAC-PapersOnLine, vol. 48, no. 14, pp. 37–42, 2015.
[48] T. Alamo, J. M. Bravo, and E. F. Camacho, "Guaranteed state estimation by zonotopes," Automatica, vol. 41, no. 6, pp. 1035–1043, 2005.
[49] E. Gover and N. Krikorian, "Determinants and the volumes of parallelotopes and zonotopes," Linear Algebra and its Applications, vol. 433, no. 1, pp. 28–40, 2010.
[50] M. D. Buhmann, "Radial basis functions," Acta Numerica, vol. 9, pp. 1–38, 2000.
[51] J. Lemon, "Plotrix: a package in the red light district of R," R-News, vol. 6, no. 4, pp. 8–12, 2006.
[52] F. Dabbene, D. Henrion, and C. M. Lagoa, "Simple approximations of semialgebraic sets and their applications to control," Automatica, vol. 78, pp. 110–118, 2017.