A Behavioral Input-Output Parametrization of Control Policies with Suboptimality Guarantees
Luca Furieri, Baiwei Guo, Andrea Martin, Giancarlo Ferrari-Trecate
Abstract
Recent work in data-driven control has revived behavioral theory to perform a variety of complex control tasks, by directly plugging libraries of past input-output trajectories into optimal control problems. Despite recent advances, a key aspect remains unclear: how and to what extent do noise-corrupted data impact the achieved control performance? In this work, we provide a quantitative answer to this question. We formulate a Behavioral version of the Input-Output Parametrization (BIOP) for the predictive control of unknown systems using output-feedback dynamic control policies. The main advantages of the proposed framework are that 1) the state-space parameters and the initial state need not be specified for controller synthesis, 2) it can be used in combination with state-of-the-art impulse response estimators, and 3) it allows us to recover recent suboptimality results for the Linear Quadratic Gaussian (LQG) control problem, therefore revealing, in a quantitative way, how the level of noise in the data affects the performance of behavioral methods. Specifically, it is shown that the performance degrades linearly with the prediction error of a behavioral model. We conclude the paper with numerical experiments that validate our results.
I. INTRODUCTION
Several safety-critical engineering systems that play a crucial role in our modern society are becoming too complex to be accurately modeled through white-box models [1]. As a consequence, most modern control perspectives envision unknown black-box systems for which an optimal behavior must be attained by solely relying on a collection of historical system output trajectories in response to different inputs.

Broadly speaking, we can learn optimal controllers from data according to two paradigms. The first category contains model-based methods, where historical input-output trajectories are exploited to approximate the system parameters, and a suitable controller is computed for this estimated model. The second category contains model-free methods, where one aims to learn the best control policy directly by observing historical trajectories, without explicitly reconstructing an internal representation of the dynamical system. Both approaches possess their own potential and limitations; among numerous recent surveys, we refer to [2].

Given the intricacy of establishing rigorous suboptimality and sample-complexity bounds, most recent model-based and model-free approaches have focused on basic Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control problems as suitable benchmarks to establish how machine learning can be interfaced with the continuous action spaces typical of control [3]-[10]. When it comes to complex tasks, such as constrained and distributed control, it is more challenging to perform a rigorous probabilistic analysis. Recent advances include [11], [12] for constrained and distributed LQR control with direct state measurements and [13] for distributed output-feedback LQG.

A promising data-driven approach that aims at bypassing a parametric description of the system dynamics, while still being conceptually simple for users to implement, hinges on the behavioral framework [14].
This approach has gained renewed interest with the introduction of Data-EnablEd Predictive Control (DeePC) [15], where the authors established that constrained output reference tracking can be effectively tackled in an MPC-like way by plugging adequately generated historical data into a convex optimization problem. Further work [16] derived links with distributionally robust programming, and novel connections between the behavioral perspective, system identification and subspace predictive control were established in [17]. In parallel, [18] introduced data-driven formulations for some controller design tasks. These works inspired several extensions, including closed-loop control with stability guarantees [19], maximum-likelihood identification and control [20], [21], and nonlinear variants [22].

Predictive behavioral approaches have shown remarkable performance for complex, even nonlinear, control tasks [15], [17], [21], [22]. In practice, however, historical data are corrupted by noise, and the quality and coherency of the achieved solutions may be compromised. While several promising approaches have recently been proposed, including distributionally robust formulations [16], data-enabled Kalman filtering [23] and non-parametric maximum-likelihood estimation [21], a complete quantitative analysis for the noisy case is still unavailable. Recently, the authors of [24] have derived suboptimality and sample-complexity bounds establishing a behavioral formulation of the System Level Synthesis (SLS) approach. However, a strong assumption in [24] is that the internal system states can be measured directly.
Authors are with the Institute of Mechanical Engineering, École Polytechnique Fédérale de Lausanne, Switzerland. E-mails: {luca.furieri, baiwei.guo, andrea.martin, giancarlo.ferraritrecate}@epfl.ch. † Andrea Martin is also with the Automatic Control Laboratory, Department of Information Technology and Electrical Engineering, ETH Zürich, Switzerland. ∗ Baiwei Guo and Andrea Martin contributed equally to this work. Research supported by the Swiss National Science Foundation under the NCCR Automation (grant agreement 51NF40 80545).

Our main contribution is to propose a behavioral optimal control framework for partially observed systems. Specifically, we leverage recent Input-Output Parametrization (IOP) tools [25] for optimal output-feedback controller design and set up a data-driven formulation built upon behavioral theory; we denote the resulting framework as Behavioral IOP (BIOP). The advantages of the proposed BIOP are threefold. First, it solely relies on libraries of past input-output trajectories, therefore enabling optimal controller synthesis without specifying the system's state-space parameters and the value of the initial state. Second, the system impulse response is replaced by a suitable linear combination of historical noisy input-output trajectories, which may encompass, for instance, standard least-squares solutions [26], data-enabled Kalman filtering [23], and the recently proposed signal matrix models (SMM) [20], [21]. Third, our framework allows one to quantify the incurred suboptimality as a function of the level of the noise corrupting the available data; this is achieved by adapting recent results from [6]. As a consequence, we endow behavioral approaches with rigorous analysis tools that have been recently utilized in more classical control contexts [3], [6].
To the best of our knowledge, noise-dependent suboptimality guarantees on using behavioral theory for output-feedback control have not been established before.

A. Notation
We use $\mathbb{R}$ and $\mathbb{N}$ to denote real numbers and non-negative integers, respectively. We use $I_n$ to denote the identity matrix of size $n \times n$ and $0_{m \times n}$ to denote the zero matrix of size $m \times n$. We write $M = \mathrm{blkdg}(M_1, \dots, M_n)$ to denote a block-diagonal matrix with $M_1, \dots, M_n$ on its diagonal block entries. The Kronecker product between $M \in \mathbb{R}^{m \times n}$ and $P \in \mathbb{R}^{p \times q}$ is denoted as $M \otimes P \in \mathbb{R}^{mp \times nq}$. Given $K \in \mathbb{R}^{m \times n}$, $\mathrm{vec}(K) \in \mathbb{R}^{mn}$ is a column vector that stacks the columns of $K$. The Euclidean norm of a vector $v \in \mathbb{R}^n$ is denoted by $\|v\| = \sqrt{v^T v}$ and the induced two-norm of a matrix $M \in \mathbb{R}^{m \times n}$ is defined as $\sup_{\|x\|=1} \|Mx\|$. The Frobenius norm of a matrix $M \in \mathbb{R}^{m \times n}$ is denoted by $\|M\|_F = \sqrt{\mathrm{Trace}(M^T M)}$. We recall the following standard inequalities for matrices $M, N$ and vectors $v$ of compatible dimensions, which will be used throughout the paper:
1) $\|MN\| \leq \|M\|\,\|N\|$,
2) $\|MN\|_F \leq \|M\|_F\,\|N\|$,
3) $\|MN\|_F \leq \|M\|\,\|N\|_F$,
4) $\|v\| = \|v\|_F$.
For a symmetric matrix $M$, we write $M \succ 0$ (resp. $M \succeq 0$) if and only if it is positive definite (resp. positive semidefinite). We say that $x \sim \mathcal{N}(\mu, \Sigma)$ if the random variable $x \in \mathbb{R}^n$ is distributed according to a normal distribution with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \succeq 0$ with $\Sigma \in \mathbb{R}^{n \times n}$.

We use upper-case boldface letters (e.g. $\mathbf{x}$ and $\mathbf{G}$) to denote trajectories of vectors in time and linear maps between trajectories, respectively. Specifically, a finite-horizon trajectory of length $T$ is a sequence $\omega(0), \omega(1), \dots, \omega(T-1)$ with $\omega(t) \in \mathbb{R}^n$ for every $t = 0, 1, \dots, T-1$, which can be compactly written as $\omega_{[0,T-1]} = \mathrm{col}(\omega(0), \omega(1), \dots, \omega(T-1)) \in \mathbb{R}^{nT}$.
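The four norm relations above are easy to confirm numerically; the following standalone sketch (random matrices, not part of the paper) checks each of them:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 6))
N = rng.standard_normal((6, 5))
v = rng.standard_normal(6)

spec = lambda X: np.linalg.norm(X, 2)      # induced two-norm (largest singular value)
frob = lambda X: np.linalg.norm(X, "fro")  # Frobenius norm

assert spec(M @ N) <= spec(M) * spec(N) + 1e-12               # inequality 1)
assert frob(M @ N) <= frob(M) * spec(N) + 1e-12               # inequality 2)
assert frob(M @ N) <= spec(M) * frob(N) + 1e-12               # inequality 3)
assert np.isclose(np.linalg.norm(v), frob(v.reshape(-1, 1)))  # equality 4)
```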
For a finite-horizon trajectory $\omega_{[0,T-1]}$ we also define the Hankel matrix of depth $L$ as
$$\mathcal{H}_L(\omega_{[0,T-1]}) = \begin{bmatrix} \omega(0) & \omega(1) & \cdots & \omega(T-L) \\ \omega(1) & \omega(2) & \cdots & \omega(T-L+1) \\ \vdots & \vdots & \ddots & \vdots \\ \omega(L-1) & \omega(L) & \cdots & \omega(T-1) \end{bmatrix}.$$

II. PROBLEM STATEMENT
We consider a linear system with output observations, whose state-space representation is given by
$$x(t+1) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + v(t), \tag{1}$$
where $x(t) \in \mathbb{R}^n$ is the state of the system and $x(0) = x_0$ for a predefined $x_0 \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$ is the control input, $y(t) \in \mathbb{R}^p$ is the observed output, and $v(t) \in \mathbb{R}^p$ denotes Gaussian measurement noise $v(t) \sim \mathcal{N}(0, \Sigma_v)$, with $\Sigma_v \succ 0$. The system is controlled through a time-varying, dynamic linear control policy of the form
$$u(t) = \sum_{k=0}^{t} K_{t,k}\, y(k) + w(t), \tag{2}$$
where $w(t) \in \mathbb{R}^m$ denotes Gaussian noise on the input $w(t) \sim \mathcal{N}(0, \Sigma_w)$ with $\Sigma_w \succeq 0$. Similar to standard LQG, our control goal is to synthesize a feedback control policy that minimizes the expected value with respect to the disturbances of a quadratic objective defined over future input-output trajectories for a horizon $N \in \mathbb{N}$:
$$J := \mathbb{E}_{w,v}\left[\sum_{t=0}^{N-1} \left(y(t)^T L(t)\, y(t) + u(t)^T R(t)\, u(t)\right)\right], \tag{3}$$
where $L(t) \succeq 0$, $R(t) \succ 0$ for every $t = 0, \dots, N-1$.

[Fig. 1: Interconnection of the plant $G$ and the controller $\mathbf{K}$, where $z^{-1}$ denotes the standard time-shift operator.]

Remark 1.
The reader might have noticed that the problem of minimizing (3) for a system in the form (1)-(2) is slightly different from some of the classical LQG formulations, see for instance [27]. Specifically, in (3) we penalize the outputs instead of the states, and the input noise $w(t)$ enters the state equation indirectly through the matrix $B$. This choice is motivated as follows:
• For all practical purposes, the cost function must be defined by the user. In a data-driven setup where only input-output samples can be measured, the user has to evaluate the cost solely relying on input-output trajectories. Furthermore, to define the cost, it is natural to specify the variance of the noise affecting inputs and outputs; instead, it would be less meaningful to specify the statistics of the noise entering the states, as these would be representation-dependent (i.e., only specified up to a change of variables $z(t) = Sx(t)$, where $S$ is unknown because we do not have access to $x(t)$ by assumption).
• For zero initial state, the system (1) is equivalent to a classical transfer function representation as per Figure 1. The considered noise model is indeed the standard choice in closed-loop plant $\mathcal{H}_2$ norm minimization, see for instance [28].

Remark 2.
In this work, we focus on solving and analyzing a finite-horizon control problem, which represents one iteration of a receding-horizon Model Predictive Control (MPC) implementation scheme. It is therefore appropriate to compare the proposed approach with a single iteration of the DeePC setup in [15], [19]. The main difference is that we perform closed-loop predictions, i.e., we optimize over feedback policies $\pi(\cdot)$ such that $u(t) = \pi(y(t), \dots, y(0))$, while the DeePC [15], [19] performs open-loop predictions, i.e., it directly optimizes over input sequences $u(0), u(1), \dots, u(N-1)$. For linear systems subject to polytopic safety constraints, it is well known that closed-loop predictions are less conservative than open-loop ones and allow for longer prediction horizons without incurring infeasibility [29]. The price to pay for such performance improvement is an increased computational burden due to the larger dimensionality of the problem. (With Gaussian noise, dynamic linear policies are optimal for the cost defined in (3).)

A. Strongly convex design through the IOP

By leveraging the tools offered by the framework of the IOP [25], we formulate a strongly convex program that computes the optimal feedback control policy by finding the optimal input-output closed-loop responses. The state-space equations (1) provide the following relations between trajectories
$$x_{[0,N-1]} = P_A(:,1)\,x(0) + P_B\, u_{[0,N-1]}, \qquad y_{[0,N-1]} = \mathbf{C}x_{[0,N-1]} + v_{[0,N-1]}, \tag{4}$$
where $P_A(:,1)$ denotes the first block-column of $P_A$ and
$$P_A = (I - Z\mathbf{A})^{-1}, \quad P_B = (I - Z\mathbf{A})^{-1}Z\mathbf{B}, \quad \mathbf{A} = I_N \otimes A, \quad \mathbf{B} = I_N \otimes B, \quad \mathbf{C} = I_N \otimes C, \quad Z = \begin{bmatrix} 0_{n \times n(N-1)} & 0_{n \times n} \\ I_{n(N-1)} & 0_{n(N-1) \times n} \end{bmatrix}.$$
We note that $\mathbf{C}P_B$ is a block-Toeplitz matrix with blocks of the form $CA^iB$. From now on, we equivalently denote $\mathbf{G} = \mathbf{C}P_B$ to highlight that $\mathbf{G}$ contains the first $N$ components of the impulse response of the plant $G(z) = C(zI - A)^{-1}B$ reported in
Figure 1. Second, with similar reasoning, the matrix $\mathbf{C}P_A(:,1)$ contains the observability terms $CA^i$ for $i = 0, \dots, N-1$. The control policy can be rewritten as
$$u_{[0,N-1]} = \mathbf{K}y_{[0,N-1]} + w_{[0,N-1]}, \tag{5}$$
where the control policy $\mathbf{K}$ has a causal sparsity pattern, that is,
$$\mathbf{K} = \begin{bmatrix} K_{0,0} & 0_{m \times p} & \cdots & 0_{m \times p} \\ K_{1,0} & K_{1,1} & \ddots & 0_{m \times p} \\ \vdots & \vdots & \ddots & \vdots \\ K_{N-1,0} & K_{N-1,1} & \cdots & K_{N-1,N-1} \end{bmatrix}. \tag{6}$$
Note that since we assume that $y(t)$ is only a function of the inputs up to time $t-1$, the optimal value of $u(N-1)$ is zero. Hence, the optimizer will always choose the last block-row of (6) to be null. Since the derivations are unaffected, we let the corresponding decision variables in (6) be free for notational simplicity.

By plugging the controller (5) into (4), it is easy to derive the closed-loop relations
$$\begin{bmatrix} y_{[0,N-1]} \\ u_{[0,N-1]} \end{bmatrix} = \begin{bmatrix} (I - \mathbf{G}\mathbf{K})^{-1} & (I - \mathbf{G}\mathbf{K})^{-1}\mathbf{G} \\ \mathbf{K}(I - \mathbf{G}\mathbf{K})^{-1} & (I - \mathbf{K}\mathbf{G})^{-1} \end{bmatrix} \begin{bmatrix} v_{[0,N-1]} + \mathbf{C}P_A(:,1)x(0) \\ w_{[0,N-1]} \end{bmatrix} \tag{7}$$
$$= \begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} \begin{bmatrix} v_{[0,N-1]} + \mathbf{C}P_A(:,1)x(0) \\ w_{[0,N-1]} \end{bmatrix}. \tag{8}$$
The parameters $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$ in (8) represent the four closed-loop responses defining the relationship between disturbances and input-output signals.

The main concept behind the IOP in [25] is that linear output-feedback control policies $\mathbf{K}$ can be expressed in terms of corresponding closed-loop responses that lie in an affine subspace, hence enabling a convex formulation of the objective $J(\mathbf{G}, \mathbf{K})$ as a function of the closed-loop responses. The idea of optimizing over the closed-loop responses roots back to Youla-based and disturbance-feedback controller design [28], [30]. These concepts have been revisited in different contexts through the introduction of 1) system level synthesis [31] and 2) newer parametrizations, including the IOP [25].
All these controller parametrizations are equivalently expressive; we refer the interested reader to [32] for a survey.

The IOP serves our purposes well in a data-driven output-feedback setup, as it offers a controller parametrization that is directly defined through the impulse response parameters $\mathbf{G}$, without requiring a state-space representation. We recall the following result from [25] and adapt it to the finite-horizon case. A proof is reported in the Appendix for completeness.

Proposition 1.
Consider the LTI system (1) evolving under the control policy (5) within a finite horizon of length $N \in \mathbb{N}$. Then:
1) For any controller $\mathbf{K}$ there exist four matrices $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$ such that $\mathbf{K} = \Phi_{uy}\Phi_{yy}^{-1}$ and
$$\begin{bmatrix} I & -\mathbf{G} \end{bmatrix} \begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} = \begin{bmatrix} I & 0 \end{bmatrix}, \tag{9}$$
$$\begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} \begin{bmatrix} -\mathbf{G} \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}, \tag{10}$$
$$\Phi_{yy}, \Phi_{uy}, \Phi_{yu}, \Phi_{uu} \ \text{have causal sparsities}. \tag{11}$$
2) For any four matrices $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$ lying in the affine subspace (9)-(11), the controller $\mathbf{K} = \Phi_{uy}\Phi_{yy}^{-1}$ is causal as per (6) and yields the closed-loop responses $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$.

We are now ready to establish a strongly convex formulation of the optimal control problem under study. Please refer to the Appendix for a complete proof.
Proposition 2.
Consider the LTI system (1). The controller in the form (5) achieving the minimum of the cost functional (3) is given by $\mathbf{K} = \Phi_{uy}\Phi_{yy}^{-1}$, where $\Phi_{uy}, \Phi_{yy}$ are optimal solutions to the following strongly convex program:
$$\min_{\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu}} \left\| \begin{bmatrix} \mathcal{L}^{1/2} & 0 \\ 0 & \mathcal{R}^{1/2} \end{bmatrix} \begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_v^{1/2} & \mathbf{C}P_A(:,1)x(0) \\ 0 & \boldsymbol{\Sigma}_w^{1/2} \end{bmatrix} \right\|_F \quad \text{subject to } (9)\text{-}(11), \tag{12}$$
where $\mathcal{L} = \mathrm{blkdg}(L(0), \dots, L(N-1))$, $\mathcal{R} = \mathrm{blkdg}(R(0), \dots, R(N-1))$, $\boldsymbol{\Sigma}_v = I_N \otimes \Sigma_v$ and $\boldsymbol{\Sigma}_w = I_N \otimes \Sigma_w$. Specifically, the causal sparsities are the block lower-triangular sparsities resulting by construction from the expressions (7), the sparsity of $\mathbf{K}$ in (6), and that of $\mathbf{G}$.

When the system parameters $(A, B, C, x_0)$ are known, it is straightforward and efficient to compute the unique globally optimal solution $(\Phi_{yy}^\star, \Phi_{yu}^\star, \Phi_{uy}^\star, \Phi_{uu}^\star)$ of problem (12) with off-the-shelf interior-point solvers. The globally optimal control policy is recovered as $\mathbf{K}^\star = \Phi_{uy}^\star(\Phi_{yy}^\star)^{-1}$. We also remark that, since the noise is Gaussian, the linear policy $\mathbf{u} = \pi^\star(\mathbf{y}) = \mathbf{K}^\star\mathbf{y}$ is optimal with respect to all feedback policies. If the noise is non-Gaussian, $\mathbf{K}^\star$ remains the optimal linear controller, but nonlinear policies may outperform it.

However, it is more challenging to compute $\mathbf{K}^\star$ merely relying on libraries of past input-output trajectories. In the next section, we exploit behavioral theory to provide a non-parametric version of (12).

III. BEHAVIORAL INPUT-OUTPUT PARAMETRIZATION
Behavioral system theory [14], [33] offers a way of characterizing a dynamical system without resorting to a particular system representation, but rather by exploiting the subspace of the signal space in which the trajectories of the system live. Before moving on, we recall the following definition of persistency of excitation and the result known as the
Fundamental Lemma for LTI systems [33].
Definition 1.
We say that $u^h_{[0,T-1]}$ is persistently exciting (PE) of order $L$ if the Hankel matrix $\mathcal{H}_L(u^h_{[0,T-1]})$ has full row rank.

A necessary condition for the matrix $\mathcal{H}_L(u^h_{[0,T-1]})$ to have full row rank is that it has at least as many columns as rows. It follows that the historical trajectory must be long enough to satisfy $T \geq (m+1)L - 1$.

Lemma 1 (Theorem 3.7, [33]). Consider system (1) and assume that $(A, B)$ is controllable and that there is no noise. Let $\{y^h_{[0,T-1]}, u^h_{[0,T-1]}\}$ be a historical system trajectory of length $T$. Then, if $u^h_{[0,T-1]}$ is PE of order $n + L$, the signals $y^\star_{[0,L-1]} \in \mathbb{R}^{pL}$ and $u^\star_{[0,L-1]} \in \mathbb{R}^{mL}$ are valid trajectories of (1) if and only if there exists $g \in \mathbb{R}^{T-L+1}$ such that
$$\begin{bmatrix} \mathcal{H}_L(y^h_{[0,T-1]}) \\ \mathcal{H}_L(u^h_{[0,T-1]}) \end{bmatrix} g = \begin{bmatrix} y^\star_{[0,L-1]} \\ u^\star_{[0,L-1]} \end{bmatrix}. \tag{13}$$

As pointed out in [17], controllability of the data-generating system and PE of the input signal constitute standard conditions that are only sufficient to generate all input-output system trajectories through (13). A more general necessary and sufficient condition can be given as follows [17]: for $L \geq l$, with $l$ defined as in Assumption 4, and letting $w^h_{[0,T-1]} = P\,\mathrm{col}(y^h_{[0,T-1]}, u^h_{[0,T-1]})$, where $P$ is a generic permutation matrix, we are able to span all system trajectories of length $L$ by means of (13) if and only if $\mathrm{rank}(\mathcal{H}_L(w^h_{[0,T-1]})) = mL + n$.

Next, we show how Lemma 1 can be directly exploited to obtain a non-parametric formulation of (12). We work under the following assumptions.

Assumption 1.
The data-generating LTI system (1) is such that $(A, B)$ is controllable and $(A, C)$ is observable.

Assumption 2.
The following data are available:
i) A recent system trajectory of length $T_{ini}$: $\{y^r_{[0,T_{ini}-1]}, u^r_{[0,T_{ini}-1]}\}$, with $y^r_{[0,T_{ini}-1]} = y_{[-T_{ini},-1]}$ and $u^r_{[0,T_{ini}-1]} = u_{[-T_{ini},-1]}$, corresponding to the trajectory in the immediate past that brought the system to its current initial state $x(0)$.
ii) A historical system trajectory of length $T$: $\{y^h_{[0,T-1]}, u^h_{[0,T-1]}\}$, with $y^h_{[0,T-1]} = y_{[-T_h,-T_h+T-1]}$ and $u^h_{[0,T-1]} = u_{[-T_h,-T_h+T-1]}$ for $T_h \in \mathbb{N}$ such that $T_h > T + T_{ini}$.

Assumption 3.
The historical and recent data are not corrupted by noise.
Assumption 4.
The historical input trajectory $u^h_{[0,T-1]}$ is persistently exciting of order $n + T_{ini} + N$, where $T_{ini} \geq l$ and $l$ is the smallest integer such that $\begin{bmatrix} C^T & (CA)^T & \cdots & (CA^{l-1})^T \end{bmatrix}^T$ has full column rank. Note that if Assumption 1 holds, then $l \leq n$.

A few comments are in order. First, Assumption 1 is without loss of generality, as from an input-output perspective we are not concerned with the non-controllable and non-observable subsystems. Therefore, it is equivalent to assume that $(A, B, C)$ are the matrices associated with the controllable and observable parts of the LTI system. Second, in Assumption 2 the historical data are needed to construct a non-parametric system representation, and the recent data are exploited to define a cost function that accurately reflects the system initial state $x(0) \in \mathbb{R}^n$. Third, in Assumption 3 we assume that the observed data are noiseless, so that we construct a non-parametric optimal control problem equivalent to (12). In the next section, we will deal with historical and recent trajectories that are indeed corrupted by noise.

Theorem 1 (Behavioral IOP). Consider the unknown LTI system (1) and let Assumptions 1-4 hold. Let $(G, g)$ be any couple of solutions to the linear system of equations
$$\begin{bmatrix} U_p \\ Y_p \\ U_f \end{bmatrix} \begin{bmatrix} G & g \end{bmatrix} = \begin{bmatrix} 0_{mT_{ini} \times m} & u^r_{[0,T_{ini}-1]} \\ 0_{pT_{ini} \times m} & y^r_{[0,T_{ini}-1]} \\ \mathrm{col}(I_m, 0_{m(N-1) \times m}) & 0_{mN \times 1} \end{bmatrix}, \tag{14}$$
where
$$\begin{bmatrix} U_p \\ U_f \end{bmatrix} = \mathcal{H}_{T_{ini}+N}(u^h_{[0,T-1]}) \quad \text{and} \quad \begin{bmatrix} Y_p \\ Y_f \end{bmatrix} = \mathcal{H}_{T_{ini}+N}(y^h_{[0,T-1]}).$$
Then, the optimization problem (12) is equivalent to
$$\min_{\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu}} \left\| \begin{bmatrix} \mathcal{L}^{1/2} & 0 \\ 0 & \mathcal{R}^{1/2} \end{bmatrix} \begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_v^{1/2} & Y_f g \\ 0 & \boldsymbol{\Sigma}_w^{1/2} \end{bmatrix} \right\|_F \tag{15}$$
subject to
$$\begin{bmatrix} I & -Y_f G \end{bmatrix} \begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} = \begin{bmatrix} I & 0 \end{bmatrix}, \tag{16}$$
$$\begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} \begin{bmatrix} -Y_f G \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}, \tag{17}$$
$$\Phi_{yy}, \Phi_{uy}, \Phi_{yu}, \Phi_{uu} \ \text{have causal sparsities}. \tag{18}$$

Proof.
In problem (12), the system parameters $(A, B, C, x(0))$ appear through the terms $\mathbf{G} = \mathbf{C}P_B$ in the constraints and $\mathbf{C}P_A(:,1)x(0)$ in the cost. It is therefore sufficient to show that we are able to substitute both elements with data as per the theorem statement.

Let $G$ be any solution to (14). By rearranging the terms, each column of $G$ can be thought of as a solution to (13) associated with a zero initial condition and a unitary input $e_i \in \mathbb{R}^m$. Since the hypotheses of Lemma 1 are satisfied for $L = T_{ini} + N$, similar to Proposition 11 of [34] we deduce that $Y_f G$ is the system impulse response matrix, independent of the solution $G$. Therefore, we can equivalently substitute $\mathbf{G} = Y_f G$ in the constraints (9)-(10) of problem (12). Finally, note that $Y_f g$ corresponds to the trajectory starting at $x(0)$ (as implicitly defined by the recent trajectory $y_{[-T_{ini},-1]}$ and $u_{[-T_{ini},-1]}$) when applying a zero input [34]. Therefore, it corresponds to the true free response starting from $x(0)$.

For any solution $G$ of the behavioral impulse response representation (14), the affine constraints (9)-(11) describe all the achievable closed-loop responses for the unknown model and the corresponding controller $\mathbf{K}$. Also, for any solution $g$ of (14), the term $Y_f g$ represents the true free response of the system. As a result, the achieved optimal controller $\mathbf{K}^\star$ and optimal cost $J^\star$ are independent of the chosen solution $(G, g)$ of (14). We have thus characterized a data-driven version of the IOP. Theorem 1 further shows that, by exploiting the BIOP, it is straightforward to cast the LQG problem as a strongly convex program.

Remark 3.
To use the language of [15], [17], [20], the proposed BIOP formulation belongs to the class of indirect, non-parametric data-driven controller synthesis methods enabled by behavioral theory. Indeed, the optimal feedback controller is computed in two phases, hence the adjective indirect. First, an impulse response matrix is obtained as part of an implicit identification step based on Willems's fundamental lemma. Second, an optimal control problem is cast and solved by replacing the impulse and free responses with suitable linear combinations of historical input-output trajectories. The works in [18], [24] propose an alternative direct approach where a single, high-dimensional optimization problem is solved; the decision variables are the weights to be assigned to the different columns of the data Hankel matrix rather than the system closed-loop responses.

A thorough analysis of the advantages and disadvantages inherent to direct or indirect behavioral approaches is a topic of ongoing research in the field. Here, we note a few initial points. First, the proposed indirect BIOP can directly encapsulate recent results on statistically optimal non-parametric estimation of an impulse response matrix [20], [21], [23]. Second, (15) involves a number of decision variables that only scales with $N$, $m$ and $p$, while in the cost of a direct method the decision variables involved in the control cost would also scale with $T$. Last, we notice that a direct BIOP formulation can most likely be obtained by adapting, for instance, the results of Section VI in [18]; we leave this topic for future work.
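To make the two-phase indirect procedure concrete, the following self-contained numpy sketch (with hypothetical dimensions and a randomly generated system; this is illustrative code, not code from the paper) carries out the implicit identification step of Theorem 1 on noiseless data and checks that $Y_f G$ recovers the true impulse-response matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 3, 1, 1        # hypothetical state/input/output dimensions
T_ini, N = 4, 5          # past window and prediction horizon
L, T = T_ini + N, 60     # T chosen large enough for PE of order n + L

# random system, scaled to be stable (controllable/observable almost surely)
A = rng.standard_normal((n, n))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))
B, C = rng.standard_normal((n, m)), rng.standard_normal((p, n))

def hankel(w, depth):
    """Block Hankel matrix of the given depth from a (T, q) trajectory."""
    T_len, q = w.shape
    H = np.empty((depth * q, T_len - depth + 1))
    for i in range(depth):
        for j in range(T_len - depth + 1):
            H[i * q:(i + 1) * q, j] = w[i + j]
    return H

# phase 1: noiseless historical data, then solve the G-part of (14)
u_h = rng.standard_normal((T, m))
x, ys = np.zeros(n), []
for t in range(T):
    ys.append(C @ x)
    x = A @ x + B @ u_h[t]
y_h = np.array(ys)

Hu, Hy = hankel(u_h, L), hankel(y_h, L)
U_p, U_f = Hu[:m * T_ini], Hu[m * T_ini:]
Y_p, Y_f = Hy[:p * T_ini], Hy[p * T_ini:]
lhs = np.vstack([U_p, Y_p, U_f])
rhs = np.vstack([np.zeros((m * T_ini, m)), np.zeros((p * T_ini, m)),
                 np.eye(m), np.zeros((m * (N - 1), m))])  # col(I_m, 0)
G = np.linalg.lstsq(lhs, rhs, rcond=None)[0]

# phase 2: Y_f G must equal the first N blocks of the true impulse response
markov = np.vstack([np.zeros((p, m))] +
                   [C @ np.linalg.matrix_power(A, k) @ B for k in range(N - 1)])
assert np.allclose(Y_f @ G, markov, atol=1e-6)
```

The least-squares solve only picks one of infinitely many solutions of the underdetermined system (14); per Theorem 1, the product $Y_f G$ is the same for all of them.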
Remark 4.
While other parametrizations equivalent to the IOP exist, including the System Level Parametrization (SLP) [31] and other mixed parametrizations (see [32] for a survey), the IOP may be particularly well-suited for an output-feedback data-driven setup. Indeed, the SLP and the mixed parametrizations in [32] all explicitly involve state-space parameters in the constraints. By solely using input-output trajectories, the state-space parameters can only be recovered up to an unknown change of variables [35], which may be problematic for defining an initial state and noise variances in the LQG cost. Instead, the BIOP is uniquely defined from data, as it only depends on the impulse response matrix without resorting to an internal state representation.
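The affine parametrization (9)-(11), which underpins both (12) and its behavioral counterpart, can be sanity-checked numerically. A minimal standalone sketch (arbitrary dimensions and random causal matrices, not from the paper) builds the closed-loop responses as in (7)-(8) and verifies the affine identities and the controller recovery:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, p = 4, 2, 2  # horizon and input/output dimensions (hypothetical)

def blk_tril(nblk, br, bc, strict):
    """Random block matrix with zeros above (strict: and on) the block diagonal."""
    M = rng.standard_normal((nblk * br, nblk * bc))
    for i in range(nblk):
        for j in range(nblk):
            if j > i or (strict and j == i):
                M[i * br:(i + 1) * br, j * bc:(j + 1) * bc] = 0.0
    return M

G = blk_tril(N, p, m, strict=True)    # strictly causal plant response, like C P_B
K = blk_tril(N, m, p, strict=False)   # causal controller as in (6)

I_y, I_u = np.eye(N * p), np.eye(N * m)
Phi_yy = np.linalg.inv(I_y - G @ K)   # closed-loop responses as in (7)-(8)
Phi_yu = Phi_yy @ G
Phi_uy = K @ Phi_yy
Phi_uu = np.linalg.inv(I_u - K @ G)

# affine constraints (9) and (10)
assert np.allclose(Phi_yy - G @ Phi_uy, I_y)
assert np.allclose(Phi_yu - G @ Phi_uu, 0.0, atol=1e-9)
assert np.allclose(-Phi_yy @ G + Phi_yu, 0.0, atol=1e-9)
assert np.allclose(-Phi_uy @ G + Phi_uu, I_u)
# controller recovery K = Phi_uy * Phi_yy^{-1}
assert np.allclose(Phi_uy @ np.linalg.inv(Phi_yy), K)
```

Since $\mathbf{G}\mathbf{K}$ is strictly block lower triangular, $I - \mathbf{G}\mathbf{K}$ is always invertible, so the check never degenerates.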
IV. ROBUST BIOP WITH NOISE-CORRUPTED DATA
The linear system (14) is highly underdetermined when the historical trajectory is very long and noiseless. In particular, any solution $(G, g)$ to (14) gives an exact impulse response matrix and free trajectory of the system. In practice, however, the historical and recent data are corrupted by noise. According to the system equations (1)-(2), we can assume historical and recent trajectories are affected by noise $w^h(t), w^r(t), v^h(t), v^r(t)$ at all time instants, with expected values $\mu^h_w, \mu^r_w, \mu^h_v, \mu^r_v$ and variances $\Sigma^h_w, \Sigma^r_w, \Sigma^h_v, \Sigma^r_v$, respectively. Hence, the matrix on the left-hand side of (14) becomes full row rank almost surely, and (14) can only yield an approximated impulse response matrix and free response. This issue is well-known in the behavioral theory literature, and several promising solutions have recently been proposed [15], [16], [18], [21], [23]. We briefly review some of them.

Letting $\hat{U}_p, \hat{Y}_p, \hat{U}_f, \hat{Y}_f$ denote the matrices built upon noisy historical data, a simple idea is to choose $G$ and $g$ as
$$G = G_{LS} = \begin{bmatrix} \hat{U}_p \\ \hat{Y}_p \\ \hat{U}_f \end{bmatrix}^\dagger \begin{bmatrix} 0_{mT_{ini} \times m} \\ 0_{pT_{ini} \times m} \\ \mathrm{col}(I_m, 0_{m(N-1) \times m}) \end{bmatrix}, \qquad g = g_{LS} = \begin{bmatrix} \hat{U}_p \\ \hat{Y}_p \\ \hat{U}_f \end{bmatrix}^\dagger \begin{bmatrix} u^r_{[0,T_{ini}-1]} \\ y^r_{[0,T_{ini}-1]} \\ 0_{mN \times 1} \end{bmatrix}. \tag{19}$$
The least-squares solutions $G_{LS}$ and $g_{LS}$ satisfy (14) with noisy data and are such that $\|G\|$ and $\|g\|$ are minimized. While being simple to compute, the least-squares predictor comes without strong statistical guarantees and, for the case of the impulse response matrix, it is biased in general due to the finite-impulse-response truncation error; we refer the interested reader to [26], [36]. A data-based Kalman-filter solution to reduce the effect of noise is proposed in [23]. Another approach is to minimize a scalar functional $f(\cdot)$ that penalizes the residuals $\Xi_y = (Y_p - \hat{Y}_p)G$ and $\xi_y = (Y_p - \hat{Y}_p)g$ [16].
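A minimal sketch of the least-squares choice (19) for the impulse-response part (self-contained, with a hypothetical random system; the error magnitudes are illustrative only, not results from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, T_ini, N = 3, 1, 1, 4, 5
L, T = T_ini + N, 200
A = rng.standard_normal((n, n))
A *= 0.7 / max(abs(np.linalg.eigvals(A)))  # scale to a stable system
B, C = rng.standard_normal((n, m)), rng.standard_normal((p, n))

def hankel(w, depth):
    T_len, q = w.shape
    H = np.empty((depth * q, T_len - depth + 1))
    for i in range(depth):
        for j in range(T_len - depth + 1):
            H[i * q:(i + 1) * q, j] = w[i + j]
    return H

u_h = rng.standard_normal((T, m))
markov = np.vstack([np.zeros((p, m))] +
                   [C @ np.linalg.matrix_power(A, k) @ B for k in range(N - 1)])

def impulse_error(sigma):
    """G via the pseudoinverse formula (19); return the error of Y_f G."""
    x, ys = np.zeros(n), []
    for t in range(T):
        ys.append(C @ x + sigma * rng.standard_normal(p))  # measurement noise
        x = A @ x + B @ u_h[t]
    y_h = np.array(ys)
    Hu, Hy = hankel(u_h, L), hankel(y_h, L)
    U_p, U_f = Hu[:m * T_ini], Hu[m * T_ini:]
    Y_p, Y_f = Hy[:p * T_ini], Hy[p * T_ini:]
    rhs = np.vstack([np.zeros((m * T_ini, m)), np.zeros((p * T_ini, m)),
                     np.eye(m), np.zeros((m * (N - 1), m))])
    G_ls = np.linalg.pinv(np.vstack([U_p, Y_p, U_f])) @ rhs
    return np.linalg.norm(Y_f @ G_ls - markov, 2)

err_clean, err_noisy = impulse_error(0.0), impulse_error(1e-4)
assert err_clean < 1e-6   # noiseless data: (19) solves (14) exactly
assert err_noisy < 0.5    # small noise: the approximation error stays moderate
```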
A choice that reflects the maximum-likelihood interpretation of total least squares is proposed in [21] and consists in solving the optimization problems
$$G_{ML} = \arg\min_G \; -\log\left[p\left(\begin{bmatrix} \Xi_y \\ Y_f G \end{bmatrix} \,\middle|\, G, Y_f\right)\right] \quad \text{subject to} \quad \begin{bmatrix} \hat{U}_p \\ \hat{U}_f \end{bmatrix} G = \begin{bmatrix} 0_{mT_{ini} \times m} \\ \mathrm{col}(I_m, 0_{m(N-1) \times m}) \end{bmatrix},$$
$$g_{ML} = \arg\min_g \; -\log\left[p\left(\begin{bmatrix} \xi_y \\ Y_f g \end{bmatrix} \,\middle|\, g, Y_f\right)\right] \quad \text{subject to} \quad \begin{bmatrix} \hat{U}_p \\ \hat{U}_f \end{bmatrix} g = \begin{bmatrix} u^r_{[0,T_{ini}-1]} \\ 0_{mN \times 1} \end{bmatrix}.$$
While the above problems are nonconvex, an iterative procedure to obtain an approximate solution is proposed in [21]. A further refinement of the technique applied to impulse response identification is established in [20] through optimal input design.

Based on the above discussion, denote the estimated impulse and free responses as $\hat{G}$ and $\hat{y}_{free}$, respectively. Independent of the chosen estimator for $(G, g)$ in the presence of noise, we will have that
$$\mathbb{E}[\hat{G}] = M_G, \quad \mathrm{Var}(\mathrm{vec}(\hat{G})) = \Sigma_G, \quad \mathbb{E}[\hat{y}_{free}] = \mu_y, \quad \mathrm{Var}(\hat{y}_{free}) = \Sigma_y,$$
where $M_G = \mathbf{G}$ and $\mu_y = y_{free}$ if and only if the estimators are unbiased, and where $\Sigma_G, \Sigma_y$ are "small" in an appropriate sense. We thus work under the assumption that, with high probability, the errors $\|\mathbf{G} - \hat{G}\|$ and $\|y_{free} - \hat{y}_{free}\|$ are small; the better the predictor (i.e., the smaller the bias and variance), the smaller the errors. Motivated as above, we abstract from the particular identification scheme and formalize the following assumption.

Assumption 5.
There exist $\epsilon_G > 0$ and $\epsilon_y > 0$ such that, for any sequence of noisy historical and recent data, with high probability
$$\|\mathbf{G} - \hat{G}\| = \|\boldsymbol{\Delta}\| \leq \epsilon_G, \qquad \|y_{free} - \hat{y}_{free}\| = \|\boldsymbol{\delta}\| \leq \epsilon_y.$$
We denote $\epsilon = \max(\epsilon_G, \epsilon_y)$.

After condensing the effect of noise into a single error parameter $\epsilon > 0$, we are ready to leverage and adapt the analysis technique recently suggested in [6] for infinite-horizon LQG, which follows the philosophy first introduced in [3] for LQR. As we will show, this allows us to quantify the performance degradation due to noise-corrupted data in behavioral models with respect to LQG. The first step is to construct a robust version of (15) that is defined in terms of the available noisy historical data. The proof of Proposition 3 is reported in the Appendix. For simplicity, but without loss of generality, we assume that $\mathcal{L}, \mathcal{R}, \boldsymbol{\Sigma}_w, \boldsymbol{\Sigma}_v$ are identity matrices of appropriate dimensions.

Proposition 3.
Assume that historical and recent data are affected by noise. Let $\hat{G}$, $\hat{y}_{\rm free}$ be estimators of $G$, $y_{\rm free}$, respectively, such that Assumption 5 holds with $\epsilon > 0$. Consider the following model-based worst-case robust optimal control problem:

$$\min_{K}\; \max_{\|\Delta\| \le \epsilon,\; \|\delta\| \le \epsilon}\; J(G, K) = \sqrt{\mathbb{E}_{w,v}\!\left[ y_{[0,N-1]}^T y_{[0,N-1]} + u_{[0,N-1]}^T u_{[0,N-1]} \right]}$$ (20)

$$\text{subject to } \; x_{[0,N-1]} = P_A(:,1)\,x(0) + P_B\, u_{[0,N-1]}, \quad y_{[0,N-1]} = C x_{[0,N-1]} + v_{[0,N-1]}, \quad u_{[0,N-1]} = K y_{[0,N-1]} + w_{[0,N-1]}.$$

Then, problem (20) is equivalent to

$$\min_{\hat{\Phi}}\; \max_{\|\Delta\| \le \epsilon,\; \|\delta\| \le \epsilon}\; J(G, K) = \left\| \begin{bmatrix} \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1} & \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{G} + \Delta) \\ \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1} & (I - \hat{\Phi}_{uy}\Delta)^{-1}\hat{\Phi}_{uu} \end{bmatrix} \begin{bmatrix} I & 0 & \hat{y}_{\rm free} + \delta \\ 0 & I & 0 \end{bmatrix} \right\|_F$$

$$\text{subject to } \; \begin{bmatrix} I & -\hat{G} \end{bmatrix} \begin{bmatrix} \hat{\Phi}_{yy} & \hat{\Phi}_{yu} \\ \hat{\Phi}_{uy} & \hat{\Phi}_{uu} \end{bmatrix} = \begin{bmatrix} I & 0 \end{bmatrix}, \quad \begin{bmatrix} \hat{\Phi}_{yy} & \hat{\Phi}_{yu} \\ \hat{\Phi}_{uy} & \hat{\Phi}_{uu} \end{bmatrix} \begin{bmatrix} -\hat{G} \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix},$$
$$\hat{\Phi}_{yy},\, \hat{\Phi}_{yu},\, \hat{\Phi}_{uy},\, \hat{\Phi}_{uu} \text{ have causal sparsities.}$$ (21)

The robust problem in Proposition 3 is highly non-convex. We therefore proceed by deriving a quasi-convex upper bound on $J(G, K)$ to be used for controller synthesis and suboptimality analysis.

A. A tractable robust BIOP formulation
The following lemma serves as the basis to derive a tractable formulation of (21). Its rather lengthy technical proof is reported in the Appendix.
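Before stating the lemma, we note how the resulting reformulation is solved in practice. The quasi-convex program of Theorem 2 below consists of a scalar outer minimization over $\gamma$ wrapped around a strongly convex inner program, and the outer level can be handled by golden-section search. A minimal sketch under these assumptions, where the hypothetical `inner_cost(gamma)` stands in for the inner convex solve (which would be delegated to a convex-optimization solver in practice; none of these function names come from the paper):

```python
import numpy as np

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # minimum lies in [a, d]: shrink from the right
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:
            # minimum lies in [c, b]: shrink from the left
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

eps = 0.05  # estimation error level of Assumption 5 (illustrative value)

def inner_cost(gamma):
    # placeholder for the strongly convex inner program min_Phi ||...||_F
    # subject to ||Phi_uy|| <= min(gamma, alpha); here a stand-in whose value
    # decreases as the constraint on gamma is relaxed
    return 1.0 + 1.0 / (1.0 + gamma)

def outer(gamma):
    # outer objective: (1 - eps * gamma)^{-1} times the inner optimal value
    return inner_cost(gamma) / (1.0 - eps * gamma)

# search over the admissible interval [0, 1/eps)
gamma_star = golden_section_min(outer, 0.0, (1.0 - 1e-6) / eps)
```

In a real implementation, each call to `inner_cost` triggers one convex solve, so the golden-section search determines how many such solves are needed for a given tolerance on $\gamma$.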
Lemma 2.
Let $\epsilon = \max(\epsilon_G, \epsilon_y)$ and assume that $\epsilon\|\hat{\Phi}_{uy}\| < 1$. Further assume that $\|\hat{\Phi}_{uy}\| \le \alpha$ for some $\alpha > 0$. Then, we have

$$J(G, K) \le \frac{1}{1 - \epsilon\|\hat{\Phi}_{uy}\|} \left\| \begin{bmatrix} \sqrt{1 + h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}_{yy} & \hat{\Phi}_{yu} & \hat{\Phi}_{yy}\hat{y}_{\rm free} \\ \sqrt{1 + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}_{uy} & \hat{\Phi}_{uu} & \hat{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F$$ (22)

where $h(\epsilon, \alpha, Y) = \epsilon^2(2 + \alpha\|Y\|)^2 + 2\epsilon\|Y\|(2 + \alpha\|Y\|)$.

Exploiting the reformulation idea first introduced in [37] and utilized for analysis in [6], we are now ready to establish a quasi-convex reformulation of problem (21).
Theorem 2.
Given estimation errors $\epsilon_G$, $\epsilon_y$ with $\epsilon = \max(\epsilon_G, \epsilon_y)$, and for any $\alpha > 0$, the minimal cost of problem (20) is upper bounded by the minimal cost of the following quasi-convex program:

$$\min_{\gamma \in [0,\, \epsilon^{-1})} \frac{1}{1 - \epsilon\gamma}\; \min_{\hat{\Phi}} \left\| \begin{bmatrix} \sqrt{1 + h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}_{yy} & \hat{\Phi}_{yu} & \hat{\Phi}_{yy}\hat{y}_{\rm free} \\ \sqrt{1 + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}_{uy} & \hat{\Phi}_{uu} & \hat{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F$$ (23)

$$\text{subject to } \; \begin{bmatrix} I & -\hat{G} \end{bmatrix} \begin{bmatrix} \hat{\Phi}_{yy} & \hat{\Phi}_{yu} \\ \hat{\Phi}_{uy} & \hat{\Phi}_{uu} \end{bmatrix} = \begin{bmatrix} I & 0 \end{bmatrix}, \quad \begin{bmatrix} \hat{\Phi}_{yy} & \hat{\Phi}_{yu} \\ \hat{\Phi}_{uy} & \hat{\Phi}_{uu} \end{bmatrix} \begin{bmatrix} -\hat{G} \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix},$$
$$\hat{\Phi}_{yy},\, \hat{\Phi}_{yu},\, \hat{\Phi}_{uy},\, \hat{\Phi}_{uu} \text{ have causal sparsities,} \quad \|\hat{\Phi}_{uy}\| \le \min(\gamma, \alpha).$$

Proof.
See Theorem 3.2 in [6].

First, notice that the inner minimization problem in (23) is strongly convex for a fixed $\gamma$, and that the outer function $(1 - \epsilon\gamma)^{-1}$ is monotonically increasing in $\gamma$. Hence, it is well-known that the overall program can be efficiently solved by performing a golden-section search on $\gamma$ and solving the corresponding instances of the inner program. Second, we explicitly take into account the effect of an unknown and noisy initial state $x(0) \in \mathbb{R}^n$ through the parameter $\hat{y}_{\rm free}$. Assuming $x(0) = 0$ as per [24] may not be realistic for practical purposes, as the user initially lets the system evolve freely in order to harvest data. Furthermore, the following analysis will show that, for finite-horizon control problems, the suboptimality strongly depends on $x(0) \in \mathbb{R}^n$ through $\|y_{\rm free}\|$. Last, we note that the constraint on $\|\hat{\Phi}_{uy}\|$ is the main source of suboptimality with respect to the true LQG problem (12); as pointed out in [3], [6], [24], this additional constraint enforces stronger disturbance rejection properties, for which we must pay in terms of performance. We are now ready to quantify the suboptimality of (23) with respect to (12).

V. SUBOPTIMALITY ANALYSIS
In this section, we denote by $K^\star$, $\Phi^\star$ the optimal controller and corresponding closed-loop responses for the real LQG problem (12). Furthermore, we denote by $\hat{K}^\star$, $\hat{\Phi}^\star$ the optimal controller and corresponding closed-loop responses for the quasi-convex program (23), and let $J^\star = J(G, K^\star)$ and $\hat{J} = J(G, \hat{K}^\star)$.

Inspired by the analysis in [6], we show that if $\epsilon$ is small enough it holds that

$$\frac{\hat{J} - J^\star}{J^\star} = O(\epsilon)\,.$$

In other words, for a small estimation error $\epsilon$ on the impulse response, applying the controller $\hat{K}^\star$ (which is computed solely from noisy data) to the real plant achieves almost optimal closed-loop performance.

We start with a lemma that analytically characterizes a feasible solution to problem (23). We then proceed with characterizing the suboptimality bound. The proofs of Lemma 3 and Theorem 3 are reported in the Appendix.

Lemma 3 (Feasible solution). Let $\eta = \epsilon\|\Phi^\star_{uy}\|$, and select $\alpha \ge \frac{\sqrt{2}\,\eta}{\epsilon(1-\eta)}$. Then, if $\eta < \frac{1}{5}$, the following

$$\tilde{\Phi}_{yy} = \Phi^\star_{yy}(I + \Delta\Phi^\star_{uy})^{-1}, \quad \tilde{\Phi}_{yu} = \Phi^\star_{yy}(I + \Delta\Phi^\star_{uy})^{-1}(G - \Delta),$$
$$\tilde{\Phi}_{uy} = \Phi^\star_{uy}(I + \Delta\Phi^\star_{uy})^{-1}, \quad \tilde{\Phi}_{uu} = (I + \Phi^\star_{uy}\Delta)^{-1}\Phi^\star_{uu}, \quad \tilde{\gamma} = \frac{\sqrt{2}\,\eta}{\epsilon(1-\eta)},$$ (24)

is a feasible solution to problem (23).

Theorem 3.
Suppose that $\epsilon < \frac{1}{5\|\Phi^\star_{uy}\|}$ and that $\frac{\sqrt{2}}{1-\eta}\|\Phi^\star_{uy}\| \le \alpha \le 5\|\Phi^\star_{uy}\|$. Then, when applying the optimal solution $\hat{K}^\star$ of (23) to the true plant $G$, the relative error with respect to the true optimal cost is upper bounded as

$$\frac{\hat{J} - J^\star}{J^\star} \le 20\,\epsilon\|\Phi^\star_{uy}\| + 4(M + V) = O\!\left( \epsilon\|\Phi^\star_{uy}\|\left( \|G\| + \|y_{\rm free}\| \right)^2 \right),$$

where

$$M = h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free}) + h(\epsilon, \|\Phi^\star_{uy}\|, G) + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free}), \qquad V = h(\epsilon, \alpha, \hat{y}_{\rm free}) + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free}),$$

and $h(a, b, Y) = a^2(2 + b\|Y\|)^2 + 2a\|Y\|(2 + b\|Y\|)$.

Theorem 3 shows that the relative performance gap of the robust BIOP formulation (23) with respect to its exact noiseless version (15) vanishes linearly with $\epsilon$, as long as $\epsilon$ is small enough to guarantee $\epsilon\|\Phi^\star_{uy}\| < \frac{1}{5}$. The bound also grows quadratically with the norms of the true impulse and free responses, which implies that an unstable system will be difficult to control over a long horizon. Note that it is appropriate to choose $\alpha$ not too large, and specifically $\alpha \le 5\|\Phi^\star_{uy}\| < \epsilon^{-1}$, in order for the scaling of $h(\epsilon, \alpha, \hat{G})$ in terms of $\epsilon$ not to dominate over $h(\epsilon, \|\Phi^\star_{uy}\|, G)$. Our rate in terms of $\epsilon$ matches that of [3], [6], which are valid in infinite horizon.
In spite of the additional challenges of considering a noisy unknown initial state $x(0) \in \mathbb{R}^n$ and noisy output feedback, our rate also matches the one achieved with the approach of [24], which is valid for $x(0) = 0$ and noisy state feedback.

Remark 5 (Sample complexity). In related work, e.g. [3], [6], [24], the authors quantify more precisely $\epsilon$ and the probability of the estimate lying within the corresponding norm error interval as a function of the noise statistics and the real system parameters, leading to an end-to-end sample complexity analysis. This is achieved by focusing on a specific estimation technique (i.e., least squares in [3], [6] and column averaging in [24]) and the corresponding non-asymptotic norm error bounds [26], [38]. We expect that analogous results can be derived for the least-squares choice $(G, g) = (G_{\rm LS}, g_{\rm LS})$. However, in this work we wished to focus on the generality of the proposed BIOP, i.e., the fact that the approximation of the impulse and free responses is not bound to a specific estimation technique. Hence, we have limited ourselves to deriving a suboptimality bound as a function of $\epsilon$, and we do not further characterize $\epsilon$ and the success probability, as both depend on the chosen estimation technique.

VI. NUMERICAL EXPERIMENTS
In this section we present our numerical results. The optimization problems were solved with MOSEK [39], called through MATLAB via YALMIP [40], on a standard laptop computer. Our goals are 1) to verify the noiseless BIOP formulation of Theorem 1 and 2) to validate the suboptimality analysis of Theorem 3 in the presence of noise-corrupted data. In the experiments, we considered an LTI system $(A, B, C, D)$ with $A = \rho\bar{A}$ for a fixed matrix $\bar{A}$ with unit spectral radius, together with fixed matrices $B$ and $C$, and $D = 0$; the value $\rho > 0$ thus corresponds to the spectral radius of $A$. The cost function is given by (3), where $N = 11$ and the cost weights are chosen as $L(t) = I_p$ and $R(t) = I_m$ for every $t = 0, \ldots, N-1$. The average in (3) is taken over future input/output noise with variances $\Sigma_w = I_m$ and $\Sigma_v = I_p$. For the chosen initial state $x(0)$ and a stable configuration $\rho < 1$, the optimal controller $K^\star$ can be found by solving the model-based optimization problem (12), and the corresponding optimal cost is $J^\star \approx 17$.

Hereafter, we assume that the system parameters $A$, $B$, $C$ and $x(0)$ are completely unknown. Instead, the following data are available: 1) a historical system trajectory $\{y^h_{[0,T-1]},\, u^h_{[0,T-1]}\}$, with $y^h_{[0,T-1]} = y_{[-T_h,\, -T_h+T-1]}$ and $u^h_{[0,T-1]} = u_{[-T_h,\, -T_h+T-1]}$, where $T = 200$ and $T_h = 249$, and 2) a recent system trajectory $\{y^r_{[0,T_{\rm ini}-1]},\, u^r_{[0,T_{\rm ini}-1]}\}$, with $y^r_{[0,T_{\rm ini}-1]} = y_{[-T_{\rm ini},\, -1]}$, $u^r_{[0,T_{\rm ini}-1]} = u_{[-T_{\rm ini},\, -1]}$ and $T_{\rm ini} = 30$. When the collected data are noiseless, one can compute a solution $(G, g)$ to (14), for instance by using (19), and solve the optimization problem (15) to find the optimal closed-loop responses. In this case, the solution $(\Phi^\star_{yy}, \Phi^\star_{yu}, \Phi^\star_{uy}, \Phi^\star_{uu})$ yields the optimal closed-loop control policy $K^\star = \Phi^\star_{uy}(\Phi^\star_{yy})^{-1}$ and the same optimal cost $J^\star$ obtained before, as predicted by Theorem 1.

We now focus on the case where the historical and recent data are affected by noise with zero mean and variances $\Sigma^h_w = \Sigma^r_w = \sigma I_m$, $\Sigma^h_v = \Sigma^r_v = \sigma I_p$. We analyze the performance degradation for increasing values of $\sigma$. First, we note that solving (15) with noisy data yields unsatisfactory results; indeed, the problem is often infeasible due to an inconsistent estimate of $G$.
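To reproduce this kind of experiment, the finite-horizon impulse-response matrix $G$ (block lower-triangular Toeplitz in the Markov parameters $CA^kB$, assuming $D = 0$ as in the paper) and the free response $y_{\rm free}$ can be assembled from any state-space triple. A minimal numpy sketch; the function name and signature are our own, not from the paper:

```python
import numpy as np

def impulse_and_free_response(A, B, C, x0, N):
    """Build the block lower-triangular impulse-response matrix G mapping
    u_{[0,N-1]} to y_{[0,N-1]} (assuming D = 0), and the free response
    y_free(t) = C A^t x0, over a horizon of N steps."""
    n, m = B.shape
    p = C.shape[0]
    # Markov parameters C A^k B, k = 0, ..., N-1
    markov, Apow = [], np.eye(n)
    for _ in range(N):
        markov.append(C @ Apow @ B)
        Apow = A @ Apow
    G = np.zeros((p * N, m * N))
    for i in range(N):
        for j in range(i):  # strictly lower block-triangular: y(t) sees u(0..t-1)
            G[i * p:(i + 1) * p, j * m:(j + 1) * m] = markov[i - j - 1]
    y_free = np.concatenate(
        [C @ np.linalg.matrix_power(A, t) @ x0 for t in range(N)])
    return G, y_free
```

With noiseless data, `G @ u + y_free` reproduces exactly the simulated output trajectory, which is the identity that the behavioral estimators above approximate from data.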
Next, we consider the robust formulation of Theorem 2. In order to quantify the estimation error level $\epsilon$ to be used in (21), one can employ a standard bootstrapping methodology [41]. In essence, this approach consists in generating many new estimates $(\tilde{G}^i, \tilde{y}^i_{\rm free})$ from trajectories of an initially estimated system $\hat{G}$, and selecting $\epsilon$ as an appropriately high percentile of the maximum between $\|\tilde{G}^i - \hat{G}\|$ and $\|\tilde{y}^i_{\rm free} - \hat{y}_{\rm free}\|$. The corresponding $\epsilon$ is then such that $\|\hat{G} - G\| \le \epsilon$ with correspondingly high probability. The hyper-parameter $\alpha$ can be tuned manually until satisfactory results are obtained and $\alpha < \epsilon^{-1}$ is verified.
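The percentile selection just described can be sketched as follows; the function name and the resampling interface are illustrative, not from the paper. In practice, `resample_fn` would re-identify $(\tilde{G}^i, \tilde{y}^i_{\rm free})$ from synthetic noisy trajectories generated by the nominal estimate:

```python
import numpy as np

def bootstrap_epsilon(G_hat, y_hat, resample_fn, n_boot=200, q=95):
    """Estimate the error level eps of Assumption 5 by bootstrapping.
    resample_fn() must return a new estimate (G_i, y_i) obtained from
    synthetic trajectories of the nominal model (G_hat, y_hat); eps is
    the q-th percentile of the observed worst-case deviations."""
    errs = []
    for _ in range(n_boot):
        G_i, y_i = resample_fn()
        errs.append(max(np.linalg.norm(G_i - G_hat, 2),
                        np.linalg.norm(y_i - y_hat)))
    return float(np.percentile(errs, q))
```

The choice of the percentile `q` trades off conservatism of the robust program against the probability that Assumption 5 actually holds for the selected $\epsilon$.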
Fig. 2: Bootstrapped estimation error as a function of the noise level (left). Corresponding suboptimality gap for increasing values of the spectral radius $\rho$ of the matrix $A$ (right).

In Figure 2, we report the suboptimality gap one incurs by applying the controller $\hat{K}^\star$ that solves the robust BIOP (21). Specifically, for each choice of the spectral radius $\rho$ in an increasing range of values below $1$, we consider increasing levels of the variance $\sigma$ of the noise that corrupts the historical and recent data. We first plot the corresponding estimation errors $\epsilon$ obtained through bootstrapping in the left part of Figure 2. While observing that the bootstrapped $\epsilon$ grows almost linearly with $\sigma$ for any fixed $\rho$, we highlight that a formal analysis of this relationship is beyond the scope of this paper. We then plot the suboptimality gap $\frac{\hat{J} - J^\star}{J^\star}$ as a function of $\epsilon$ in the right part of Figure 2. It can be observed that, as predicted by Theorem 3, 1) the gap converges to $0$ as $\epsilon$ converges to $0$, and 2) for similar values of $\epsilon$, the gap grows faster than linearly with the spectral radius $\rho$. We finally observe that, in theory, the finite-horizon BIOP and robust BIOP formulations remain valid for unstable systems with $\rho > 1$. However, in practice, it is inherently challenging to collect trajectories of an unstable system, as the values to be plugged into the corresponding numerical programs become too large to be handled by numerical solvers. For unstable systems in a data-driven scenario, it is common to assume knowledge of a pre-stabilizing controller [6], [7].

VII. CONCLUSIONS
We have proposed the BIOP, a method for the design of optimal output-feedback controllers that directly embeds historical input-output trajectories in its formulation. When these historical data are noiseless, the BIOP is equivalent to the standard IOP and recovers an optimal LQG controller. In the presence of noise-corrupted data, we have proposed a robust version of the BIOP that explicitly incorporates the estimated uncertainty level and that can be solved efficiently through convex programming. By exploiting recently developed analysis techniques, the suboptimality of the obtained solution is quantified and compared with the nominal LQG solution. Furthermore, the developed framework is readily compatible with state-of-the-art behavioral estimation and prediction techniques, e.g. [16], [20], [23].

Our results are intended as a first step towards quantifying the effect of noise-corrupted data when using behavioral models for complex data-driven prediction and control tasks. Envisioned future work includes developing a direct counterpart of the proposed BIOP, deriving the sample complexity for new state-of-the-art estimation methods, adding safety constraints on inputs and outputs, extending to receding-horizon scenarios, and addressing distributed control tasks.

APPENDIX
A. Proof of Proposition 1
For the first statement, notice that the controller $K$ achieves the closed-loop responses (7). Now select $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$ as

$$\begin{bmatrix} \Phi_{yy} & \Phi_{yu} \\ \Phi_{uy} & \Phi_{uu} \end{bmatrix} = \begin{bmatrix} (I - GK)^{-1} & (I - GK)^{-1}G \\ K(I - GK)^{-1} & (I - KG)^{-1} \end{bmatrix}.$$ (25)

Clearly, $K = \Phi_{uy}\Phi_{yy}^{-1}$, and by plugging the corresponding expressions into (9)-(11), we verify that (9)-(11) are satisfied.

For the second statement, it is easy to see that $K$ is causal by construction, because $\Phi_{uy}$ and $\Phi_{yy}$ are block lower-triangular. Consider now the equation $\Phi_{yy} = (I - GK)^{-1}$ corresponding to the upper-left block of (25). By selecting the controller $K = \Phi_{uy}\Phi_{yy}^{-1}$ one has

$$\left( I - G\Phi_{uy}\Phi_{yy}^{-1} \right)^{-1} = \left( I - G\Phi_{uy}(I + G\Phi_{uy})^{-1} \right)^{-1} = \left( (I + G\Phi_{uy} - G\Phi_{uy})(I + G\Phi_{uy})^{-1} \right)^{-1} = I + G\Phi_{uy} = \Phi_{yy},$$

which shows that $\Phi_{yy}$ is the closed-loop response from $v_{[0,N-1]} + C P_A(:,1)x(0)$ to $y_{[0,N-1]}$ as per (7). Similar computations for the remaining closed-loop responses conclude the proof.

B. Proof of Proposition 2
Let $\delta_y = v_{[0,N-1]} + C P_A(:,1)x(0)$ and $\delta_u = w_{[0,N-1]}$. From the linearity of the expectation operator it follows that

$$J(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})^2 = \mathbb{E}_{\delta_y, \delta_u}[y^T L y + u^T R u] = \mathbb{E}[\delta_y^T \Phi_{yy}^T L \Phi_{yy} \delta_y] + \mathbb{E}[\delta_u^T \Phi_{yu}^T L \Phi_{yu} \delta_u] + \mathbb{E}[\delta_y^T \Phi_{uy}^T R \Phi_{uy} \delta_y] + \mathbb{E}[\delta_u^T \Phi_{uu}^T R \Phi_{uu} \delta_u].$$ (26)

Focusing, for example, on the first addend we have

$$\mathbb{E}[\delta_y^T \Phi_{yy}^T L \Phi_{yy} \delta_y] = {\rm Tr}\!\left( \Phi_{yy}^T L \Phi_{yy} \Sigma_v \right) + \left( C P_A(:,1)x(0) \right)^T \Phi_{yy}^T L \Phi_{yy}\, C P_A(:,1)x(0) = \left\| L^{\frac{1}{2}} \Phi_{yy} \Sigma_v^{\frac{1}{2}} \right\|_F^2 + \left\| L^{\frac{1}{2}} \Phi_{yy}\, C P_A(:,1)x(0) \right\|_F^2,$$

where the first equality follows from $\mathbb{E}_x(x^T M x) = {\rm Tr}(M\Sigma_x) + \mu_x^T M \mu_x$, with $\Sigma_x$ and $\mu_x$ the variance and mean of the random variable $x$, and the second equality uses the fact that for vectors $x \in \mathbb{R}^n$ we have $\|x\|_2 = \|x\|_F$. Similar computations hold for the remaining terms of (26). In total, since $\delta_u$ has zero mean, the cost is made up of six addends. Since they are all convex functions of $(\Phi_{yy}, \Phi_{yu}, \Phi_{uy}, \Phi_{uu})$, and the quadratic term in each block variable is strongly convex, $J(\cdot)$ is strongly convex and admits a unique global optimum. By using the property

$$\|M\|_F^2 + \|N\|_F^2 = \left\| \begin{bmatrix} M & N \end{bmatrix} \right\|_F^2 = \left\| \begin{bmatrix} M \\ N \end{bmatrix} \right\|_F^2,$$

we can rewrite the six addends of the cost compactly as the squared Frobenius norm of the $2 \times 3$ block-matrix in (12).

C. Proof of Proposition 3
First, we verify by direct inspection that for any $K$, the parameters

$$\hat{\Phi} = \begin{bmatrix} (I - \hat{G}K)^{-1} & (I - \hat{G}K)^{-1}\hat{G} \\ K(I - \hat{G}K)^{-1} & (I - K\hat{G})^{-1} \end{bmatrix}$$

satisfy the constraints of (21) and are such that $K = \hat{\Phi}_{uy}\hat{\Phi}_{yy}^{-1}$. Therefore, every controller $K$ is parametrized in problem (21), irrespective of $\hat{G}$.

We know that for any $K$, the cost $J(G, K)$ is equal to

$$\left\| \begin{bmatrix} (I - GK)^{-1} & (I - GK)^{-1}G \\ K(I - GK)^{-1} & (I - KG)^{-1} \end{bmatrix} \begin{bmatrix} I & 0 & C P_A(:,1)x(0) \\ 0 & I & 0 \end{bmatrix} \right\|_F.$$ (27)

Now, we notice that $G = \hat{G} + \Delta$, $y_{\rm free} = \hat{y}_{\rm free} + \delta$, and substitute into (27). We obtain:

$$\Phi_{yy} = (I - GK)^{-1} = \left( I - (\hat{G} + \Delta)\hat{\Phi}_{uy}\hat{\Phi}_{yy}^{-1} \right)^{-1} = \left( I - \hat{G}\hat{\Phi}_{uy}\hat{\Phi}_{yy}^{-1} - \Delta\hat{\Phi}_{uy}\hat{\Phi}_{yy}^{-1} \right)^{-1} = \left( \big( \underbrace{\hat{\Phi}_{yy} - \hat{G}\hat{\Phi}_{uy}}_{I} - \Delta\hat{\Phi}_{uy} \big)\hat{\Phi}_{yy}^{-1} \right)^{-1} = \hat{\Phi}_{yy}\left( I - \Delta\hat{\Phi}_{uy} \right)^{-1},$$

$$\Phi_{yu} = (I - GK)^{-1}G = \Phi_{yy}G = \hat{\Phi}_{yy}\left( I - \Delta\hat{\Phi}_{uy} \right)^{-1}(\hat{G} + \Delta),$$

$$\Phi_{uy} = K(I - GK)^{-1} = K\Phi_{yy} = \hat{\Phi}_{uy}\hat{\Phi}_{yy}^{-1}\hat{\Phi}_{yy}\left( I - \Delta\hat{\Phi}_{uy} \right)^{-1} = \hat{\Phi}_{uy}\left( I - \Delta\hat{\Phi}_{uy} \right)^{-1},$$

$$\Phi_{uu} = (I - KG)^{-1} = K(I - GK)^{-1}G + I = \Phi_{uy}G + I = \hat{\Phi}_{uy}\left( I - \Delta\hat{\Phi}_{uy} \right)^{-1}(\hat{G} + \Delta) + I = \left( I - \hat{\Phi}_{uy}\Delta \right)^{-1}\hat{\Phi}_{uy}(\hat{G} + \Delta) + I$$
$$= \left( I - \hat{\Phi}_{uy}\Delta \right)^{-1}\left( \hat{\Phi}_{uy}\hat{G} + \hat{\Phi}_{uy}\Delta + I - \hat{\Phi}_{uy}\Delta \right) = \left( I - \hat{\Phi}_{uy}\Delta \right)^{-1}\left( \hat{\Phi}_{uy}\hat{G} + I \right) = \left( I - \hat{\Phi}_{uy}\Delta \right)^{-1}\hat{\Phi}_{uu}.$$

This concludes the proof.
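The direct-inspection claim at the start of this proof can also be checked numerically. In the sketch below (our own sanity check, not from the paper), strictly lower-triangular random matrices emulate strictly causal finite-horizon operators, with `G` standing in for the estimate $\hat{G}$:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
# strictly lower-triangular matrices emulate strictly causal operators (D = 0)
G = np.tril(rng.standard_normal((k, k)), -1)   # plays the role of G-hat
K = np.tril(rng.standard_normal((k, k)), -1)   # an arbitrary causal controller
I = np.eye(k)
Phi_yy = np.linalg.inv(I - G @ K)
Phi_yu = Phi_yy @ G
Phi_uy = K @ Phi_yy
Phi_uu = np.linalg.inv(I - K @ G)
# affine IOP constraints: [I -G] Phi = [I 0] and Phi [-G; I] = [0; I]
assert np.allclose(Phi_yy - G @ Phi_uy, I)
assert np.allclose(Phi_yu - G @ Phi_uu, np.zeros((k, k)))
assert np.allclose(-Phi_yy @ G + Phi_yu, np.zeros((k, k)))
assert np.allclose(-Phi_uy @ G + Phi_uu, I)
# the controller is recovered as Phi_uy Phi_yy^{-1}
assert np.allclose(K, Phi_uy @ np.linalg.inv(Phi_yy))
```

Since $GK$ is strictly lower triangular, $I - GK$ is unit lower triangular and always invertible, so the construction is well defined for any causal $K$.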
D. Proof of Lemma 2
The objective function in Proposition 3 can be written as

$$J(G, K) = \left\| \begin{bmatrix} \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1} & \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{G} + \Delta) & \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \\ \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1} & (I - \hat{\Phi}_{uy}\Delta)^{-1}\hat{\Phi}_{uu} & \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \end{bmatrix} \right\|_F,$$

or, equivalently, as the square root of the sum of the squared Frobenius norms of its six blocks. For the upper-left block, we have

$$\left\| \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1} \right\|_F \le \|\hat{\Phi}_{yy}\|_F \left\| \sum_{k=0}^{\infty} (\Delta\hat{\Phi}_{uy})^k \right\| \le \|\hat{\Phi}_{yy}\|_F \sum_{k=0}^{\infty} \left( \epsilon_G \|\hat{\Phi}_{uy}\| \right)^k = \frac{\|\hat{\Phi}_{yy}\|_F}{1 - \epsilon_G\|\hat{\Phi}_{uy}\|} \le \frac{\|\hat{\Phi}_{yy}\|_F}{1 - \epsilon\|\hat{\Phi}_{uy}\|},$$

where the convergence of the Neumann series follows from $\Delta$ and $\hat{\Phi}_{uy}$ having zero diagonal blocks by construction. Similarly,

$$\left\| \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1} \right\|_F \le \frac{\|\hat{\Phi}_{uy}\|_F}{1 - \epsilon\|\hat{\Phi}_{uy}\|}, \qquad \left\| (I - \hat{\Phi}_{uy}\Delta)^{-1}\hat{\Phi}_{uu} \right\|_F \le \frac{\|\hat{\Phi}_{uu}\|_F}{1 - \epsilon\|\hat{\Phi}_{uy}\|}.$$
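The Neumann-series bound just used, $\|\hat{\Phi}(I - \Delta\hat{\Phi}_{uy})^{-1}\|_F \le \|\hat{\Phi}\|_F / (1 - \epsilon\|\hat{\Phi}_{uy}\|)$, can be sanity-checked numerically for strictly lower-triangular perturbations, whose products are nilpotent; a quick sketch (ours, with illustrative scalings):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 8
# strictly lower-triangular matrices emulate strictly causal operators,
# so Delta @ Phi_uy is nilpotent and the Neumann series terminates
Phi_uy = np.tril(rng.standard_normal((k, k)), -1)
Phi_uy *= 0.5 / np.linalg.norm(Phi_uy, 2)   # set ||Phi_uy||_2 = 0.5
Delta = np.tril(rng.standard_normal((k, k)), -1)
eps = 0.3
Delta *= eps / np.linalg.norm(Delta, 2)     # enforce ||Delta||_2 <= eps
# left-hand and right-hand sides of the bound (eps * ||Phi_uy||_2 = 0.15 < 1)
lhs = np.linalg.norm(Phi_uy @ np.linalg.inv(np.eye(k) - Delta @ Phi_uy), 'fro')
rhs = np.linalg.norm(Phi_uy, 'fro') / (1.0 - eps * np.linalg.norm(Phi_uy, 2))
```

The bound holds whenever $\epsilon\|\hat{\Phi}_{uy}\| < 1$, by submultiplicativity of the Frobenius norm against the spectral norm.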
Next, we have

$$\left\| \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{G} + \Delta) \right\|_F \le \|\hat{\Phi}_{yy}\hat{G}\|_F + \|\hat{\Phi}_{yy}\Delta\|_F + \left\| \hat{\Phi}_{yy}\left( \sum_{k=1}^{\infty}(\Delta\hat{\Phi}_{uy})^k \right)(\hat{G} + \Delta) \right\|_F$$
$$\le \|\hat{\Phi}_{yu}\|_F + \epsilon\|\hat{\Phi}_{yy}\|_F + \|\hat{\Phi}_{yy}\|_F \left( \sum_{k=1}^{\infty} \epsilon^k\|\hat{\Phi}_{uy}\|^k \right)\left( \|\hat{G}\| + \epsilon \right) = \|\hat{\Phi}_{yu}\|_F + \epsilon\|\hat{\Phi}_{yy}\|_F + \|\hat{\Phi}_{yy}\|_F\, \frac{\epsilon\|\hat{\Phi}_{uy}\|\left( \|\hat{G}\| + \epsilon \right)}{1 - \epsilon\|\hat{\Phi}_{uy}\|}$$
$$\le \frac{\|\hat{\Phi}_{yu}\|_F + \epsilon\|\hat{\Phi}_{yy}\|_F\left( 2 + \|\hat{\Phi}_{uy}\|\|\hat{G}\| \right)}{1 - \epsilon\|\hat{\Phi}_{uy}\|},$$

and therefore

$$\left\| \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{G} + \Delta) \right\|_F^2 \le \frac{1}{\left( 1 - \epsilon\|\hat{\Phi}_{uy}\| \right)^2}\left( \|\hat{\Phi}_{yu}\|_F^2 + 2\epsilon\|\hat{\Phi}_{yu}\|_F\|\hat{\Phi}_{yy}\|_F\left( 2 + \|\hat{\Phi}_{uy}\|\|\hat{G}\| \right) + \left( \epsilon\|\hat{\Phi}_{yy}\|_F\left( 2 + \|\hat{\Phi}_{uy}\|\|\hat{G}\| \right) \right)^2 \right)$$
$$\le \frac{1}{\left( 1 - \epsilon\|\hat{\Phi}_{uy}\| \right)^2}\left( \|\hat{\Phi}_{yu}\|_F^2 + \|\hat{\Phi}_{yy}\|_F^2\left( 2\epsilon\|\hat{G}\|\left( 2 + \alpha\|\hat{G}\| \right) + \epsilon^2\left( 2 + \alpha\|\hat{G}\| \right)^2 \right) \right) = \frac{\|\hat{\Phi}_{yu}\|_F^2 + \|\hat{\Phi}_{yy}\|_F^2\, h(\epsilon, \alpha, \hat{G})}{\left( 1 - \epsilon\|\hat{\Phi}_{uy}\| \right)^2},$$

where we used $\|\hat{\Phi}_{yu}\|_F = \|\hat{\Phi}_{yy}\hat{G}\|_F \le \|\hat{\Phi}_{yy}\|_F\|\hat{G}\|$ and $\|\hat{\Phi}_{uy}\| \le \alpha$.
Proceeding analogously, one can also prove that

$$\left\| \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \right\|_F \le \frac{\|\hat{\Phi}_{yy}\hat{y}_{\rm free}\|_F + \epsilon\|\hat{\Phi}_{yy}\|_F\left( 2 + \|\hat{\Phi}_{uy}\|\|\hat{y}_{\rm free}\| \right)}{1 - \epsilon\|\hat{\Phi}_{uy}\|},$$
$$\left\| \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \right\|_F \le \frac{\|\hat{\Phi}_{uy}\hat{y}_{\rm free}\|_F + \epsilon\|\hat{\Phi}_{uy}\|_F\left( 2 + \|\hat{\Phi}_{uy}\|\|\hat{y}_{\rm free}\| \right)}{1 - \epsilon\|\hat{\Phi}_{uy}\|},$$

and, after squaring,

$$\left\| \hat{\Phi}_{yy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \right\|_F^2 \le \frac{\|\hat{\Phi}_{yy}\hat{y}_{\rm free}\|_F^2 + \|\hat{\Phi}_{yy}\|_F^2\, h(\epsilon, \alpha, \hat{y}_{\rm free})}{\left( 1 - \epsilon\|\hat{\Phi}_{uy}\| \right)^2},$$
$$\left\| \hat{\Phi}_{uy}(I - \Delta\hat{\Phi}_{uy})^{-1}(\hat{y}_{\rm free} + \delta) \right\|_F^2 \le \frac{\|\hat{\Phi}_{uy}\hat{y}_{\rm free}\|_F^2 + \|\hat{\Phi}_{uy}\|_F^2\, h(\epsilon, \alpha, \hat{y}_{\rm free})}{\left( 1 - \epsilon\|\hat{\Phi}_{uy}\| \right)^2}.$$

Therefore, combining the above inequalities, we finally conclude that

$$J(G, K) \le \frac{1}{1 - \epsilon\|\hat{\Phi}_{uy}\|} \sqrt{ \left\| \begin{bmatrix} \hat{\Phi}_{yy} & \hat{\Phi}_{yu} & \hat{\Phi}_{yy}\hat{y}_{\rm free} \\ \hat{\Phi}_{uy} & \hat{\Phi}_{uu} & \hat{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F^2 + \|\hat{\Phi}_{yy}\|_F^2\left( h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free}) \right) + \|\hat{\Phi}_{uy}\|_F^2\, h(\epsilon, \alpha, \hat{y}_{\rm free}) }.$$

E. Proof of Lemma 3
First, it is easy to verify that $\tilde{\Phi}$ satisfies the affine constraints in (23); indeed, $\tilde{\Phi}$ is defined as the set of closed-loop responses obtained when applying $K^\star$ to the estimated plant $\hat{G}$. Then, since $\eta < \frac{1}{5}$, it is easy to verify that $\tilde{\gamma} \le \epsilon^{-1}$. It remains to show that $\|\tilde{\Phi}_{uy}\| \le \min(\tilde{\gamma}, \alpha)$; it holds that

$$\|\tilde{\Phi}_{uy}\| = \left\| \Phi^\star_{uy}(I + \Delta\Phi^\star_{uy})^{-1} \right\| \le \frac{\|\Phi^\star_{uy}\|}{1 - \epsilon\|\Phi^\star_{uy}\|} \le \frac{\sqrt{2}\,\|\Phi^\star_{uy}\|}{1 - \epsilon\|\Phi^\star_{uy}\|} = \frac{\sqrt{2}\,\eta}{\epsilon(1 - \eta)} = \tilde{\gamma} \le \alpha.$$

F. Proof of Theorem 3
The key step of the proof is to find a useful relationship between $J(G, \hat{K}^\star)$ and $J(G, K^\star)$, by exploiting the fact that Lemma 3 provides a suboptimal feasible solution to (23). Using the assumption on $\alpha$, which guarantees $\alpha \ge \frac{\sqrt{2}}{1-\eta}\|\Phi^\star_{uy}\| = \frac{\sqrt{2}\,\eta}{\epsilon(1-\eta)} = \tilde{\gamma}$, we have

$$J(G, \hat{K}^\star) \le \frac{1}{1 - \epsilon\gamma^\star} \left\| \begin{bmatrix} \sqrt{1 + h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}^\star_{yy} & \hat{\Phi}^\star_{yu} & \hat{\Phi}^\star_{yy}\hat{y}_{\rm free} \\ \sqrt{1 + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \hat{\Phi}^\star_{uy} & \hat{\Phi}^\star_{uu} & \hat{\Phi}^\star_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F \le \frac{1}{1 - \epsilon\tilde{\gamma}} \left\| \begin{bmatrix} \sqrt{1 + h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \tilde{\Phi}_{yy} & \tilde{\Phi}_{yu} & \tilde{\Phi}_{yy}\hat{y}_{\rm free} \\ \sqrt{1 + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \tilde{\Phi}_{uy} & \tilde{\Phi}_{uu} & \tilde{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F,$$

where $\gamma^\star$ is optimal for (23), and the second inequality holds because $(\gamma^\star, \hat{\Phi}^\star)$ represents the optimal solution to (23) while $(\tilde{\gamma}, \tilde{\Phi})$ is a suboptimal feasible solution of (23) by Lemma 3. Using the definition of $\tilde{\Phi}$ from Lemma 3, we now relate the term

$$\tilde{C} = \left\| \begin{bmatrix} \sqrt{1 + h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \tilde{\Phi}_{yy} & \tilde{\Phi}_{yu} & \tilde{\Phi}_{yy}\hat{y}_{\rm free} \\ \sqrt{1 + h(\epsilon, \alpha, \hat{y}_{\rm free})}\; \tilde{\Phi}_{uy} & \tilde{\Phi}_{uu} & \tilde{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F$$

to the optimal cost of problem (12).
By defining

$$M = h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free}) + h(\epsilon, \|\Phi^\star_{uy}\|, G) + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free}), \qquad V = h(\epsilon, \alpha, \hat{y}_{\rm free}) + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free}),$$

we derive

$$\tilde{C} = \sqrt{ \left\| \begin{bmatrix} \tilde{\Phi}_{yy} & \tilde{\Phi}_{yu} & \tilde{\Phi}_{yy}\hat{y}_{\rm free} \\ \tilde{\Phi}_{uy} & \tilde{\Phi}_{uu} & \tilde{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F^2 + \left( h(\epsilon, \alpha, \hat{G}) + h(\epsilon, \alpha, \hat{y}_{\rm free}) \right)\|\tilde{\Phi}_{yy}\|_F^2 + h(\epsilon, \alpha, \hat{y}_{\rm free})\|\tilde{\Phi}_{uy}\|_F^2 } \le \frac{\sqrt{ J(G, K^\star)^2 + M\|\Phi^\star_{yy}\|_F^2 + V\|\Phi^\star_{uy}\|_F^2 }}{1 - \epsilon\|\Phi^\star_{uy}\|},$$

where the bound

$$\left( 1 - \epsilon\|\Phi^\star_{uy}\| \right)^2 \left\| \begin{bmatrix} \tilde{\Phi}_{yy} & \tilde{\Phi}_{yu} & \tilde{\Phi}_{yy}\hat{y}_{\rm free} \\ \tilde{\Phi}_{uy} & \tilde{\Phi}_{uu} & \tilde{\Phi}_{uy}\hat{y}_{\rm free} \end{bmatrix} \right\|_F^2 \le J(G, K^\star)^2 + \left( h(\epsilon, \|\Phi^\star_{uy}\|, G) + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free}) \right)\|\Phi^\star_{yy}\|_F^2 + h(\epsilon, \|\Phi^\star_{uy}\|, y_{\rm free})\|\Phi^\star_{uy}\|_F^2$$

is derived in the same way as in Lemma 2, by using the expressions in Lemma 3.

Thus, we have established the chain of inequalities

$$J(G, \hat{K}^\star) \le \frac{1}{1 - \epsilon\tilde{\gamma}}\, \tilde{C} \le \frac{1}{\left( 1 - \epsilon\tilde{\gamma} \right)\left( 1 - \epsilon\|\Phi^\star_{uy}\| \right)} \sqrt{ J(G, K^\star)^2 + M\|\Phi^\star_{yy}\|_F^2 + V\|\Phi^\star_{uy}\|_F^2 }.$$

Taking squares, recalling that $\eta < \frac{1}{5}$, noting that $\left( 1 - \epsilon\tilde{\gamma} \right)\left( 1 - \epsilon\|\Phi^\star_{uy}\| \right) = 1 - (1 + \sqrt{2})\eta$, and using the fact that if $M, V > 0$, then $Ma + Vb \le (M + V)(a + b)$, together with $\|\Phi^\star_{yy}\|_F^2 + \|\Phi^\star_{uy}\|_F^2 \le J(G, K^\star)^2$, we derive

$$\frac{J(G, \hat{K}^\star) - J(G, K^\star)}{J(G, K^\star)} \le \frac{1}{\left( 1 - (1 + \sqrt{2})\eta \right)^2} \left( 1 + \frac{M\|\Phi^\star_{yy}\|_F^2 + V\|\Phi^\star_{uy}\|_F^2}{J(G, K^\star)^2} \right) - 1$$
$$\le \eta\, \frac{(1 + \sqrt{2})\left( 2 - (1 + \sqrt{2})\eta \right)}{\left( 1 - (1 + \sqrt{2})\eta \right)^2} + \frac{M\|\Phi^\star_{yy}\|_F^2 + V\|\Phi^\star_{uy}\|_F^2}{J(G, K^\star)^2 \left( 1 - (1 + \sqrt{2})\eta \right)^2} \le \eta\, \frac{(1 + \sqrt{2})\left( 2 - (1 + \sqrt{2})\eta \right)}{\left( 1 - (1 + \sqrt{2})\eta \right)^2} + \frac{M + V}{\left( 1 - (1 + \sqrt{2})\eta \right)^2} \le 20\eta + 4(M + V).$$

Last, we prove that $20\eta + 4(M + V) = O\!\left( \epsilon\|\Phi^\star_{uy}\|\left( \|G\| + \|y_{\rm free}\| \right)^2 \right)$.
By considering the expressions of $M$ and $V$, using $\alpha \le \|\Phi^\star_{uy}\|$, $\eta < 1$, $\|\widehat{G}\| \le \|G\| + \epsilon$ and $\|\widehat{\mathbf{y}}_{\mathrm{free}}\| \le \|\mathbf{y}_{\mathrm{free}}\| + \epsilon$, we deduce that
$$M = h(\epsilon,\alpha,\widehat{G}) + h(\epsilon,\alpha,\widehat{\mathbf{y}}_{\mathrm{free}}) + h(\epsilon,\|\Phi^\star_{uy}\|,G) + h(\epsilon,\|\Phi^\star_{uy}\|,\mathbf{y}_{\mathrm{free}})$$
$$\le \Big[\epsilon\left(2+5\|\Phi^\star_{uy}\|\|G\|\right) + 2\epsilon\|G\|\left(2+5\|\Phi^\star_{uy}\|\|G\|\right) + \epsilon\left(2+5\|\Phi^\star_{uy}\|\|\mathbf{y}_{\mathrm{free}}\|\right) + 2\epsilon\|\mathbf{y}_{\mathrm{free}}\|\left(2+5\|\Phi^\star_{uy}\|\|\mathbf{y}_{\mathrm{free}}\|\right)\Big] + O\left(\epsilon\|\Phi^\star_{uy}\|\left(\|G\|+\|\mathbf{y}_{\mathrm{free}}\|\right)\right)$$
$$= O\left(\epsilon\|\Phi^\star_{uy}\|\left(\|G\|+\|\mathbf{y}_{\mathrm{free}}\|\right)\right),$$
and similarly $V = O\left(\epsilon\|\Phi^\star_{uy}\|\|\mathbf{y}_{\mathrm{free}}\|\right)$. The result follows.

REFERENCES

[1] F. Lamnabhi-Lagarrigue, A. Annaswamy, S. Engell, A. Isaksson, P. Khargonekar, R. M. Murray, H. Nijmeijer, T. Samad, D. Tilbury, and P. Van den Hof, “Systems & control for the future of humanity, research agenda: Current and future roles, impact and grand challenges,”
Annual Reviews in Control, vol. 43, pp. 1–64, 2017.
[2] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.
[3] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the Linear Quadratic Regulator,” Foundations of Computational Mathematics, pp. 1–47, 2019.
[4] M. Fazel, R. Ge, S. M. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the Linear Quadratic Regulator,” arXiv preprint arXiv:1801.05039, 2018.
[5] D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. L. Bartlett, and M. J. Wainwright, “Derivative-free methods for policy optimization: Guarantees for linear quadratic systems,” arXiv preprint arXiv:1812.08305, 2018.
[6] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li, “Sample complexity of Linear Quadratic Gaussian (LQG) control for output feedback systems,” arXiv preprint arXiv:2011.09929, 2020.
[7] M. Simchowitz, K. Singh, and E. Hazan, “Improper learning for non-stochastic control,” arXiv preprint arXiv:2001.09254, 2020.
[8] S. Lale, K. Azizzadenesheli, B. Hassibi, and A. Anandkumar, “Logarithmic regret bound in partially observable linear dynamical systems,” arXiv preprint arXiv:2003.11227, 2020.
[9] A. Tsiamis, N. Matni, and G. Pappas, “Sample complexity of Kalman filtering for unknown systems,” in Learning for Dynamics and Control. PMLR, 2020, pp. 435–444.
[10] K. Zhang, B. Hu, and T. Başar, “Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence,” in Learning for Dynamics and Control. PMLR, 2020, pp. 179–190.
[11] S. Dean, S. Tu, N. Matni, and B. Recht, “Safely learning to control the constrained Linear Quadratic Regulator,” in 2019 American Control Conference (ACC). IEEE, 2019, pp. 5582–5588.
[12] S. Fattahi, N. Matni, and S. Sojoudi, “Efficient learning of distributed linear-quadratic control policies,” SIAM Journal on Control and Optimization, vol. 58, no. 5, pp. 2927–2951, 2020.
[13] L. Furieri, Y. Zheng, and M. Kamgarpour, “Learning the globally optimal distributed LQ regulator,” in Learning for Dynamics and Control. PMLR, 2020, pp. 287–297.
[14] J. C. Willems and J. W. Polderman, Introduction to Mathematical Systems Theory: A Behavioral Approach. Springer Science & Business Media, 1997, vol. 26.
[15] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: In the shallows of the DeePC,” in 2019 18th European Control Conference (ECC). IEEE, 2019, pp. 307–312.
[16] ——, “Distributionally robust chance constrained data-enabled predictive control,” arXiv preprint arXiv:2006.01702, 2020.
[17] F. Dörfler, J. Coulson, and I. Markovsky, “Bridging direct & indirect data-driven control formulations via regularizations and relaxations,” arXiv preprint arXiv:2101.01273, 2021.
[18] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality and robustness,” IEEE Transactions on Automatic Control, 2019.
[19] J. Berberich, J. Köhler, M. A. Müller, and F. Allgöwer, “Data-driven model predictive control with stability and robustness guarantees,” IEEE Transactions on Automatic Control, 2020.
[20] A. Iannelli, M. Yin, and R. S. Smith, “Experiment design for impulse response identification with signal matrix models,” arXiv preprint arXiv:2012.08126, 2020.
[21] M. Yin, A. Iannelli, and R. S. Smith, “Maximum likelihood estimation in data-driven modeling and control,” arXiv preprint arXiv:2011.00925, 2020.
[22] Y. Lian and C. N. Jones, “Nonlinear data-enabled prediction and control,” arXiv preprint arXiv:2101.03187, 2021.
[23] D. Alpago, F. Dörfler, and J. Lygeros, “An extended Kalman filter for data-enabled predictive control,” IEEE Control Systems Letters, vol. 4, no. 4, pp. 994–999, 2020.
[24] A. Xue and N. Matni, “Data-driven system level synthesis,” arXiv preprint arXiv:2011.10674, 2020.
[25] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “An Input-Output Parametrization of stabilizing controllers: amidst Youla and System Level Synthesis,” IEEE Control Systems Letters, vol. 3, no. 4, pp. 1014–1019, 2019.
[26] S. Oymak and N. Ozay, “Non-asymptotic identification of LTI systems from a single trajectory,” in 2019 American Control Conference (ACC). IEEE, 2019, pp. 5655–5661.
[27] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed., vol. II. Belmont, MA: Athena Scientific, 2011.
[28] K. Zhou, J. C. Doyle, and K. Glover, Robust and Optimal Control. Prentice Hall, New Jersey, 1996, vol. 40.
[29] A. Bemporad, “Reducing conservativeness in predictive control of constrained systems with disturbances,” in Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No. 98CH36171), vol. 2. IEEE, 1998, pp. 1384–1389.
[30] D. Youla, H. Jabr, and J. Bongiorno, “Modern Wiener-Hopf design of optimal controllers–Part II: The multivariable case,” IEEE Trans. on Aut. Contr., vol. 21, no. 3, pp. 319–338, 1976.
[31] Y.-S. Wang, N. Matni, and J. C. Doyle, “A system level approach to controller synthesis,” IEEE Trans. on Aut. Contr., 2019.
[32] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li, “System-level, input-output and new parameterizations of stabilizing controllers, and their numerical computation,” arXiv preprint arXiv:1909.12346, 2019.
[33] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
[34] I. Markovsky and P. Rapisarda, “Data-driven simulation and control,” International Journal of Control, vol. 81, no. 12, pp. 1946–1959, 2008.
[35] I. Markovsky, J. C. Willems, S. Van Huffel, and B. De Moor, Exact and Approximate Modeling of Linear Systems: A Behavioral Approach. SIAM, 2006.
[36] S. Sedghizadeh and S. Beheshti, “Data-driven subspace predictive control: Stability and horizon tuning,” Journal of the Franklin Institute, vol. 355, no. 15, pp. 7509–7547, 2018.
[37] N. Matni, Y.-S. Wang, and J. Anderson, “Scalable system level synthesis for virtually localizable systems,” in 2017 IEEE 56th Conference on Decision and Control (CDC). IEEE, 2017, pp. 3473–3480.
[38] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,” Foundations of Computational Mathematics, vol. 12, no. 4, pp. 389–434, 2012.
[39] MOSEK ApS, “The MOSEK optimization toolbox for MATLAB manual. Version 8.1.,” 2017.
[40] J. Löfberg, “YALMIP: A toolbox for modeling and optimization in MATLAB,” in Proc. of the CACSD Conf., Taipei, Taiwan, 2004.
[41] B. Efron, “Bootstrap methods: another look at the jackknife,” in