Active Learning for Linear Parameter-Varying System Identification
Robert Chin, Alejandro I. Maass, Nalika Ulapane, Chris Manzie, Iman Shames, Dragan Nešić, Jonathan E. Rowe, Hayato Nakada
∗ Department of Electrical & Electronic Engineering, The University of Melbourne, Australia & School of Computer Science, University of Birmingham, UK (e-mail: [email protected]).
∗∗ Department of Electrical & Electronic Engineering, The University of Melbourne, Australia (e-mails: {alejandro.maass, nalika.ulapane, manziec, iman.shames, dnesic}@unimelb.edu.au).
∗∗∗ School of Computer Science, University of Birmingham, UK & The Alan Turing Institute, UK (e-mail: [email protected]).
∗∗∗∗ Advanced Unit Management System Development Division, Toyota Motor Corporation, Japan (e-mail: [email protected]).
Abstract:
Active learning is proposed for the selection of the next operating points in the design of experiments for identifying linear parameter-varying systems. We extend existing approaches found in the literature to multiple-input multiple-output systems with a multivariate scheduling parameter. Our approach exploits the probabilistic features of Gaussian process regression to quantify the overall model uncertainty across locally identified models. This results in a flexible framework which accommodates various techniques for the estimation of local linear models and their corresponding uncertainty. We apply active learning to the identification of a diesel engine air-path model, and demonstrate that measures of model uncertainty can be successfully reduced using the proposed framework.
Keywords:
Machine learning, System identification, Parameter estimation, Uncertainty, Diesel engines

⋆ This work was supported by Toyota Motor Corporation, Japan. The first author is also supported by the Elizabeth & Vernon Puzey scholarship.

1. INTRODUCTION

Active learning, along with the closely related field of optimal experimental design, is a subfield of machine learning and statistics concerned with the determination of query points at which to sample data (Settles, 2012). The main rationale underpinning active learning is that data collection is costly, so these query points should be selected in a way that optimises some notion of accuracy for the model being identified. Thus, active learning carries the advantage of enabling either identification of a model that is more accurate for a fixed data collection budget, or identification to a specified accuracy within a smaller data collection budget.

Optimal experimental design for dynamical systems has been studied since the 1960s. Levin (1960) demonstrated that a white noise input signal to a single-input single-output (SISO) discrete-time linear system minimised the A-optimality criterion (trace of the covariance matrix) for the parameters of a finite impulse response model. Goodwin (1971) gave an A-optimality formulation for the optimal design of input signals for a general class of discrete-time nonlinear systems. Due to the limited computational resources at the time, the method was exemplified on simpler systems.

More recently, linear parameter-varying (LPV) systems (a class of nonlinear systems) have emerged as an approach for model-based control of nonlinear systems, whereby local linear controllers are designed for regions of an operating space in a gain-scheduled manner (Toth, 2010). There are two broad approaches to the identification of LPV systems. In the local approach, several local linear models are identified at several fixed operating points (also called scheduling points), which are then interpolated over the operating space. In the global approach, an LPV model is identified from an experiment which excites the operating space as well (dos Santos et al., 2012).

Optimal experimental design for local LPV identification has previously been investigated: in Khalate et al. (2009), a technique was proposed to select new operating points to query for SISO systems with a univariate operating point. Their approach minimised a measure of anticipated overall accuracy, and assumed that each local linear model could be identified perfectly. Motchon et al. (2018) relax this assumption, and provide an algorithm for the simultaneous selection of operating points and design of input signals (although still only valid for the class of SISO systems with a univariate operating point). Their optimisation criterion is based on an A-optimality-like criterion.

The main contribution of our work is a framework of active learning for LPV system identification via a local approach, which extends previous work in that it is applicable to multiple-input multiple-output (MIMO) systems with a multivariate operating point. The framework also quantifies the uncertainty associated with the LPV model in terms of the variance of the model parameters.
Throughout this paper, the set R refers to the real numbers, and the superscript ⊤ denotes the matrix transpose. The operator diag{·} forms a diagonal matrix with diagonal elements equal to its arguments. The mathematical expectation operator is designated by E[·], and variance by Var(·). A multivariate Gaussian distribution with mean m and covariance C is denoted by N(m, C).

2. ACTIVE LEARNING FRAMEWORK

2.1 Problem Formulation

We consider noisy discrete-time LPV systems of the following form:

x_{k+1} = A(θ) x_k + B(θ) u_k + w_k,  (1)
y_k = C x_k,  (2)

with state x_k ∈ R^n, input u_k ∈ R^m, output y_k ∈ R^p, and noise/unmodelled disturbance sequence w_k. The operating point θ ∈ Θ ⊂ R^d parametrises the system matrices A(θ) and B(θ), which are the objects of interest to be identified. For the identification problem, we make the following assumptions.

Assumption 1. The operating space Θ ⊂ R^d is a compact region.

Assumption 2. The functions A : Θ → R^{n×n} and B : Θ → R^{n×m} are smooth.

Assumption 3. The matrix C is known and we have access to the full state measurement x_k.

Assumption 4. For all θ ∈ Θ, the system (1) is stable, and the noise w_k is an independent and identically distributed (i.i.d.) sequence with covariance matrix E(θ).

In our formulation, Assumption 3 ensures the system order n is known and the state-space realisation is specified, so identification of (1) for fixed θ becomes a special case of VARX regression, in which identifiability issues arising from an unknown state-space realisation are not a concern. Note also that by Assumption 4 we do not necessarily require the noise to be Gaussian.

Implementation of predictive control algorithms for (1) requires knowledge of the system matrices A(θ) and B(θ). As these are often not known in practice, they would be replaced by their estimates Â(θ) and B̂(θ). Doing so introduces some uncertainty into the predictions (in the form of variance), attributed to variance in the estimates of A(θ) and B(θ). This motivates our problem herein, which is to devise a method that quantifies the uncertainty in the estimates Â(θ) and B̂(θ), and simultaneously leverages this to decide the next operating point at which to conduct an experiment.

2.2 Gaussian Process Regression

We describe Gaussian process regression (GPR), which has been used in active learning settings (Brochu et al., 2007) and in uncertainty quantification (Bilionis and Zabaras, 2012). A Gaussian process on a d-variate feature variable θ ∈ R^d may be defined by

f(θ) ~ GP(μ(θ), κ(θ, θ')),  (3)

where μ(θ) : R^d → R is called the mean function, and the positive definite kernel κ(θ, θ') : R^d × R^d → R is known as the covariance function. For two collections of points θ = (θ_1, ..., θ_m) and θ' = (θ'_1, ..., θ'_n), denote

K(θ, θ') := [ κ(θ_1, θ'_1) ⋯ κ(θ_1, θ'_n)
                   ⋮       ⋱       ⋮
              κ(θ_m, θ'_1) ⋯ κ(θ_m, θ'_n) ],  (4)

μ(θ) := [μ(θ_1) ⋯ μ(θ_m)]⊤.
(5)

Then for pre-specified prior mean and covariance functions μ(·) and κ(·, ·), the posterior predictive distribution at test points θ*, given input-output training data D = (θ, f) subject to zero-mean Gaussian noise with covariance Σ on the output observations f, is given by

[f* | θ*, D] ~ N( μ(θ*) + K(θ*, θ) K⁻¹ (f − μ(θ)),
                  K(θ*, θ*) − K(θ*, θ) K⁻¹ K(θ, θ*) ),  (6)

where

K := K(θ, θ) + Σ.  (7)

The primary computational cost incurred by GPR is in handling the m × m matrix K; explicit inversion of K can be avoided efficiently, for example via the Cholesky decomposition (Rasmussen and Williams, 2006, Algorithm 2.1).

2.3 Active Learning Procedure

The active learning procedure is explained as follows. We presume there to be an initial selection of m operating points θ = (θ_1, ..., θ_m) for identification. For each of these points, a time-series dataset has been collected by running a local experiment and measuring the (x_k, u_k) pairs. From this, we have subsequently identified local linear models with matrices (Â_{θ_1}, B̂_{θ_1}), ..., (Â_{θ_m}, B̂_{θ_m}). Moreover, suppose our estimation method also provides uncertainty estimates for the identified parameters, in the form of an estimated standard deviation for each estimator (the standard errors of the estimates). For an arbitrary element γ̂_{θ_i} of (Â_{θ_i}, B̂_{θ_i}), for any i ∈ {1, ..., m}, denote its standard error by se(γ̂_{θ_i}).

Now, to conduct active learning, we fit Gaussian processes to each of the elements of A(θ) and B(θ). That is, we represent these matrices as

A(θ) = [ a_11(θ) ⋯ a_1n(θ)
            ⋮    ⋱    ⋮
         a_n1(θ) ⋯ a_nn(θ) ],  (8)

B(θ) = [ b_11(θ) ⋯ b_1m(θ)
            ⋮    ⋱    ⋮
         b_n1(θ) ⋯ b_nm(θ) ],  (9)

where each element a_11(θ), ..., b_nm(θ) is a GPR model over θ as introduced in Section 2.2. From our initial identified models, we form n² + mn training datasets D_{a_11}, ..., D_{b_nm} from the m experiments. Each D_γ, for γ ∈ {a_11, ..., b_nm}, consists of m observations with feature-label pairs (θ_i, γ̂_{θ_i}) for i = 1, ..., m. GPR is then applied to each training dataset. Note that this induces a distribution over LPV models, and this is the primary mechanism used in this paper to quantify uncertainty, which we do in the following novel way. Under standard conditions (these being that (1) is stable, w_k is i.i.d. and u_k is quasistationary), the least squares parameter estimates are asymptotically normal as the length of the local experiment tends to infinity (Boutahar and Deniau, 1995). Hence it is reasonable to use the squared standard errors as the Gaussian output-error covariances for each of the GPRs:

Σ_γ = diag{ se²(γ̂_{θ_1}), ..., se²(γ̂_{θ_m}) },  (10)

for each γ ∈ {a_11, ..., b_nm}. In traditional GPR, the covariance Σ is typically treated as a hyperparameter that can be optimised (usually simplified to a scaled identity matrix). Here, we expressly use Σ_γ to incorporate uncertainty information about the local parameters into the resulting GPR-LPV model. Qualitatively, where there is greater uncertainty about the local parameter estimates, this carries through to greater uncertainty in the surrounding region of the GPR-LPV model, as will be illustrated later in Section 3.

As a probabilistic model, the utility of the fitted GPR-LPV is that it can be used to quantify the uncertainty of the model with respect to an operating point of interest θ*. Introduce g_M(θ*) : Θ → R as an arbitrary objective function which quantifies a measure of uncertainty at operating point θ* for an identified GPR-LPV model M. Following the well-known MacKay approach, new query points can be selected where there is currently the most uncertainty (MacKay, 1992). The decision of which operating point to conduct the (m+1)th experiment at is obtained by solving

θ_{m+1} = argmax_{θ* ∈ Θ} g_M(θ*).
(11)

In this paper, we focus on g_M(θ*) being the sum of the GPR-LPV variances:

g_M(θ*) = Σ_{γ ∈ {a_11, ..., b_nm}} Var(γ | D_γ, θ*),  (12)

which is a natural choice, since it is equivalent to the trace of the posterior covariance for the parameter vector (a_11, ..., b_nm). In general, problem (11) can have multiple local optima. If d = 2, global optima may be validated visually due to Assumption 1. Beyond d = 2, however, the problem of finding global optima begins to suffer from the curse of dimensionality. A similar problem is encountered in Bayesian active learning, where the practice is to resort to global optimisation and heuristic search techniques to find an approximate solution (Brochu et al., 2007).

Note that the type of uncertainty we are quantifying is the epistemic uncertainty (i.e. the model uncertainty), because the epistemic uncertainty can in principle be reduced by collecting more data. Quantifying the aleatoric uncertainty (which would involve estimating the covariance of the noise w_k) is not within the main scope of the active learning framework, because the aleatoric uncertainty by definition cannot be reduced (without modifying the system itself).

The active learning procedure is detailed by the pseudocode in Algorithm 1, with the following components.

• Time-series datasets D_1, ..., D_m from local experiments conducted at the corresponding operating points θ_1, ..., θ_m. Note that the experiments need not all be of the same length.
• A method ilm() which identifies a local linear model (with standard errors) from local experiment data.
• A method gpr() which fits a GPR-LPV model to the local linear models, as described in Section 2.3.
• A method uc() which computes the uncertainty criterion for a GPR-LPV model at a supplied operating point.

Specific implementation details of the methods ilm(), gpr(), uc() are up to the practitioner's choice, which allows for flexible variations of the active learning algorithm.
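As a purely illustrative sketch of how these components compose, one iteration of the procedure might look like the following Python, where ilm, gpr and uc are stand-ins for the practitioner-supplied methods, and the grid search over Θ is an assumed implementation choice (not the paper's implementation):

```python
import numpy as np

def active_learning_step(datasets, thetas, ilm, gpr, uc, theta_grid):
    """One iteration of the active learning loop.

    datasets   -- local time-series datasets D_1, ..., D_m
    thetas     -- the m operating points at which they were collected
    ilm        -- returns (estimates, standard errors) for one dataset
    gpr        -- fits the GPR-LPV model from all local estimates
    uc         -- uncertainty criterion for the model at a point
    theta_grid -- candidate operating points covering Theta
    """
    # ilm(): identify each local linear model with its standard errors
    estimates, std_errs = zip(*(ilm(D) for D in datasets))
    # gpr(): fit one GP per model parameter, with output-noise
    # covariances built from the standard errors
    model = gpr(thetas, estimates, std_errs)
    # uc(): evaluate the uncertainty criterion over the candidates
    # and return the maximiser, approximating (11)
    scores = [uc(model, t) for t in theta_grid]
    return theta_grid[int(np.argmax(scores))]
```

For instance, with a toy uc that scores a candidate by its distance to the nearest already-visited operating point, the step returns the candidate furthest from the existing experiments.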
We are also formally required to impose a basic assumption on the time-series data, so that identifiability is maintained.

Assumption 5. The input signals in each of D_1, ..., D_m are quasistationary and satisfy persistency of excitation (Åström and Eykhoff, 1971).

Algorithm 1
Active Learning with GPR-LPV Models

for i ∈ {1, ..., m} do
    Perform ilm(D_i) to obtain (Â_{θ_i}, B̂_{θ_i}) and se(Â_{θ_i}), se(B̂_{θ_i})
end for
for γ ∈ {a_11, ..., b_nm} do
    Construct D_γ from D_1, ..., D_m
    Σ_γ ← diag{ se²(γ̂_{θ_1}), ..., se²(γ̂_{θ_m}) }
end for
Perform gpr(D_{a_11}, ..., D_{b_nm}, Σ_{a_11}, ..., Σ_{b_nm}) to obtain GPR-LPV model M
Solve (11) using g_M(θ*) := uc(M, θ*)
Return θ_{m+1}

We are able to state the following two results for our active learning framework, which characterise the performance of Algorithm 1 in terms of the posterior variance on the GPR-LPV model.
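Before stating them, the gpr() and uc() steps, and the variance-reduction identity of Lemma 1 below, can be checked numerically with the following Python sketch. The kernel, length-scales and standard-error values are illustrative assumptions; the function names are hypothetical:

```python
import numpy as np

def kappa(t1, t2, sigma2=1.0, ls=0.5):
    """Squared exponential kernel matrix K(t1, t2) on scalar inputs."""
    d = np.subtract.outer(t1, t2)
    return sigma2 * np.exp(-0.5 * (d / ls) ** 2)

def post_var(theta, se, tstar, sigma2=1.0, ls=0.5):
    """GP posterior variance at tstar, with heteroscedastic
    output noise Sigma = diag(se^2) as in (10)."""
    K = kappa(theta, theta, sigma2, ls) + np.diag(se ** 2)
    ks = kappa(theta, tstar, sigma2, ls)
    # Cholesky solve rather than explicit inversion of K
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, ks)
    return (kappa(tstar, tstar, sigma2, ls) - v.T @ v).item()

def reduction(theta, se, tnew, senew, tstar, sigma2=1.0, ls=0.5):
    """Closed-form variance reduction at tstar from appending
    the point (tnew, senew), following (13)."""
    Kg = kappa(theta, theta, sigma2, ls) + np.diag(se ** 2)
    kn = kappa(theta, np.array([tnew]), sigma2, ls)
    ks = kappa(theta, tstar, sigma2, ls)
    Kinv_kn = np.linalg.solve(Kg, kn)
    num = (kappa(np.array([tnew]), tstar, sigma2, ls)
           - Kinv_kn.T @ ks).item() ** 2
    den = (kappa(np.array([tnew]), np.array([tnew]), sigma2, ls)).item() \
          + senew ** 2 - (kn.T @ Kinv_kn).item()
    return num / den

theta = np.array([0.0, 1.0, 2.0])   # existing operating points
se = np.array([0.1, 0.3, 0.1])      # standard errors from ilm()
tstar = np.array([0.6])             # point at which to assess uncertainty

before = post_var(theta, se, tstar)
after = post_var(np.append(theta, 1.5), np.append(se, 0.2), tstar)
pred = reduction(theta, se, 1.5, 0.2, tstar)
# the directly recomputed reduction matches the closed form
assert abs((before - after) - pred) < 1e-9
```

The same post_var routine, evaluated over a grid of candidates, gives a minimal uc() for the criterion (12) when summed across the per-parameter GPs.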
Lemma 1.
Suppose the experiment at θ_{m+1} is appended to the existing GPR-LPV, which is identified from experiments at operating points θ_m = (θ_1, ..., θ_m). Then for each parameter γ ∈ {a_11, ..., b_nm}, the reduction R_{γ,m+1} in posterior variance at θ* is given by

R_{γ,m+1}(θ*) = ( κ(θ*, θ_{m+1}) − k⊤_{m,m+1} K⁻¹_γ K(θ_m, θ*) )² / ( κ(θ_{m+1}, θ_{m+1}) + se²(γ̂_{θ_{m+1}}) − k⊤_{m,m+1} K⁻¹_γ k_{m,m+1} ),  (13)

where

k_{m,m+1} := K(θ_m, θ_{m+1}),  (14)
K_γ := K(θ_m, θ_m) + Σ_γ,  (15)
Σ_γ := diag{ se²(γ̂_{θ_1}), ..., se²(γ̂_{θ_m}) },  (16)

and γ̂_{θ_i} is the estimator for parameter γ(θ_i) from the data collected at experiment i.

Proof.
The proof follows closely the online supplement of Sung et al. (2018), which relies on partitioned matrix inverse results. The main difference here is our inclusion of the standard errors (i.e. se(γ̂_{θ_1}), ..., se(γ̂_{θ_{m+1}})) in the output observation covariances.

Remark 1.
The reduction in posterior variance is non-negative, since the denominator of (13) is the Schur complement of a positive definite matrix. Additionally, we can see that a smaller standard error se(γ̂_{θ_{m+1}}) results in a greater reduction in the posterior variance. When the term se(γ̂_{θ_{m+1}}) is computed using an asymptotic approximation (Lütkepohl, 2005, (10.3.8)), it behaves like O(T_{m+1}^{−1/2}), where T_{m+1} is the length of the (m+1)th experiment. This yields the intuitive conclusion that conducting a longer experiment will result in a greater reduction in the posterior variance of the GPR-LPV.

Next, we upper bound the posterior variance at the queried operating point in terms of the standard errors provided by ilm().

Theorem 2.
Suppose the experiment at θ_{m+1} is appended to the existing GPR-LPV, which is identified from experiments at operating points θ_m. Then for each parameter γ* ∈ {a_11, ..., b_nm}, the posterior variance at θ* = θ_{m+1} satisfies

Var( γ* | D_γ, θ_{m+1}, γ̂_{θ_{m+1}}, θ* = θ_{m+1} ) ≤ se²(γ̂_{θ_{m+1}}).  (17)

Proof.
Begin from (13) and substitute θ_{m+1} for θ*. Then, from the structure of the posterior variance given in (6), we are able to show that the posterior variance takes the form

Var( γ* | D_γ, θ_{m+1}, γ̂_{θ_{m+1}}, θ* = θ_{m+1} ) = a − a²/(a + b),  (18)

where

a := κ(θ*, θ*) − K(θ*, θ_m) K⁻¹_γ K(θ_m, θ*),  (19)
b := se²(γ̂_{θ_{m+1}}).  (20)

It then follows that

Var( γ* | D_γ, θ_{m+1}, γ̂_{θ_{m+1}}, θ* = θ_{m+1} ) = b · a/(a + b) ≤ b,  (21)

since a ≥ 0 and b ≥ 0.

Remark 2.
If the uncertainty criterion is chosen as the sum of GPR-LPV variances as in (12), then Theorem 2 implies that the total uncertainty at θ_{m+1} after active learning will be upper bounded by the trace of the estimated covariance matrix for the local LPV model parameters. In this way, the active learning framework decouples the choice of operating point from the choice of input signals in the local experiment. Algorithm 1 can be seen as finding the operating point with the greatest variance reduction potential, and the resulting variance reduction can then be controlled by the design of the local experiment with an A-optimality criterion. In general, this local design problem will depend on experimental constraints such as the allowable length of experimental time, as well as slew rate, saturation or power constraints on the input signals. This sub-problem is already well addressed for linear systems in other literature, so we do not elaborate further here.

3. ACTIVE LEARNING FOR DIESEL ENGINE AIR-PATH

We apply the active learning framework to the LPV system identification of a physical automotive diesel engine air-path, with exhaust gas recirculation (EGR) and a variable geometry turbine (VGT). A typical high-fidelity model for the diesel air-path has around eight states, for example in Wahlström and Eriksson (2011). In Shekhar et al. (2017), a reduced-order model of four states was introduced to facilitate the online implementation of model predictive control.

Following Shekhar et al. (2017), the system is modelled using n = 4 measured signals for the states:

x = [p_im  p_em  W_comp  y_EGR]⊤,  (22)

and m = 3 actuators:

u = [u_thr  u_EGR  u_VGT]⊤,  (23)

where p_im is the intake manifold (boost) pressure, p_em is the exhaust manifold pressure, W_comp is the compressor mass flow rate and y_EGR is the EGR rate (the ratio of the EGR mass flow rate to the sum of the EGR and compressor mass flow rates). For the inputs, u_thr is the throttle valve, u_EGR is the EGR valve and u_VGT is the VGT vane.
A model is developed in the trimmed state and input:

x̃ = x − x̄(θ),  (24)
ũ = u − ū(θ),  (25)

where x̄(θ) and ū(θ) are steady-state maps of the operating point θ = (N_e, w_fuel), with N_e the engine speed and w_fuel the fueling rate. These maps have been previously obtained from a static calibration procedure as described in Sankar et al. (2019). Thus, we can form an LPV model in the trimmed state and inputs with dynamics

x̃_{k+1} = A(θ) x̃_k + B(θ) ũ_k + w_k.  (26)

The operating space Θ is formed by box constraints over θ (represented by high/low N_e and w_fuel), and the outputs of interest for this system are y = [p_im  y_EGR]⊤. Normalisation of the states has been performed so that they are within the same order of magnitude.

An initial dataset was collected from 16 experiments, one at each of the operating points marked by the crosses in Figure 1. Each experiment constituted slightly over 6000 samples in duration, and was designed with a multisine input perturbation signal, due to slew rate considerations on the actuators.

Fig. 1. Operating points at which experiments were conducted. Points labelled with a number indicate the order in which the active learning experiments were performed, beginning from the initial dataset.

For our choice of ilm() in the framework, the local linear estimates and their corresponding standard errors were identified using generalised least squares for VARX regression (Lütkepohl, 2005). A GPR-LPV model is then fitted to these estimates. In our gpr() method, the covariance function we choose is the commonly used squared exponential kernel:

κ(θ, θ') = σ² exp[ −(1/2) (θ − θ')⊤ Λ⁻¹ (θ − θ') ],  (27)

which is a justifiable choice by Assumption 2, since this kernel produces smooth sample paths of the posterior Gaussian processes. The matrix Λ is a diagonal matrix of length-scales, which we decide upon using domain knowledge, since the relative magnitudes of the units used in the operating point variables θ = (N_e, w_fuel) are understood. The hyperparameter σ is chosen based on an empirical Bayes approach, where it is set to a factor of 2 of the maximum observed standard error for the respective parameter being fitted. As we suspect that A(θ) has all eigenvalues inside the unit disk, we place a simple prior mean on A(θ) which is a constant diagonal matrix with all elements less than one in magnitude. The prior mean for B(θ) is taken as a constant matrix of zeros.

Figure 2 illustrates a GPR surface fitted to the a_11 element from the initial training dataset, along with the 95% credible intervals provided by the GPR and approximate 95% confidence intervals (2 standard errors) computed from the initial estimates.

We demonstrate the active learning framework for sequential selection of operating points. The uncertainty criterion (as given by the sum of GPR-LPV variances in (12)) for the GPR-LPV after the initial training dataset is displayed in Figure 4. To extend Algorithm 1 to sequential operating point selection, we adopt a greedy approach, whereby the (m+1)st operating point is chosen at the point of maximum uncertainty after m experiments. We performed an additional 19 experiments using active learning with this greedy approach, appended on top of the initial training dataset for the GPR-LPV. The order and the locations at which these experiments were conducted are indicated in Figure 1. Figures 4, 5 and 6 show the eventual reduction in variance over the operating space.
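A minimal sketch of the kernel (27) with a diagonal length-scale matrix Λ for the bivariate operating point θ = (N_e, w_fuel) is given below. The length-scale values, the maximum observed standard error, and the reading of the "factor of 2" empirical Bayes rule as σ = 2 · max se are all illustrative assumptions, not the calibrated values:

```python
import numpy as np

def se_kernel(t1, t2, sigma2, Lam):
    """Squared exponential kernel (27):
    sigma^2 * exp(-0.5 (t1 - t2)^T Lam^{-1} (t1 - t2))."""
    d = np.asarray(t1, float) - np.asarray(t2, float)
    return sigma2 * np.exp(-0.5 * d @ np.linalg.solve(Lam, d))

# Illustrative (not calibrated) length-scales for theta = (N_e, w_fuel):
# engine speed varies over hundreds of rpm, fueling over tens of mg/st.
Lam = np.diag([500.0, 10.0])
max_se = 0.05                  # largest standard error seen for this parameter
sigma2 = (2.0 * max_se) ** 2   # one reading of the "factor of 2" choice

k_same = se_kernel([1500.0, 20.0], [1500.0, 20.0], sigma2, Lam)  # equals sigma2
k_near = se_kernel([1500.0, 20.0], [1600.0, 22.0], sigma2, Lam)
k_far = se_kernel([1500.0, 20.0], [2500.0, 40.0], sigma2, Lam)
```

The diagonal Λ lets each coordinate of θ decay at its own rate, which is why correlation falls off over hundreds of rpm in N_e but only tens of mg/st in w_fuel under these placeholder values.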
The updated GPR surface for the a_11 element is presented in Figure 3.

Fig. 2. Initial fitted GPR surface for the a_11 parameter. The GPR variance naturally increases further away from the data points. The GPR surface may lie above a particular data point; this is due to the effect of the prior regularisation. With a different selection of priors, and of the hyperparameter Λ, a closer fit between the GPR estimate and the data points is possible.

Fig. 3. Final fitted GPR surface for the a_11 parameter after active learning. Compared to Figure 2, the surface is more refined and the uncertainty intervals of the GP are narrower.

Fig. 4. Initial total uncertainty of GPR-LPV.

Fig. 5. Total uncertainty of GPR-LPV after 5 experiments. The total uncertainty is reduced compared to Figure 4.

Fig. 6. Total uncertainty of GPR-LPV after 19 experiments. The total uncertainty is reduced compared to Figures 4 and 5.

Fig. 7. Decrease in uncertainty volume ∫_Θ g_M(θ) dθ via active learning.

To assess the overall uncertainty of a GPR-LPV model M after a batch of experiments, we numerically evaluate the uncertainty volume ∫_Θ g_M(θ) dθ. Figure 7 plots the uncertainty volume as each subsequent experiment is added, and shows that, using the active learning framework, most of the uncertainty can be reduced within the first few experiments.

4. CONCLUSION & FUTURE WORK

In this paper, we contributed an active learning framework for identifying LPV systems, and demonstrated the success of the approach via a reduction in the total uncertainty of a GPR-LPV model for a diesel engine air-path. The ability to quantify the model uncertainty also provides benefits, such as when analysing the performance of controllers designed using the model. This work raises some interesting additional questions to follow up on, such as how active learning can be applied when Assumption 3 (full state measurement) is relaxed and LPV models must be identified from noisy input-output observations.
Extensions to other classes of nonlinear systems may also be explored. These ideas will be investigated in future contributions.

ACKNOWLEDGEMENTS

The authors would like to thank the engineering staff at the Toyota Motor Corporation Higashi-Fuji Technical Centre in Japan for their assistance in running the experiments related to this work.

REFERENCES

Åström, K.J. and Eykhoff, P. (1971). System identification - a survey. Automatica, 7(2), 123–162.
Bilionis, I. and Zabaras, N. (2012). Multi-output local Gaussian process regression: Applications to uncertainty quantification. Journal of Computational Physics, 231(17), 5718–5746.
Boutahar, M. and Deniau, C. (1995). A proof of asymptotic normality for some VARX models. Metrika, 42(1), 331–339.
Brochu, E., de Freitas, N., and Ghosh, A. (2007). Active preference learning with discrete choice data. In Advances in Neural Information Processing Systems.
dos Santos, P.L., Perdicoúlis, T.P.A., Novara, C., Ramos, J.A., and Rivera, D.E. (eds.) (2012). Linear Parameter-Varying System Identification: New Developments and Trends. World Scientific.
Goodwin, G.C. (1971). Optimal input signals for nonlinear-system identification. Proceedings of the Institution of Electrical Engineers, 118(7), 922.
Khalate, A.A., Bombois, X., Tóth, R., and Babuška, R. (2009). Optimal experimental design for LPV identification using a local approach. Elsevier.
Levin, M. (1960). Optimum estimation of impulse response in the presence of noise. IRE Transactions on Circuit Theory, 7(1), 50–56.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer.
MacKay, D.J.C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590–604.
Motchon, K., Rajaoarisoa, L., Etienne, L., and Lecoeuche, S. (2018). On experiment design for local approach identification of LPV systems. Elsevier.
Rasmussen, C.E. and Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Sankar, G.S., Shekhar, R.C., Manzie, C., Sano, T., and Nakada, H. (2019). Model predictive controller with average emissions constraints for diesel airpath. Control Engineering Practice, 90, 182–189.
Settles, B. (2012). Active Learning. Morgan & Claypool.
Shekhar, R.C., Sankar, G.S., Manzie, C., and Nakada, H. (2017). Efficient calibration of real-time model-based controllers for diesel engines - part I: Approach and drive cycle results. In IEEE 56th Annual Conference on Decision and Control. IEEE.
Sung, C.L., Gramacy, R.B., and Haaland, B. (2018). Exploiting variance reduction potential in local Gaussian process search. Statistica Sinica.
Tóth, R. (2010). Modeling and Identification of Linear Parameter-Varying Systems. Springer.
Wahlström, J. and Eriksson, L. (2011). Modelling diesel engines with a variable-geometry turbocharger and exhaust gas recirculation by optimization of model parameters for capturing non-linear system dynamics.