Blind Identification via Lifting ⋆
Henrik Ohlsson ∗,∗∗ Lillian Ratliff ∗ Roy Dong ∗ S. Shankar Sastry ∗

∗ Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA (e-mail: [email protected]). ∗∗ Division of Automatic Control, Department of Electrical Engineering, Linköping University, Sweden.
Abstract:
Blind system identification is known to be an ill-posed problem and, without further assumptions, no unique solution is at hand. In this contribution, we are concerned with the task of identifying an ARX model from only output measurements. We phrase this as a constrained rank minimization problem and present a relaxed convex formulation to approximate its solution. To make the problem well posed we assume that the sought input lies in some known linear subspace.

Keywords: System identification; Parameter estimation; Identification algorithms.

1. INTRODUCTION

Consider an auto-regressive exogenous input (ARX) model
\[ y(t) - a_1 y(t-1) - \cdots - a_{n_a} y(t-n_a) = b_1 u(t-n_k-1) + \cdots + b_{n_b} u(t-n_k-n_b) \tag{1} \]
with input $u \in \mathbb{R}$ and output $y \in \mathbb{R}$. Estimation of this type of model is one of the most common tasks in system identification and a very well studied problem; see for instance Ljung [1999]. The common setting is that $\{(y(t), u(t))\}_{t=1}^{N}$ is given and the summed squared residuals
\[ \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_b} b_k u(t-k-n_k) - \sum_{k=1}^{n_a} a_k y(t-k) \Bigr)^2, \]
where $n = \max(n_a, n_k + n_b) + 1$, are minimized to obtain an estimate for $a_1, \dots, a_{n_a}, b_1, \dots, b_{n_b}$. This estimate is often referred to as the least squares (LS) estimate.

In this paper we study the more complicated problem of estimating an ARX model from solely the outputs $\{y(t)\}_{t=1}^{N}$. This is an ill-posed problem, and it is easy to see that under no further assumptions it would be impossible to uniquely determine $a_1, \dots, a_{n_a}, b_1, \dots, b_{n_b}$. We will therefore study this problem under the assumption that the stacked inputs belong to some known subspace. The input could for example be:
• known to change only at a set of discrete times, due to a discrete controller, or
• known to be band-limited and therefore well represented by the projection on the first discrete Fourier transform basis vectors.

⋆ The work presented is supported by the NSF Graduate Research Fellowship under grant DGE 1106400, NSF CPS:Large:ActionWebs award number 0931843, TRUST (Team for Research in Ubiquitous Secure Technology), which receives support from NSF (award number CCF-0424422), and FORCES (Foundations Of Resilient CybEr-physical Systems), the Swedish Research Council in the Linnaeus center CADICS, the European Research Council under the advanced grant LEARN, contract 267381, a postdoctoral grant from the Sweden-America Foundation, donated by ASEA's Fellowship Fund, and by a postdoctoral grant from the Swedish Research Council.

It should be noticed that this assumption is not enough to uniquely determine the input or the ARX model. Specifically, we will not be able to determine the input or the ARX coefficients $b_1, \dots, b_{n_b}$ more than up to a multiplicative scalar. It should be stressed that this is not a limitation of the method that we propose but an inherent limitation of the blind system identification problem, since the sought quantities always appear as products. To uniquely determine the input and the ARX coefficients $b_1, \dots, b_{n_b}$, further knowledge is needed.

The main contribution of the paper is a novel method for ARX model identification from only output measurements. The method takes the form of a convex optimization problem and gives a computationally flexible framework for handling different types of measurement noises, constraints, etc.

2. BACKGROUND

Blind system identification (BSI) has a broad application area and has been applied in fields such as data communications, speech recognition and seismic signal processing; see for instance Abed-Meraim et al. [1997]. Common for the types of modeling problems that BSI has been applied to is that the input is difficult, costly or impossible to measure. In, for example, exploration seismology, the physical properties of the earth are explored by studying the response to an excitation (often a charge of dynamite). The excitation is often difficult to measure, and the modeling problem is therefore a BSI problem; see e.g., Zerva et al. [1999].

Many methods have been proposed to solve the BSI problem throughout the years. We give a short overview here but refer the interested reader to Abed-Meraim et al. [1997].

3. PROBLEM FORMULATION

Given $\{y(t)\}_{t=1}^{N} \subset \mathbb{R}$, find an estimate for $a_1, \dots, a_{n_a}, b_1, \dots, b_{n_b} \in \mathbb{R}$ and $u(t) \in \mathbb{R}$, $t = 1, \dots, N$, such that
\[ y(t) - a_1 y(t-1) - \cdots - a_{n_a} y(t-n_a) = b_1 u(t-n_k-1) + \cdots + b_{n_b} u(t-n_k-n_b) + w(t), \]
for $t = n, \dots, N$, where $n = \max(n_a, n_k + n_b) + 1$ and $w(t)$, $t = n, \dots, N$, is some unknown zero mean noise. We will for simplicity assume that $n_a$, $n_b$ and $n_k$ are known. To make the problem well posed, we will seek an input in a given subspace of $\mathbb{R}^N$.

4. NOTATION AND ASSUMPTIONS

We will use $y$ to denote the output and $u$ the input. We will for simplicity only consider single input single output (SISO) systems; however, with some extra bookkeeping, MIMO systems could also be treated. We will assume that $N$ measurements of $y$ are available and stack them in the vector $\mathbf{y}$, i.e.,
\[ \mathbf{y} = [\,y(1)\ \dots\ y(N)\,]^{\mathsf T}. \tag{2} \]
We also introduce $\mathbf{u}$, $\boldsymbol\eta$, $\mathbf{a}$ and $\mathbf{b}$ as
\[ \mathbf{u} = [\,u(1)\ \dots\ u(N)\,]^{\mathsf T}, \tag{3} \]
\[ \boldsymbol\eta = [\,\eta(1)\ \dots\ \eta(N)\,]^{\mathsf T}, \tag{4} \]
\[ \mathbf{a} = [\,a_1\ \dots\ a_{n_a}\,]^{\mathsf T}, \tag{5} \]
\[ \mathbf{b} = [\,b_1\ \dots\ b_{n_b}\,]^{\mathsf T}. \tag{6} \]
We will use $y(i)$ to denote the $i$th element of $\mathbf{y}$. To pick out a subvector of $\mathbf{y}$ consisting of the $i$th to the $j$th elements we will use the notation $\mathbf{y}(i:j)$, and similarly for picking out subvectors of $\mathbf{u}$, $\mathbf{a}$ and $\mathbf{b}$. To pick out a submatrix consisting of the $i$th to the $j$th rows of $\mathbf{X}$ we use the notation $\mathbf{X}(i:j,:)$. We will use normal font to represent scalars and bold for vectors and matrices. $\|\cdot\|_0$ is the zero (quasi) norm, which returns the number of nonzero elements of its argument, and $\|\cdot\|_*$ is the nuclear norm, returning the sum of the singular values.

We will assume that it is known that the sought input $\mathbf{u}$ lies in some known subspace. We can hence write
\[ \mathbf{u} = \mathbf{D}\mathbf{x} \tag{7} \]
for some known $N \times m$ matrix $\mathbf{D}$ and an unknown vector $\mathbf{x} \in \mathbb{R}^m$. It is assumed that $m \le N$.

5. BLIND IDENTIFICATION VIA LIFTING

Consider the noise free setting where $w(t) = 0$, $t = n, \dots, N$. We can formulate the problem of finding an input and the ARX coefficients as the feasibility problem
\[ \text{find} \quad u(t),\ t = 1, \dots, N, \quad a_1, \dots, a_{n_a},\ b_1, \dots, b_{n_b} \tag{8a} \]
\[ \text{subj. to} \quad y(t) - \sum_{k=1}^{n_a} a_k y(t-k) = \sum_{k=1}^{n_b} b_k u(t-n_k-k), \quad t = n, \dots, N. \tag{8b} \]
Note that $\{a_k\}_{k=1}^{n_a}$, $\{b_k\}_{k=1}^{n_b}$ and $\{u(t)\}_{t=1}^{N}$ are all unknown. The problem is therefore non-convex.

Introduce $\mathbf{X} = \mathbf{x}\mathbf{b}^{\mathsf T} \in \mathbb{R}^{m \times n_b}$ and note that (7) gives $\mathbf{D}\mathbf{X} = \mathbf{D}\mathbf{x}\mathbf{b}^{\mathsf T} = \mathbf{u}\mathbf{b}^{\mathsf T}$. Since $\mathbf{u}\mathbf{b}^{\mathsf T}$ contains all products $u(i) b_j$, $i = 1, \dots, N$, $j = 1, \dots, n_b$, the sum
\[ \sum_{k=1}^{n_b} b_k u(t-n_k-k) \tag{9} \]
can be realized by summing appropriate entries of $\mathbf{D}\mathbf{X}$. Problem (8) can now be reformulated as
\[ \text{find} \quad \mathbf{X},\ \mathbf{a} \tag{10a} \]
\[ \text{subj. to} \quad y(t) - \sum_{k=1}^{n_a} a_k y(t-k) = \sum_{k=1}^{n_b} (\mathbf{D}\mathbf{X})(t-n_k-k,\,k), \quad t = n, \dots, N, \tag{10b} \]
\[ \operatorname{rank}(\mathbf{X}) = 1. \tag{10c} \]
Note that we need to require $\operatorname{rank}(\mathbf{X}) = 1$ so as not to lose the possibility of decomposing $\mathbf{X}$ as $\mathbf{X} = \mathbf{x}\mathbf{b}^{\mathsf T}$. The problem (10) is equivalent to (8) in the following sense. Assume that (10) has a unique solution $\mathbf{X}^*$; then $\mathbf{X}^*$ must satisfy $\mathbf{X}^* = \mathbf{x}^*(\mathbf{b}^*)^{\mathsf T}$, with $\mathbf{x}^*$ and $\mathbf{b}^*$ solving (8). Extracting the rank 1 component of $\mathbf{X}^*$, using e.g., singular value decomposition, we can hence determine both $\mathbf{x}^*$ (and $\mathbf{u}^* = \mathbf{D}\mathbf{x}^*$) and $\mathbf{b}^*$ up to a multiplicative scalar (note that we can never do better with the information at hand, not even if we were able to solve (8)). The estimates of $\mathbf{a}$ will be identical for both problems (if the estimates are unique).

The technique of introducing the matrix $\mathbf{X}$ to avoid products between $\mathbf{x}$ and $\mathbf{b}$ is well known in optimization and is referred to as lifting [Shor, 1987, Lovász and Schrijver, 1991, Nesterov, 1998, Goemans and Williamson, 1995].

Problem (10) is a non-convex optimization problem and not easier to solve than (8). To get an optimization problem we can solve, we remove the rank constraint and instead minimize the rank. Since the rank of a matrix is not a convex function, we replace the rank with a convex heuristic. Here we choose the nuclear norm, but other heuristics are also available (see for instance Fazel et al. [2001]).
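To make the lifting step concrete, the following pure-Python sketch builds a rank-1 matrix $\mathbf{X} = \mathbf{x}\mathbf{b}^{\mathsf T}$ from hypothetical vectors $\mathbf{x}$ and $\mathbf{b}$ (the numbers are made up for illustration, not taken from the paper) and recovers the two factors from $\mathbf{X}$, up to the unavoidable multiplicative scalar. A small power iteration plays the role of the singular value decomposition:

```python
# Illustration of the lifting idea: all products u(i)*b_j sit in the rank-1
# matrix X = x b^T, and x and b can be recovered from X up to a scalar.

def outer(x, b):
    """Rank-1 matrix X = x b^T as a list of rows."""
    return [[xi * bj for bj in b] for xi in x]

def rank1_factor(X, iters=50):
    """Recover (x, b) with X = x b^T via power iteration on X X^T.
    The factorization is only unique up to a multiplicative scalar."""
    m, nb = len(X), len(X[0])
    x = [1.0] * m
    for _ in range(iters):
        b = [sum(X[i][j] * x[i] for i in range(m)) for j in range(nb)]  # b <- X^T x
        x = [sum(X[i][j] * b[j] for j in range(nb)) for i in range(m)]  # x <- X b
        nrm = max(abs(v) for v in x) or 1.0
        x = [v / nrm for v in x]                                        # normalize x
    xx = sum(v * v for v in x)
    # b <- X^T x / ||x||^2, so that x b^T reproduces X exactly
    b = [sum(X[i][j] * x[i] for i in range(m)) / xx for j in range(nb)]
    return x, b

x_true = [1.0, -2.0, 0.5, 3.0]   # hypothetical subspace coefficients x
b_true = [1.0, 2.0, 3.0]         # hypothetical ARX coefficients b
X = outer(x_true, b_true)
x_hat, b_hat = rank1_factor(X)
# x_hat b_hat^T reproduces X, even though x_hat and b_hat individually
# differ from x_true and b_true by a multiplicative scalar.
```

Any rescaling $\mathbf{x} \to \alpha\mathbf{x}$, $\mathbf{b} \to \mathbf{b}/\alpha$ leaves $\mathbf{X}$ unchanged, which is exactly why the factors are only identifiable up to a scalar.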
We then obtain the convex program
\[ \min_{\mathbf{X},\,\mathbf{a}} \quad \|\mathbf{X}\|_* \tag{11a} \]
\[ \text{subj. to} \quad y(t) - \sum_{k=1}^{n_a} a_k y(t-k) = \sum_{k=1}^{n_b} (\mathbf{D}\mathbf{X})(t-n_k-k,\,k), \quad t = n, \dots, N, \tag{11b} \]
which we refer to as blind identification via lifting (BIL).

Lastly, in the noisy setting we have to tolerate some nonzero modeling error. If the noise $e$ is known to be bounded, say $|e(t)| \le \epsilon$, we suggest using
\[ \min_{\mathbf{X},\,\mathbf{a},\,\boldsymbol\eta} \quad \|\mathbf{X}\|_* \tag{12a} \]
\[ \text{subj. to} \quad y(t) - \sum_{k=1}^{n_a} a_k y(t-k) = \sum_{k=1}^{n_b} (\mathbf{D}\mathbf{X})(t-n_k-k,\,k) + \eta(t), \tag{12b} \]
\[ |\eta(t)| \le \epsilon, \quad t = n, \dots, N, \tag{12c} \]
and if the noise is Gaussian,
\[ \min_{\mathbf{X},\,\mathbf{a},\,\boldsymbol\eta} \quad \|\mathbf{X}\|_* + \lambda \|\boldsymbol\eta\|_2^2 \tag{13a} \]
\[ \text{subj. to} \quad y(t) - \sum_{k=1}^{n_a} a_k y(t-k) = \sum_{k=1}^{n_b} (\mathbf{D}\mathbf{X})(t-n_k-k,\,k) + \eta(t), \tag{13b} \]
\[ t = n, \dots, N. \tag{13c} \]
In the latter case, we seek the largest $\lambda$ such that $\mathbf{X}$ becomes rank 1.

6. ANALYSIS

The number of optimization variables in (8) is essentially $n_a + n_b + m$, under the assumption that $\mathbf{u} = \mathbf{D}\mathbf{x}$. We can hence not expect a reliable identification result from fewer than $n_a + n_b + m$ measurements. One may wonder how many measurements are needed. Using that the constraint (11b) of BIL is linear in $\mathbf{X}$, we have the following result:

Theorem 1. (Guaranteed Recovery using BIL). Consider the noise free blind ARX identification problem (8) and assume that it has a unique solution (up to a multiplicative scalar). Let the row vector $\mathbf{d}_i \in \mathbb{R}^m$ be the $i$th row of $\mathbf{D}$. If
\[ \mathbf{A} = \begin{bmatrix} \mathbf{d}_{n-n_k-1} & \mathbf{d}_{n-n_k-2} & \dots & \mathbf{d}_{n-n_k-n_b} & y(n-1) & \dots & y(n-n_a) \\ \mathbf{d}_{n-n_k} & \mathbf{d}_{n-n_k-1} & \dots & \mathbf{d}_{n-n_k-n_b+1} & y(n) & \dots & y(n-n_a+1) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ \mathbf{d}_{N-n_k-1} & \mathbf{d}_{N-n_k-2} & \dots & \mathbf{d}_{N-n_k-n_b} & y(N-1) & \dots & y(N-n_a) \end{bmatrix} \]
has full column rank, then the ARX model and input solving (8) are recovered, up to a multiplicative scalar, by BIL.

Proof.
From the assumption, (8) has a unique solution. Form $\mathbf{X}^*$ by multiplying the solutions $\mathbf{x}^*$ and $\mathbf{b}^*$ of (8); that is, $\mathbf{X}^* = \mathbf{x}^*(\mathbf{b}^*)^{\mathsf T}$. Let $a_1^*, \dots, a_{n_a}^*$ denote the corresponding values for $a_1, \dots, a_{n_a}$ that solve (8), and define $\boldsymbol\theta^*$ as
\[ \boldsymbol\theta^* = \bigl[\, \mathbf{X}^*(:,1)^{\mathsf T}\ \ \mathbf{X}^*(:,2)^{\mathsf T}\ \dots\ \mathbf{X}^*(:,n_b)^{\mathsf T}\ \ a_1^*\ \dots\ a_{n_a}^* \,\bigr]^{\mathsf T}. \]
We must have
\[ [\,y(n)\ y(n+1)\ \dots\ y(N)\,]^{\mathsf T} = \mathbf{A}\boldsymbol\theta^*. \tag{14} \]
Note that the pair $\mathbf{X}^*$ and $\mathbf{a}^*$ is a feasible solution to (11). Since the solution of (14) must be unique when $\mathbf{A}$ has full column rank, BIL must give $\boldsymbol\theta^*$. □

Note that if the linear constraints of (11) alone give the solution of BIL, no optimization is necessary. Seeking the matrix $\mathbf{X}$ that gives the minimum nuclear norm is only of interest if we have too few measurements for the constraints to uniquely define the solution, but more measurements than $n_a + n_b + m$.

The noisy case is harder to analyze and we leave its analysis as future work.

7. COMPUTING $\lambda_{\min}$

In the noisy version (13) of BIL, the design parameter $\lambda$ has to be chosen. Since $\lambda$ regulates the tradeoff between the nuclear norm and the squared norm of the estimated noise $\boldsymbol\eta$, it is natural to seek the largest $\lambda$ such that the estimate $\mathbf{X}$ is rank 1. In seeking this $\lambda$, the value of $\lambda_{\min}$ may come in handy. $\lambda_{\min}$ is defined as the largest $\lambda$ such that $\mathbf{X} = \mathbf{0}$ in BIL. Since the estimate of $\mathbf{X}$ stays the same for all $\lambda \le \lambda_{\min}$, we should limit our search for $\lambda$ to $[\lambda_{\min}, \infty]$. One may for example start with $\lambda = \lambda_{\min}$ and then successively increase $\lambda$ as long as $\operatorname{rank}(\mathbf{X}) = 1$.

Theorem 2. (Computing $\lambda_{\min}$). Consider the optimization problem given in (13). There exists a $\lambda$, denoted $\lambda_{\min}$, such that whenever $\lambda \le \lambda_{\min}$, solving (13) results in $\mathbf{X} = \mathbf{0}$. $\lambda_{\min}$ is given by
\[ 1/\lambda_{\min} = \min_{\mathbf{V} \in \mathbb{R}^{m \times n_b}} \quad \|\mathbf{V}\| \tag{15a} \]
\[ \text{subj. to} \quad 0 = V(i,j) - \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_a} \hat a_k y(t-k) \Bigr) D(t-n_k-j,\,i), \tag{15b} \]
\[ i = 1, \dots, m, \quad j = 1, \dots, n_b, \tag{15c} \]
with
\[ \{\hat a_1, \dots, \hat a_{n_a}\} = \arg\min_{\mathbf{a}} \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_a} a_k y(t-k) \Bigr)^2. \tag{16} \]

Proof.
The noisy version of BIL can be rewritten in the form
\[ \min_{\mathbf{X},\,\mathbf{a}} \quad \|\mathbf{X}\|_* + \lambda \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_a} a_k y(t-k) - \sum_{k=1}^{n_b} (\mathbf{D}\mathbf{X})(t-n_k-k,\,k) \Bigr)^2. \tag{17} \]
The nuclear norm is not differentiable, and it follows that for $\mathbf{X} = \mathbf{0}$ to be a valid solution, zero needs to be in the subdifferential of the objective with respect to $\mathbf{X}$ evaluated at $\mathbf{X} = \mathbf{0}$ (see e.g., Bertsekas et al. [2003, Prop. 4.7.2]). The subdifferential of the objective of (17) at $\mathbf{X} = \mathbf{0}$ and $\mathbf{a} = \hat{\mathbf{a}}$ with respect to the $(i,j)$th element of $\mathbf{X}$ can be shown equal to
\[ V(i,j) - \lambda \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_a} \hat a_k y(t-k) \Bigr) D(t-n_k-j,\,i). \tag{18} \]
We further have that $\|\mathbf{V}\| \le 1$, where $\|\cdot\|$ is here the operator norm (the largest singular value). To find $\lambda_{\min}$ we can now consider the optimization problem
\[ \max_{\mathbf{V} \in \mathbb{R}^{m \times n_b},\,\lambda} \quad \lambda \tag{19a} \]
\[ \text{subj. to} \quad 0 = V(i,j) - \lambda \sum_{t=n}^{N} \Bigl( y(t) - \sum_{k=1}^{n_a} \hat a_k y(t-k) \Bigr) D(t-n_k-j,\,i), \tag{19b} \]
\[ i = 1, \dots, m, \quad j = 1, \dots, n_b, \tag{19c} \]
\[ \|\mathbf{V}\| \le 1, \tag{19d} \]
which can be shown to be equivalent to (15). □

The expression for $\lambda_{\min}$ was also numerically verified.

8. SOLUTION ALGORITHMS AND SOFTWARE

Many standard methods of convex optimization can be used to solve problems (11), (12) and (13). Systems such as CVX [Grant and Boyd, 2010, 2008] or YALMIP [Löfberg, 2004] can readily handle the nuclear norm. For large scale problems, the alternating direction method of multipliers (ADMM, see e.g., Bertsekas and Tsitsiklis [1997], Boyd et al. [2011]) is an attractive choice, and we have previously shown that ADMM can be very efficient on similar problems [Ohlsson et al., 2013]. Code for solving (11), (12) and (13) will be made available on
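As a concrete companion to Theorem 2, the following pure-Python sketch computes $\lambda_{\min}$ for a small, made-up data set. The signal $y$, the ZOH-style $\mathbf{D}$ and the orders $n_a = 1$, $n_b = 2$, $n_k = 0$ are all hypothetical choices, not the paper's example. It first solves the least squares problem (16) for $\hat{\mathbf{a}}$, then forms the matrix $\mathbf{V}$ fixed by the constraint (15b), and finally takes $\lambda_{\min} = 1/\|\mathbf{V}\|$, with the operator norm evaluated by power iteration:

```python
def operator_norm(V, iters=500):
    """Largest singular value of V, via power iteration on V^T V."""
    m, n = len(V), len(V[0])
    w = [1.0] * n
    for _ in range(iters):
        z = [sum(V[i][j] * w[j] for j in range(n)) for i in range(m)]  # z = V w
        w = [sum(V[i][j] * z[i] for i in range(m)) for j in range(n)]  # w = V^T z
        nrm = sum(v * v for v in w) ** 0.5 or 1.0
        w = [v / nrm for v in w]
    z = [sum(V[i][j] * w[j] for j in range(n)) for i in range(m)]
    return sum(v * v for v in z) ** 0.5  # ||V w|| with ||w|| = 1

# Hypothetical problem data (the paper uses 1-based time; lists are 0-based).
N, m, n_a, n_b, n_k = 12, 2, 1, 2, 0
n = max(n_a, n_k + n_b) + 1
y = [1.0, 0.8, -0.3, 1.2, 0.7, -0.5, 0.9, 1.1, -0.2, 0.4, 0.6, -0.1]
# D holds each x-entry for 6 consecutive samples, as in a zero-order hold.
D = [[1.0, 0.0] if t < 6 else [0.0, 1.0] for t in range(N)]

# Step 1: a_hat from (16); with n_a = 1 the least squares solution is explicit.
num = sum(y[t - 1] * y[t - 2] for t in range(n, N + 1))
den = sum(y[t - 2] ** 2 for t in range(n, N + 1))
a_hat = num / den
r = [y[t - 1] - a_hat * y[t - 2] for t in range(n, N + 1)]  # residuals, t = n..N

# Step 2: V is pinned down entirely by the constraint (15b).
V = [[sum(r[t - n] * D[t - n_k - j - 1][i] for t in range(n, N + 1))
      for j in range(1, n_b + 1)] for i in range(m)]

# Step 3: lambda_min = 1 / ||V||, with ||.|| the operator norm.
lam_min = 1.0 / operator_norm(V)
```

For $\lambda \le$ `lam_min` the solution of (13) is $\mathbf{X} = \mathbf{0}$, so a search for the largest $\lambda$ keeping $\operatorname{rank}(\mathbf{X}) = 1$ can start from this value.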
9. NUMERICAL ILLUSTRATION

Consider the system given in the diagram below:
\[ \mathbf{x} \longrightarrow \boxed{\text{ZOH}} \xrightarrow{\;u\;} \boxed{\text{ARX}} \longrightarrow y, \]
where the ARX block, driven also by the noise $e$, is
\[ y(t) - a_1 y(t-1) = b_1 u(t-1) + b_2 u(t-2) + b_3 u(t-3) + e(t). \]
Here the values $\mathbf{x}$ were generated by independently sampling from a unit Gaussian, and the noise $e$ by independently sampling from a uniform distribution between $-\epsilon/2$ and $\epsilon/2$. The ZOH (zero-order hold) block holds the input to the ARX system constant for 6 consecutive samples. We can therefore express $\mathbf{u}$ in terms of $\mathbf{x}$ as
\[ \mathbf{u} = \begin{bmatrix} \mathbf{1}_6 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_6 & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_6 \end{bmatrix} \mathbf{x}, \tag{20} \]
where $\mathbf{1}_6$ denotes a column of 6 ones. We identify the matrix in the relation between $\mathbf{u}$ and $\mathbf{x}$ as $\mathbf{D}$. The ARX coefficients used were
\[ a_1 = -0.\ldots, \quad b_1 = 1, \quad b_2 = 2, \quad b_3 = 3. \tag{21} \]
Figure 1 shows the output $y$ for $\epsilon = 5$.

If the noisy version of BIL (12) is used to estimate $\mathbf{u}$ and an ARX model, we get the input estimate given in Figure 2 and the ARX coefficients
\[ a_1 = -0.\ldots, \quad b_1 = 0.\ldots, \quad b_2 = 1.\ldots, \quad b_3 = 2.\ldots \tag{22} \]
It is interesting to notice that if we instead were given the true input $\mathbf{u}$ and only estimated the ARX coefficients
by minimizing the squared residuals between the output $y$ and the predicted output, we would have obtained the estimates
\[ a_1 = -0.\ldots, \quad b_1 = 0.\ldots, \quad b_2 = 2.\ldots, \quad b_3 = 2.\ldots \tag{23} \]
As seen, these estimates are not that much better than what BIL provides (see (22)). Remember that BIL is only given the $y$-measurements and not the inputs $\mathbf{u}$. It is therefore quite remarkable that the estimates of BIL are comparable to those given in (23).

Fig. 1. The noise free (solid line) and noisy (circles) outputs.

Fig. 2. The estimated (dashed line) and the true (solid line) input $u$.

To further study the robustness of BIL, we carried out a Monte Carlo simulation. In the simulation, the noise level $\epsilon$ was varied between 0 and 5. For each noise level, 100 trials were carried out with different noise and input realizations. The true ARX model was kept fixed (the same as above). The results are summarized in Figure 3.

Fig. 3. The relative errors $\|\Delta\mathbf{b}\|/\|\mathbf{b}\|$, $\|\Delta\mathbf{a}\|/\|\mathbf{a}\|$ and $\|\Delta\mathbf{u}\|/\|\mathbf{u}\|$, along with their 0.5 standard deviation error bounds, for varying noise levels $\epsilon$.

The setup of the above example does not give an $\mathbf{A}$ with full column rank. Nevertheless, a perfect result was obtained in the noise free case. It can however be verified that if $\mathbf{D}$ is instead generated by independently sampling each element from a unit Gaussian distribution (everything else unchanged), the resulting $\mathbf{A}$ has full column rank.

10. CONCLUSION

This paper has presented a novel framework for blind system identification of ARX models. The framework uses the fact that the problem can be rewritten as a rank minimization problem. A convex relaxation is presented to approximate the sought ARX parameters and the unknown inputs.

REFERENCES

K. Abed-Meraim and E. Moulines. A maximum likelihood solution to blind identification of multichannel FIR filters. In EUSIPCO, pages 1011–1015, 1994.

K. Abed-Meraim, Wanzhi Qiu, and Y. Hua. Blind system identification.
Proceedings of the IEEE, 85(8):1310–1322, 1997.

K. Abed-Meraim, P. Loubaton, and E. Moulines. A subspace algorithm for certain blind identification problems. IEEE Transactions on Information Theory, 43(2):499–511, 1997.

A. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convex programming. CoRR, abs/1211.5608, 2012.

E.-W. Bai and Minyue Fu. A blind approach to Hammerstein model identification. IEEE Transactions on Signal Processing, 50(7):1610–1619, 2002.

E.-W. Bai, Q. Li, and S. Dasgupta. Blind identifiability of IIR systems. Automatica, 38(1):181–184, 2002.

D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.

D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.

M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the 2001 American Control Conference, pages 4734–4739, 2001.

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, November 1995.

M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. D. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag, 2008. http://stanford.edu/~boyd/graph_dcp.html.

M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, August 2010.

Y. Hua. Blind methods of system identification. Circuits, Systems and Signal Processing, 21(1):91–108, 2002. ISSN 0278-081X.

L. Ljung. System Identification — Theory for the User. Prentice-Hall, Upper Saddle River, N.J., 2nd edition, 1999.

J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004. URL http://control.ee.ethz.ch/~joloef/yalmip.php.

L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization, 1:166–190, 1991.

Y. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods & Software, 9:141–160, 1998.

H. Ohlsson, A. Y. Yang, R. Dong, M. Verhaegen, and S. S. Sastry. Quadratic basis pursuit. CoRR, abs/1301.7002, 2013.

B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

Y. Sato. A method of self-recovering equalization for multilevel amplitude-modulation systems. IEEE Transactions on Communications, 23(6):679–682, 1975.

N. Z. Shor. Quadratic optimization problems. Soviet Journal of Computer and Systems Sciences, 25:1–11, 1987.

L. Sun, W. Liu, and A. Sano. Identification of a dynamical system with input nonlinearity. IEE Proceedings – Control Theory and Applications, 146(1):41–51, 1999.

L. Tong, G. Xu, and T. Kailath. A new approach to blind identification and equalization of multipath channels. In , volume 2, pages 856–860, 1991.

S. Van Vaerenbergh, J. Vía, and I. Santamaría. Blind identification of SIMO Wiener systems based on kernel canonical correlation analysis. IEEE Transactions on Signal Processing, 61(9):2219–2230, 2013.

J. Wang, A. Sano, D. Shook, T. Chen, and B. Huang. A blind approach to closed-loop identification of Hammerstein systems. International Journal of Control, 80(2):302–313, 2007.

J. Wang, A. Sano, T. Chen, and B. Huang. Identification of Hammerstein systems without explicit parameterisation of non-linearity. International Journal of Control, 82(5):937–952, 2009.

J. Wang, A. Sano, T. Chen, and B. Huang. A blind approach to identification of Hammerstein systems. In F. Giri and E.-W. Bai, editors, Block-oriented Nonlinear System Identification, volume 404 of Lecture Notes in Control and Information Sciences, pages 293–312. Springer London, 2010.

G. A. Watson. Characterization of the subdifferential of some matrix norms. Linear Algebra and its Applications, 170:33–45, 1992. ISSN 0024-3795.

A. Zerva, A. P. Petropulu, and P.-Y. Bard. Blind deconvolution methodology for site-response evaluation exclusively from ground-surface seismic recordings. Soil Dynamics and Earthquake Engineering, 18(1):47–57, 1999.

A. Zerva, A. P. Petropulu, and P.-Y. Bard. Blind system identification. In