[PDF] Subspace Identification of Large-Scale 1D Homogeneous Networks

Abstract

This paper considers the identification of large-scale 1D networks consisting of identical LTI dynamical systems. A new subspace identification method is developed that only uses local input-output information and does not rely on knowledge about the local state interaction. The identification of the local system matrices (up to a similarity transformation) is done via a low dimensional subspace retrieval step that enables the estimation of the Markov parameters of a locally lifted system. Using the estimated Markov parameters, the state-space realization of a single subsystem in the network is determined. The low dimensional subspace retrieval step exploits various key structural properties that are present in the data equation such as a low rank property and a {\em two-layer} Toeplitz structure in the data matrices constructed from products of the system matrices. For the estimation of the system matrices of a single subsystem, it is formulated as a structured low-rank matrix factorization problem. The effectiveness of the proposed identification method is demonstrated by a simulation example.

Full PDF

aa r X i v : . [ c s . S Y ] F e b Subspace Identiﬁcation of Large-Scale 1DHomogeneous Networks

Chengpu Yu, Michel Verhaegen, and Anders Hansson

Abstract —This paper considers the identiﬁcation of large-scale1D networks consisting of identical LTI dynamical systems. Anew subspace identiﬁcation method is developed that only useslocal input-output information and does not rely on knowledgeabout the local state interaction. The identiﬁcation of the localsystem matrices (up to a similarity transformation) is donevia a low dimensional subspace retrieval step that enablesthe estimation of the Markov parameters of a locally liftedsystem. Using the estimated Markov parameters, the state-spacerealization of a single subsystem in the network is determined.The low dimensional subspace retrieval step exploits various keystructural properties that are present in the data equation suchas a low rank property and a two-layer

Toeplitz structure in thedata matrices constructed from products of the system matrices.For the estimation of the system matrices of a single subsystem,it is formulated as a structured low-rank matrix factorizationproblem. The effectiveness of the proposed identiﬁcation methodis demonstrated by a simulation example.

Index Terms —Large-scale 1D distributed systems, nuclear-norm optimization, two-layer Toeplitz structure.

I. I

NTRODUCTION

In recent years, the fundamental system theoretical researchon large-scale interconnected systems has been intensiﬁed bothon the topic of distributed control, see e.g. [1]–[4], as wellas on the topic of distributed system identiﬁcation, see e.g.[5]–[7]. These fundamental developments are inspired by theincreasing interest from the application side, such as in ﬂuidmechanics [8], ﬂexible structures [9], large adaptive telescopemirrors [10], wind turbine farms [11], and so on. In this paper,we focus on the identiﬁcation of large-scale 1D homogeneousnetworked systems. Such systems may characterize a platoonof cars or may result by discretizing dynamical systemsdescribed via partial differential equations [3], [12], [13].When considering the subspace identiﬁcation of local sys-tem dynamics using local data only, there is not only themissing information about the local system state, but also themissing state information from the neighboring systems. Thismakes the local identiﬁcation problem more difﬁcult. In [14] asubspace identiﬁcation (SID) scheme for large-scale circulantsystems was developed. The speciﬁc property of circularity

C. Yu and M. Verhaegen are with the Delft Center for Systems andControl, Delft University, Delft 2628CD, Netherlands ([email protected],[email protected])A. Hansson is with the Division of Automatic Control, Department of Elec-trical Engineering, Linkoping University, Sweden ([email protected])Part of the research of M. Verhaegen was done while he was a visitingprofessor of the Division of Automatic Control, Department of ElectricalEngineering, Linkoping University, Sweden.The research leading to these results has received funding from theEuropean Research Council under the European Union’s Seventh FrameworkProgramme (FP7/2007-2013) / ERC grant agreement no. 339681. was exploited to decompose the large model description intosimple subsystems through a global transformation of theinput and output data. In [15] a network of identical systemsinteracting in a known interconnection pattern, indicated asso-called decomposable systems, was considered. The identi-ﬁcation of the dynamics of a single subsystem in the networkusing transformed input-output data sequences gives rise toa challenging Bilinear Matrix Inequality (BMI). In [12] theproperty that the inverse of the observability Grammian is off-diagonally decaying was used to approximate the neighboringstates, inﬂuencing the local system dynamics to be identiﬁed,via an (unknown) linear combination of locally neighboringinputs and outputs. The selection of these neighboring inputand output quantities requires however an exhaustive search,which is quite computationally demanding. As a comple-mentary work to [12], a nuclear norm identiﬁcation solutionwas provided in [16] to separate the local dynamics andglobal dynamics by exploiting their distinct rank and orderproperties; however, it did not consider identiﬁcation of theinterconnections between the subsystems.In this paper, we consider the subspace identiﬁcation ofa local cluster of identical LTI systems connected in a 1Dnetwork. Due to the unknown neighboring-state information,the considered local identiﬁcation is a blind identiﬁcationproblem, where some input sequences are totally unknown. Inthis scenario, the existing SID approaches that aim at retrievinga matrix whose row or column space is of interest, the latter,e.g., being the extended observability matrix, cannot work. Asa result, we aim to develop a new identiﬁcation method foridentifying the system matrices of a single subsystem (up to asimilarity transformation) in the network that uses local input-output data only.The subspace identiﬁcation method presented in this paperhas two major features.

First , a novel identiﬁcation methodis developed that uses similar data equations as many existingSID variants, but aims for a low dimensional subspace retrievalthat allows the identiﬁcation of the ﬁnite number of Markovparameters present in this data equation instead. This is a newalternative to classical SID methods [17] that would fail inretrieving subspace like the column space of the extendedobservability matrix of the system to be identiﬁed. This novelSID approach is applied to the identiﬁcation of a locally liftedsystem with the missing neighboring-state information. Byfully exploiting the two-layer structure of the block Toeplitzmatrices in the data equation of a lifted state-space system,a low-rank optimization problem is formulated for which theoptimal solution can yield (parts of) the true Markov param-eters of the locally lifted system.

Second is the retrieval of (cid:557) (cid:3036)(cid:2879)(cid:3019) (cid:1873) (cid:3036)(cid:2879)(cid:3019) (cid:4666)(cid:1863)(cid:4667) (cid:1877) (cid:3036)(cid:2879)(cid:3019) (cid:4666)(cid:1863)(cid:4667) (cid:557) (cid:3036) (cid:2878) (cid:3019) (cid:1873) (cid:3036) (cid:2878) (cid:3019) (cid:4666) (cid:1863) (cid:4667) (cid:1877) (cid:3036) (cid:2878) (cid:3019) (cid:4666) (cid:1863) (cid:4667) (cid:557) (cid:2869) (cid:1873) (cid:2869) (cid:4666) (cid:1863) (cid:4667) (cid:1877) (cid:2869) (cid:4666) (cid:1863) (cid:4667) (cid:557) (cid:3015) (cid:1873) (cid:3015) (cid:4666) (cid:1863) (cid:4667) (cid:1877) (cid:3015) (cid:4666) (cid:1863) (cid:4667) (cid:3) (cid:557) (cid:3036) (cid:1873) (cid:3036) (cid:4666) (cid:1863) (cid:4667) (cid:1877) (cid:3036) (cid:4666) (cid:1863) (cid:4667) (cid:1876) (cid:3036)(cid:2879)(cid:3019)(cid:2879)(cid:2869) (cid:4666)(cid:1863)(cid:4667) (cid:1876) (cid:3036)(cid:2878)(cid:3019)(cid:2878)(cid:2869) (cid:4666)(cid:1863)(cid:4667) (cid:17)(cid:135)(cid:139)(cid:137)(cid:138)(cid:132)(cid:145)(cid:148)(cid:138)(cid:145)(cid:145)(cid:134)(cid:3)(cid:145)(cid:136)(cid:3) (cid:557) (cid:2919) (cid:3)(cid:153)(cid:139)(cid:150)(cid:138)(cid:3)(cid:150)(cid:138)(cid:135)(cid:3)(cid:148)(cid:131)(cid:134)(cid:139)(cid:151)(cid:149)(cid:3) (cid:21) (cid:484)(cid:3)

Fig. 1. Illustration of a cluster of subsystems in a neighborhood of thesubsystem Σ i with radius R . The states x i − R − ( k ) and x i + R +1 ( k ) areexplicitly indicated. They are like all other states unmeasurable. the system matrices, describing the local LTI dynamics, of anindividual subsystem and its interaction with its neighboringsystems (up to a similarity transformation) from the reliablyestimated Markov parameters. The retrieval of the systemmatrices of an individual subsystem is inherently a challengingstructured state-space realization problem [18] for which theoptimal solution can yield the estimates of system matricesup to a similarity transformation. The second feature alsoincludes the exploitation of the shifting structure of a time-varying generalized observability matrix.The rest of the paper is organized as follows. Section IIdescribes the concerned identiﬁcation problems and shows thechallenge of dealing with the identiﬁcation of a small clusterof subsystems in a large-scale networked system. Section IIIshows that the existing subspace identiﬁcation methods breakdown in dealing with the concerned identiﬁcation problem.Section IV presents a method for identifying the Markovparameters of the locally lifted state-space models. Section Vprovides a solution to the state-space realization of a singlesubsystem. Section VI provides a numerical example. Theconclusions are provided in Section VII.The following notations are adopted throughout the paper.The lowercase (uppercase) x ( X ) is used to denote a vector(matrix). The superscripts T and − are transpose and inverseoperators, respectively. vec ( X ) represents the vector stackedby columns of X . tr ( X ) stands for the trace of X . I n denotesan n × n identity matrix and O m,n is an m × n zero matrix. ⊗ stands for the Kronecker product. k x k represents the norm of x . k X k F and k X k ∗ denote the Frobenius norm and nuclearnorm of the matrix X , respectively.II. P RELIMINARIES AND P ROBLEM D EFINITION

We consider the LTI systems { Σ i } Ni =1 connected in ahomogeneous 1D network as shown in Fig. 1. The lifted state-space model of the concerned network is written as: Σ : x ( k + 1) = Ax ( k ) + A r x ( k ) + Bu ( k ) y ( k ) = Cx ( k ) + e ( k )Σ i : x i ( k + 1) = Ax i ( k ) + A l x i − ( k ) + A r x i +1 ( k )+ Bu i ( k ) y i ( k ) = Cx i ( k ) + e i ( k ) i = 2 , · · · , N − , Σ N : x N ( k + 1) = Ax N ( k ) + A l x N − ( k ) + Bu N ( k ) y N ( k ) = Cx N ( k ) + e N ( k ) (1)where x i ( k ) ∈ R n × , u i ( k ) ∈ R m × , y i ( k ) ∈ R p × and e i ( k ) ∈ R p × are the state, input, output and measurement noise of the i -th subsystem, x i − ( k ) and x i +1 ( k ) are theneighboring states of the i -th subsystem. For x i ( k ) , thesubscript i denotes the spatial index and k is referred to as thetime index. For a large-scale distributed system, we alwaysassume that N ≫ n and n > max { p, m } .By lifting all states x i ( k ) into the vector x ( k ) as x ( k ) = (cid:2) x T ( k ) · · · x TN ( k ) (cid:3) T and doing the same for the inputs,outputs and noises deﬁning resp. the vectors u ( k ) , y ( k ) and e ( k ) , the global system in Fig. 1 has the following state spacemodel: x ( k + 1) = A x ( k ) + B u ( k ) y ( k ) = C x ( k ) + e ( k ) , (2)where A , B , C are N × N block matrices which have thefollowing forms: A =  A A r A l A . . .. . . . . . A r A l A  , B =  B B . . . B  , C =  C C . . . C  . To identify the system model in (2), the existing SID meth-ods only estimate the triplet ( A , B , C ) up to a similarity trans-formation , thereby not preserving the special block-diagonaland block-tridiagonal structure of these system matrices. In ad-dition, the computational complexity of SID methods, which isat least O ( N ) with N being the number of subsystems in thenetwork, may easily disqualify their use for the identiﬁcationof large-scale networks.To deal with the high computational complexity for i-dentifying the global system model in (2), we consider theidentiﬁcation of a cluster of subsystems { Σ j } i + Rj = i − R in aneighborhood of Σ i with radius R satisfying R < i < N − R and R ≪ N , as shown in Fig. 1. The lifted state-space modelof this cluster is represented as x i ( k + 1) = A R x i ( k ) + B R u i ( k ) + D R v i ( k ) ,y i ( k ) = C R x i ( k ) + e i ( k ) , (3)with x i ( k ) , u i ( k ) , y i ( k ) the subparts of x ( k ) , u ( k ) , y ( k ) re-spectively from the block-rows ( i − R ) to ( i + R ) ; A R , B R , C R are (2 R + 1) × (2 R + 1) block matrices which have formssimilar to A , B , C in (2), respectively; the (2 R + 1) × blockmatrix D R and the vector v i ( k ) are deﬁned as: D R =  A l

00 0 ... ... A r  , v i ( k ) = (cid:20) x i − R − ( k ) x i + R +1 ( k ) (cid:21) . For the identiﬁcation of the cluster model in (3), thefollowing two problems will be addressed.

Problem 1:

The Markov parameters of the locally liftedstate-space model in (3) will be identiﬁed using only the localinput u i ( k ) and the local output y i ( k ) .This problem will be addressed in Section IV. The difﬁcultyof this problem w.r.t. classical subspace identiﬁcation methodsin [19], [20] or their more recent variants [21], [22] is thatthe adjacent state sequences x i − R − ( k ) and x i + R +1 ( k ) aremissing or the input sequence v i ( k ) of the considered clustermodel is unknown. Problem 2:

The order the local subsystem Σ i , denotedby n , as well as the local system matrices A, A r , A l , B, C are to be identiﬁed based on the identiﬁed Markov param-eters in Problem 1. More speciﬁcally, the system matri-ces are to be identiﬁed up to a similarity transformation,i.e., the estimates ˆ A, ˆ A r , ˆ A l , ˆ B, ˆ C of A, A r , A l , B, C satisfy ˆ A = Q − AQ, ˆ A l = Q − A l Q, ˆ A r = Q − A r Q, ˆ B = Q − B, ˆ C = CQ with Q ∈ R n × n being a non-singularambiguity matrix.The second problem will be addressed in Section V. It isnoteworthy that, when the system order n is known, the sec-ond problem is inherently a structured state-space realizationproblem as studied in [18], which turns out to be a challengingnon-convex optimization problem.The developed method in this paper can be applied to moregeneral classes of large-scale identiﬁcation problems than justa 1D chain of identical LTI systems, such as 1D heterogeneousnetworks consisting of clusters of identical LTI dynamicalsystems [23].In addressing the locally lifted identiﬁcation problem in (3),we stipulate the following assumption. Assumption A.1

The global system ( A , B , C ) and the locallylifted system ( A R , B R , C R ) are assumed to be minimal.The persistent excitation of the input signal u ( k ) , whichwill be used for the identiﬁability analysis in the sequel, isdeﬁned below. Deﬁnition 1.

A time sequence u ( k ) ∈ R Nm is persistentlyexciting of order s if there exists an integer h such that the(block-) Hankel matrix  u ( k ) u ( k + 1) · · · u ( k + h − u ( k + 1) u ( k + 2) · · · u ( k + h ) ... ... . . . ...u ( k + s − u ( k + s ) · · · u ( k + s + h −  , has full row rank for any positive integer k . III. B

REAK DOWN OF EXISTING SUBSPACEIDENTIFICATION METHODS

The data equation of the local state-space model (3) is givenas follows: Y is,h = O s x ih + T B R s U is,h + T D R s V is,h + E is,h , (4)In this equation, the (block-) Hankel matrix Y is,h is deﬁned as Y is,h =  y i (1) · · · y i ( h ) ... . . . ...y i ( s ) · · · y i ( h + s −  with the superscript i being the spatial index of the subsystem Σ i , the subscripts s, h respectively being the number of blockrows and the number of block columns. Analogous to Y is,h ,we deﬁne the block-Hankel matrices U is,h , V is,h , E is,h from thesequences u i ( k ) , v i ( k ) , e i ( k ) , respectively. The matrix T B R s isa block Toeplitz matrix deﬁned from the triplet ( A R , B R , C R ) as T B R s =  C R B R ... . . . . . .C R A s − R B R · · · C R B R  , and T D R s is deﬁned in a similar way from the triplet ( A R , D R , C R ) . The ﬁnal matrix deﬁnitions in (4) are O s =  C R C R A R ...C R A s − R  and x ih = [ x i ( k ) · · · x i ( k + h − . Existing subspace identiﬁcation (SID) methods, see e.g.[17], break down when retrieving a matrix with a subspacelike O s or x ih from the available data matrices Y is,h and U is,h in (4) . To explain this deﬁciency of existing SID methods, weconsider the simple noise-free case e ( k ) ≡ . Let Π ⊥ U = I − U i,Ts,h (cid:16) U is,h U i,Ts,h (cid:17) − U is,h , Then the subspace revealing matrix [17] is Y is,h Π ⊥ U , or h Y i,Ts,h U i,Ts,h i T if we consider the approach in [22].When the unknown system input v i ( k ) in (3) is absent,namely v i ( k ) ≡ (or V ih,s ≡ ), the column space of O s can be retrieved from that of Y is,h Π ⊥ U , while the rowspace of x ih can be retrieved from that of compound matrix h Y i,Ts,h U i,Ts,h i T . However, for the case v i ( k ) = 0 , thepresence of the bilinear term T D R s V is,h destroys the importantsubspace revealing property in existing SID methods.Next, we shall show that the identiﬁcation method in [21]cannot handle the concerned identiﬁcation problem. One couldconsider the unknown input v i ( k ) in (3) as a ”missing inputvariable”. However, for the simple case e ( k ) ≡ and usingthe notation H v to denote the set block Hankel matrices ofthe same structure as the matrix V is,h , the rank minimizationproblem of [21] can be written as min V ∈H v rank  Y is,h U is,h V  . (5)By the following inequalityrank  Y is,h U is,h V  ≥ rank (cid:20) Y is,h U is,h (cid:21) , we can see that V = 0 is an optimal solution to (5). It impliesthat the whole missing input sequence, not just a few missingentries, cannot be recovered using the low-rank optimizationof [21].To resolve this deﬁciency of SID methods, when consideringthe identiﬁcation of local clusters in a large homogeneousnetwork, a novel SID approach is developed in this paper.This method differs from the existing SID methods in twomajor ways.First, the low-rank subspace retrieval step that is character-istic for many SID methods is unable to ﬁnd an accurate (orconsistent) estimate of the key subspace of O s or x ih . In thispaper, the low-rank optimization will be used for the accurateestimation of (parts of) the Markov parameters in the blockToeplitz matrix T B R s .Second, contrary to existing SID methods that are unableto preserve the structures in the estimated system matrices,the proposed subspace identiﬁcation method is capable ofaccurately estimating the non-zero block entries (up to asimilarity transformation) of the structured system matricesin (3), as stated in Problem 2.IV. I DENTIFYING THE M ARKOV PARAMETERS OF THELOCALLY LIFTED STATE SPACE MODEL (3)In this section, the Markov parameters of the state-spacemodel in (3) will be estimated by exploiting the low-rankproperty of the unmeasurable-state related terms in (4) andthe speciﬁc block Toeplitz structure of T B R s . Subsection IV-Aformulates a low rank optimization problem without consider-ing the speciﬁc structure of T B R s . It is shown in Lemma 2 thatthis low-rank optimization is unable to recover the true Markovparameters. Subsection IV-B reformulates another low-rankoptimization problem by incorporating the speciﬁc Toeplitzstructure of T B R s . It is shown in Theorem 1 that, under somemild conditions, the optimal solution to the proposed low-rank optimization problem can yield (parts of) the true Markovparameters. A. Formulation of a low-rank optimization problem

Due to the unmeasurable state sequences in a network, thematrix sum O s x ih + T D R s V is,h in (4) is unknown. However,this matrix sum has a low-rank property that will be exploitedas a solution in the new subspace identiﬁcation method. Lemma 1.

For the data equation (4) , when h > ps or Y is,h isa fat matrix, the sum O s x ih + T D R s V is,h satisﬁes the followingrank propertyrank (cid:0) O s x ih + T D R s V is,h (cid:1) ≤ (2 R + 1) n + min { ( s − sp, s − n } , (6) where ( s − sp and s − n denote the number of non-zerorows and columns of T D R s , respectively.Proof: From the structures of O s and T D R s , we canget that rank (cid:0) O s x ih (cid:1) ≤ rank ( O s ) ≤ (2 R + 1) n andrank (cid:16) T D R s V is,h (cid:17) ≤ rank (cid:0) T D R s (cid:1) ≤ min { ( s − sp, s − n } .Thus, the result in the lemma is straightforward. From Lemma 1, we can derive a condition to select theparameter s in the data equation (4) and the cluster radius R ,deﬁned above equation (3), such that the sum O s x ih + T D R s V is,h is of low rank (or rank deﬁcient). This condition reads: (2 R + 1) sp > (2 R + 1) n + min { ( s − sp, s − n } . (7)The above condition means that the number of the rows of thematrix O s x ih + T D R s V is,h is larger than an upper bound of itsrank. In practice, by ﬁxing a value of s satisfying that s > np ,we can always ﬁnd a value of R such that the above inequalityholds. Therefore, in the sequel, we assume that the matrix sum O s x ih + T D R s V is,h is of low-rank (or rank deﬁcient).Denote the noise-free output as ˆ y i ( k ) = y i ( k ) − e i ( k ) and itsrelated block Hankel matrix ˆ Y is,h . Based on the rank propertydiscussed above, a low-rank regularized optimization problemis then proposed as follows: min Θ BRs ∈T , ˆ Y is,h ∈H h + s − X t =1 k ˆ y i ( t ) − y i ( t ) k + λ · rank h ˆ Y is,h − Θ B R s U is,h i , (8)where T and H denote respectively the set of block Toeplitzand block Hankel matrices with appropriate block sizes, andthe regularization parameter λ allows to make a trade-offbetween the two terms in the cost function. It is remarkedthat the block Toeplitz structure of T corresponds to the ﬁrstblock-Toeplitz layer that is described in Subsection IV-B.In the absence of measurement noise, it will be shown inthe following lemma that the optimal solution to (8) is non-unique. Lemma 2.

Consider the optimization problem in (8) . Supposethat the following assumptions are satisﬁed: The global system input u ( k ) to (2) is persistentlyexciting of order N n + s with s being the SID dimensionparameter in (4) ; The measurement noise is absent, i.e., ˆ y i ( k ) = y i ( k ) ; The matrix pairs ( A l , B ) and ( A r , B ) have full rowrank.Then, the optimal solution to the following rank optimizationproblem is non-unique: min Θ BRs ∈T rank (cid:2) Y is,h − Θ B R s U is,h (cid:3) for R + s < i < N − R − s. (9)The proof to the above lemma is provided in Appendix A.The above lemma indicates that the Markov parameters, as theblock entries of T B R s , cannot be recovered even if the globaloptimal solution to the low-rank optimization problem in (9)can be found.In the next subsection, a solution will be provided by furtherconstraining the structure of block entries of T B R s , due to thespeciﬁc structures of the system matrices ( A R , B R , C R ) in(3). B. Structure constrained low-rank optimization

The block matrix T B R s in (4) has a two-layer block Toeplitzstructure, which will be utilized in the proposed identiﬁcation method. The ﬁrst layer is the block Toeplitz structure of T B R s with respect to its block entries C R A jR B R . The secondlayer is the partial block Toeplitz structure inside the blockentries C R A jR B R , as highlighted in the following exampleand lemma. Example 1.

If we take R = 3 and assume that each block in A R , B R , C R has size × , then a visual illustration of thestructures of the matrices { M j = C R A jR B R } j =1 is given inFigure 2. Fig. 2. Illustration of the partial Toeplitz structure of the matrices { M j = C R A jR B R } j =1 Top-left for j = 1 with block-bandwidth 1; top-right for j = 2 with block-bandwidth 2; bottom for j = 3 with block-bandwidth3. The deep blue color represents zero entries. The parts surrounded by redcurves have block Toeplitz structures. Lemma 3.

Based on the matrices A R , B R , C R deﬁned in (3) ,the following hold about the matrix product M j = C R A jR B R : M j is a banded block matrix, with block-bandwidth j . The submatrices of M j , for j < R + 1 , at the l − th block row and q − th block column with l, q ∈{ , · · · , R + 1 } , are inside the partial block-Toeplitzregion for the index-pair ( l, q ) satisfying: i + 1 ≤ l + q ≤ R + 3 − i. Proof:

Since B R and C R are block diagonal matriceswith constant diagonal blocks, M j has the same structurepattern as A jR . The matrix A R can be written as A R = I ⊗ A + J − ⊗ A l + J + ⊗ A r with J − and J + block-column shifting matrices such thatpost-multiplication of a matrix with J − results in shifting theblock-columns of that matrix to the left and adding a zeroblock column in the last column, while post-multiplication ofa matrix with J + results in the opposite shift and adding azero block column in the ﬁrst column.The proof is completed by induction. For j = 1 and j = 2 this can be easily checked. Let us assume that A jR has a blockToeplitz structure for the matrix blocks with the index-pair ( l, q ) satisfying that j + 1 ≤ l + q ≤ R + 3 − j , and using the above expression for A R , we can express A j +1 R as A j +1 R = A jR A R = A jR ( I ⊗ A + J − ⊗ A l + J + ⊗ A r ) . (10)By several trivial manipulations of the above matrix product,it can be derived that A j +1 R has a partial Toeplitz structurefor the matrix blocks with the index-pair ( l, q ) satisfying that ( j + 1) + 1 ≤ l + q ≤ R + 3 − ( j + 1) . This completes theproof.Let T f denote the set of two-layer block Toeplitz matricesthat was outlined above for the matrix T B R s . More clearly, thestructure of T B R s is illustrated in the following example. Example 2.

If we take R = 2 and s = 4 , the Toeplitz matrix T B R s can be parameterized in terms of the parameter set { ϕ j } j =1 with ϕ j ∈ R p × m as follows: T B R s =  M M M M M M  , (11) where M =  ϕ ϕ ϕ ϕ ϕ  , M =  ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ  , M =  ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ ϕ  . Then we specialize the rank-optimization problem in (8) as min Θ BRs ∈T f , ˆ Y is,h ∈H h + s − X t =1 k ˆ y i ( t ) − y i ( t ) k + λ · rank h ˆ Y is,h − Θ B R s U is,h i . (12)The difference between (8) and (12) is in the deﬁnition of theconstrained Toeplitz sets indicated by T and T f , respectively.Though it is only a minor notational difference, the impacton the solution is huge. As will be outlined in Theorem 1, itenables uniqueness of part of the solution.For the uniqueness property, use will be made of thefollowing time-varying observability matrix O j,k , which is asub-matrix of the extended observability matrix O j deﬁned in(4), consisting of the block rows corresponding to the second-layer block Toeplitz part of T B R s . Deﬁnition 2.

Let G j be a j × ( j + 2) block Toeplitz matrixof the form G j =  A l A A r A l A A r . . . . . . . . .A l A A r  . (13) Denote C j = I j ⊗ C . A time-varying observability matrix O j,k , for j > k − , is deﬁned in terms of the matrix pair ( C j , G j ) as [24, Chapter 3]: O j,k =  C j C j − G j − C j − G j − G j − ...C j − k − G j − k − · · · G j −  . Theorem 1.

Suppose that the following assumptions aresatisﬁed: Assumption A.1 holds and ν o is the observability indexof the pair ( A R , C R ) ; The cluster radius R of the lifted model in (3) and thedimension parameter s in (4) satisfy s > ν o , R ≥ s − The conditions 1)-3) of Lemma 2 hold; The time-varying observability matrix O R +1 ,s − , de-ﬁned in Deﬁnition

2, has full column rank.Then the submatrices of the Markov parameters M j = C R A jR B R , for j = 0 , , · · · , s − , contained in the matrix T B R s in (4) , for which Lemma 3 has shown that they preservethe second layer of block-Toeplitz structure, can be computedin a unique manner from the following low-rank optimizationproblem: min Θ BRs ∈T f rank h Y is,h − Θ B R s U is,h i for R + s < i < N − R − s. (14) The proof of the above theorem can be found in theAppendix B, where we can see that the (unique) global optimalsolution can yield correct recovery of certain parts of theMarkov parameters M j = C R A jR B R , for j = 0 , , · · · , s − .In addition, based on the conditions 2) and 4) of the abovetheorem, it can be established that the inequality (7) holds,namely the matrix sum O s x ih + T D R s V is,h is indeed of low-rank (or rank deﬁcient). This in turn indicates that the low-rankcost function in (12) is a reasonable choice.Lemma 2 and Theorem 1 demonstrate that the second-layerToeplitz structure, as pointed out in Lemma 3, is crucial forenforcing a (unique) solution to the identiﬁcation problem 1of this paper. The considered identiﬁcation of the Markovparameters boils down to solving the low-rank regularizedoptimization problem in (12).Since the optimization problem in (12) is non-convex,it is difﬁcult to obtain an optimal solution under a mildcomputational burden. In this paper, the reweighted nuclearnorm optimization method [25] is adopted, which is regardedas an iterative heuristic for the rank minimization problem(12).V. S TATE - SPACE REALIZATION OF A SINGLE SUBSYSTEM

In this section, we shall study the ﬁnal realization ofthe system matrices { C, A, A l , A r , B } from the estimatedsubmatrices of the Markov parameters M j = C R A jR B R , for j = 0 , , · · · , s − , contained in the matrix T B R s in (4), for which Lemma 3 has shown that they preserve the second-layerof block-Toeplitz structure. Remark 1.

The realization problem considered in this sectionis a structured state-space realization problem, which turns outto be a challenging non-convex optimization problem [18],[20], [26]. This problem is usually solved by the gradient-based optimization methods, which are sensitive to the selec-tion of the initial condition. Here, we transform the realizationproblem into a structured low-rank matrix factorization prob-lem for which the optimal solution enables the estimation ofsystem matrices ( C, A l , A, A r , B ) up to a similarity transfor-mation. This factorization problem is then reformulated intoan equivalent low-rank optimization problem, which is thennumerically solved using the optimization method developedin [27]. We start the solution by developing expressions of thesubmatrices in the second-layer Toeplitz regions in terms ofthe system matrices { C, A, A l , A r , B } . This is done in thefollowing Lemma. Lemma 4.

Consider the block matrices A R , B R and C R deﬁned in (3). Let the sequence of non-zero block entries fromleft to right of the ( j +1) -th block row of the matrix C R A jR B R be denoted as { F j, − j , F j, − j , · · · , F j,j − , F j,j } , then thesematrix entries satisfy the following relationship: j X k = − j F j,k z − k = C ( A l z − + A + A r z ) j B, (15) where z ∈ C .Proof: The above result can be derived using the ﬁlterbank theory in [28].As F j,k are the Markov parameters inside the second-layerblock Toeplitz part of T B R s , the values of F j,k for j ∈{ , , · · · , s − } , k ∈ {− j, · · · , j } are assumed to be availablein this section. Based on these values F j,k , we address theproblem of estimating the matrices { C, A, A l , A r , B } up to asimilarity transformation.Dual to Deﬁnition 2, we shall deﬁne a time-varying con-trollability matrix C j,k , which is a sub-matrix of the extendedcontrollability matrix determined by ( A R , B R ) . Deﬁnition 3.

Let Γ j be a ( j + 2) × j block Toeplitz matrix ofthe form Γ j =  A r A A r A l A . . .A l . . . A r . . . AA l  . (16) Denote B j = I j ⊗ B . A time-varying controllability matrix C j,k , for j > k − , is deﬁned in terms of the matrix pair (Γ j , B j ) : C j,k = [ B j Γ j − B j − · · · Γ j − · · · Γ j − k − B j − k − ] . In the sequel, we let s be an even integer such that s/ is an integer as well. The solution to the realization of thesystem matrices { C, A l , A, A r , B } is done in two phases.In the ﬁrst phase, the structured time-varying observabilitymatrix O R +1 ,s/ and the structured time-varying controlla-bility matrix C R +1 ,s/ are to be estimated from the availablematrix values F i,j . In the second phase, the system matri-ces { C, A, A l , A r , B } are derived from these time-varyingobservability and controllability matrices. It is remarked thatthe subscript s/ in O R +1 ,s/ (or C R +1 ,s/ ) means thatthe maximum moment of the block entries in O R +1 ,s/ (or C R +1 ,s/ ) is s/ − . The subscript s/ is adopted because thesum of the maximum moments of O R +1 ,s/ and C R +1 ,s/ isequal to the maximum moment, s − , of the available Markovparameters M j . A. Determining the time-varying observability and controlla-bility matrices

In this subsection, the determination of O R +1 ,s/ and C R +1 ,s/ will be formulated as a structured low-rank matrix-factorization problem. More importantly, we show that theoptimal solution to this matrix-factorization problem canyield the estimates of O R +1 ,s/ and C R +1 ,s/ up to ablock-diagonal ambiguity matrix with identical block-diagonalentries. This is crucial to identify the system matrices { C, A, A l , A r , B } up to a similarity transformation, as statedin Problem 2.First, by the deﬁnitions of O R +1 ,s/ and C R +1 ,s/ , wecan ﬁnd that the product of O R +1 ,s/ and C R +1 ,s/ is equalto a matrix constructed from { F j,k } jk = − j for j = 0 , , · · · , s − . This is demonstrated in the following example. Example 3.

When R = 1 and s = 4 , the product of O , and C , can be expressed as O , C , =  C C

00 0

CCA l CA CA r   B A r B B AB B A l B  =  F , F , F , F , F , F , − F , − F , F , F ,  . The above equation provides a simple example, showing thatthe product of O , and C , can be expressed in terms of { F j,k } jk = − j for j = 0 , , . Here, the product of O R +1 ,s/ and C R +1 ,s/ is represent-ed as: O R +1 ,s/ C R +1 ,s/ = H R +1 , R +1 , (17)where H R +1 , R +1 is a (2 R + 1) × (2 R + 1) block matrixthat is assumed to be known.Given the matrix H R +1 , R +1 , the problem of interest isto determine O R +1 ,s/ and C R +1 ,s/ from equation (17).According to Deﬁnitions 2 and 3, we can see that O R +1 ,s/ and C R +1 ,s/ are structured matrices. These structures areinstrumental to analyzing the properties of the optimal solution to (17), as shown in Theorem 2. In order to show thesestructures, use will be made of the following matrix deﬁnition.For j = 0 , , · · · , s/ and z ∈ C , the matrix sequences { W j,l } jl = − j and { E j,l } jl = − j are deﬁned as j X l = − j W j,l z − l = C ( A l z − + A + A r z ) jj X l = − j E j,l z − l = ( A l z − + A + A r z ) j B. (18)From the deﬁned quantities W j,l and E j,l in (18), the matrices W j and E j are deﬁned as W j =  W , W , − W , W , W , − ...W j − ,j −  E j =  E T , E T , − E T , E T , E T , − ...E Tj − ,j −  T . (19)It can be seen that O j,k (or C j,k ) is a block matrix constructedfrom the block components of W k (or E k ). This is illustratedin the next example. Example 4.

The time-varying observability matrix O , isrepresented in terms of W as follows O , =  W , W , W , W , W , W , − W , W , W , − W , W , W , − W , W , W , − W , − W , W , W ,  . Denote by O R +1 ,s/ the set of block matrices having thesame structure as O R +1 ,s/ , as illustrated in Example 4 , and C R +1 ,s/ the set of block matrices having the same structureas C R +1 ,s/ . It is remarked that the deﬁnitions of both thesets O R +1 ,s/ and C R +1 ,s/ require knowledge of the systemorder n . We then propose the following structured low-rankmatrix factorization problem: min O , C k H R +1 , R +1 − OC k F s.t. O ∈ O R +1 ,s/ , C ∈ C R +1 ,s/ . (20)It will be shown in the following theorem that the optimalsolution to (20) can yield the estimates of O or C up to ablock-diagonal ambiguity matrix with identical block-diagonalentries. Theorem 2.

Consider the optimization problem in (20) . Sup-pose that the following assumptions are satisﬁed: The values of R and s satisfy R ≥ s − , and s is apositive even integer; The system order of a subsystem, n , is known; The matrices O j,s/ and C j,s/ , for any j ≥ min { s − , R } , have full column and row rank, respectively; The matrix H R +1 , R +1 satisfying equation (17) isknown exactly.Then, any optimal solution pair { ˆ O , ˆ C } to the optimization (20) satisﬁes ˆ O = O R +1 ,s/ Q ˆ C = Q − C R +1 ,s/ , (21) where Q = I R +1 ⊗ Q with Q ∈ R n × n being a nonsingularambiguity matrix. The proof of this theorem is given in Appendix C.

B. Rank-constrained form of the optimization problem (20)In this subsection, the structured low-rank matrix factoriza-tion problem in (20) is reformulated into a rank-constrainedoptimization problem.From the matrix quantities deﬁned in (18) and (19), wecan see that O R +1 ,s/ is afﬁne in terms of W s/ , while C R +1 ,s/ is afﬁne in terms of E s/ . Then, it can be estab-lished that product O R +1 ,s/ C R +1 ,s/ is afﬁne in terms ofblock entries of W s/ E s/ [29]. This afﬁne operator H ( · ) isdeﬁned by O R +1 ,s/ C R +1 ,s/ = H ( W s/ E s/ ) . (22)In the sequel, the afﬁne operator H ( · ) is assumed to be known.Instead of O R +1 ,s/ and C R +1 ,s/ , we regard the product X = W s/ E s/ as the variable to be determined. Then, theafﬁne function H ( X ) works on the block entries of X . Bytaking into account the low-rank property of the matrix prod-uct W s/ E s/ , we propose the following rank-constrainedoptimization problem: min X k H R +1 , R +1 − H ( X ) k F s.t. rank ( X ) = n, (23)where n is the system order of a subsystem in (1) and X hasthe same size as W s/ E s/ .Next, we will show that the rank-constrained optimizationproblem in (23) is an equivalent formulation of (20). To showthis result, the following lemma is required. Lemma 5.

Suppose that O j,s/ and C j,s/ , for j > s − ,have full column and row rank, respectively. Then, W s/ and E s/ have full column and row rank, respectively, and bothranks are n .Proof: Since O j,s/ has full column rank, the blockvector h W T , , W T , − , · · · , W Ts/ , − s/ i T , which is constructedby stacking the non-zero block entries in the ﬁrst block columnof O j,s/ , has full column rank as well. Since this block vectoris a sub-part of W s/ , it can be derived that W s/ has fullcolumn rank. Similarly, it can be proven that the block vector E s/ has full row rank as well.The next theorem shows the equivalence between the opti-mization problems in (20) and (23). Theorem 3.

Consider the optimization problems in (20) and (23) . Suppose that all the conditions of Theorem 2 hold. Then,the optimization problems (20) and (23) are equivalent inthe sense that their optimal solutions can yield the estimate (cid:16) ˆ W s/ , ˆ E s/ (cid:17) of (cid:0) W s/ , E s/ (cid:1) up to an ambiguity matrix,i.e., there exist a nonsingular matrix Q ∈ R n × n such that ˆ W s/ = W s/ Q, ˆ E s/ = Q − E s/ . Proof: First, we will show that the optimal solutionto (20) can yield the estimate of (cid:0) W s/ , E s/ (cid:1) up to anambiguity matrix.From the matrix quantities in (18) and (19), we can seethat structured matrices O R +1 ,s/ and C R +1 ,s/ are linearlyparameterized by the block components of W s/ and E s/ ,respectively. By the result of Theorem 2, it can be derivedthat the optimal solution to (20) can yield the estimate of (cid:0) W s/ , E s/ (cid:1) up to an ambiguity matrix. Second, we will show that the optimal solution to (23) canyield the estimate of (cid:0) W s/ , E s/ (cid:1) up to an ambiguity matrix.By equation (22), the optimization problem (20) can bereformulated as min W , E k H R +1 , R +1 − H ( WE ) k F , (24)where the variables W and E have the same sizes of W s/ and E s/ , respectively. It is noted that equation (24) is a param-eterized form of the structured low-rank matrix factorizationproblem in (20).By Theorem 2 and Lemma 5, the optimal solution ( ˆ W , ˆ E ) to (24) satisﬁes rank (cid:16) ˆ W ˆ E (cid:17) = n and H ( ˆ W ˆ E ) = H R +1 , R +1 . It can be derived that X = ˆ W ˆ E is an optimalsolution to (23) and the criterion of (23) becomes zero.Let ˆ X be an optimal solution to (23). It should satisfy H ( ˆ X ) = H R +1 , R +1 and rank (cid:16) ˆ X (cid:17) = n . Let the SVDdecomposition of ˆ X be given by ˆ X = ˆ U Σ ˆ V T , where ˆ U and ˆ V are constructed by n orthogonal columns, and Σ ∈ R n × n is a diagonal matrix with positive diagonal entries. Then, ( ˆ W , ˆ E ) = ( ˆ U ,

Σ ˆ V T ) is an optimal solution to (24). Therefore, ( ˆ U ,

Σ ˆ V T ) is an estimate of (cid:0) W s/ , E s/ (cid:1) up to an ambiguitymatrix. The proof is then completed.The rank-constrained optimization problem in (23) is non-convex and NP-hard. Following our previous work [27], therank-constrained optimization problem (23) is recast intoa difference-of-convex optimization problem which is thensolved using the sequential convex programming method.Let ˆ X be an optimal solution to (23). The SVD decompo-sition of ˆ X is given by ˆ X = (cid:2) U U (cid:3) (cid:20) Σ Σ (cid:21) (cid:20) V T V T (cid:21) , (25)where U and V consists of n orthogonal columns, and thediagonal matrix Σ ∈ R n × n has larger diagonal entries than Σ . The estimates of W s/ and E s/ can then be obtained asfollows: ˆ W s/ = U , ˆ E s/ = Σ V T . Since O R +1 ,s/ and C R +1 ,s/ are respectively afﬁne interms of W s/ and E s/ , the variables O and C in (20) canbe estimated from ˆ W s/ and ˆ E s/ , respectively. C. Determining the system matrices { A, A l , A r , B, C } In view of the theoretical result in Theorem 2, we assumethat the obtained estimates ˆ O R +1 ,s/ and ˆ C R +1 ,s/ satisfy ˆ O R +1 ,s/ = O R +1 ,s/ Q , ˆ C R +1 ,s/ = Q − C R +1 ,s/ , (26)where Q = I R +1 ⊗ Q with Q ∈ R n × n being nonsingular.Based on these estimates, we will address the identiﬁcationof the system matrices { A, A l , A r , B, C } up to a similaritytransformation.First, the shifting structure of the time-varying observabilitymatrix O R +1 ,s/ will be explored. Denote O j,k : k =  C j − k G j − k · · · G j − C j − k +1) G j − k +1) · · · G j − ...C j − k G j − k · · · G j −  where ≤ k < k ≤ s/ − and k ≤ j ≤ R + 1 . Thematrix O j,k : k above is constructed by the block rows of O j,k with block-row indices from k to k . Then, the structure-shifting property of O R +1 ,s/ can be represented as O R − , s/ − G R − = O R +1 , s/ − , (27)where O R − , s/ − and O R +1 , s/ − are sub-matrices of O R +1 ,s/ , and G R − is a block Toeplitz matrix deﬁned inDeﬁnition 2. This is illustrated in the following example. Example 5.

When R = 2 and s = 6 , the structure-shiftingproperty in (27) can be explicitly written as  C C CCA l CA CA r   A l A A r A l A A r A l A A r  =  CA l CA CA r CA l CA CA r CA l CA CA r CA l C ( AA l + A l A ) ∗ C ( AA r + A r A ) CA r  , where ∗ is used to represent the term C ( A l A r + A + A r A l ) . Based on equation (27), we formulate the following struc-tured least-squares optimization problem to identify the ma-trices A l , A, A r based on the estimate ˆ O R +1 ,s/ : min G k ˆ O R − , s/ − G − ˆ O R +1 , s/ − k F s.t. G ∈ G R − , (28)where G R − denotes a set of matrices having the samestructure as G R − , as shown in Deﬁnition 2.The optimal solution to (28) has properties shown in thefollowing lemma. Lemma 6.

Let ˆ O R +1 ,s/ satisfy equation (26) . Assumethat ˆ O R − , s/ − has full column rank. Then, the optimalsolution ˆ G to the optimization problem in (28) satisﬁes ˆ G = (cid:0) I R − ⊗ Q − (cid:1) G R − ( I R +1 ⊗ Q ) , (29) where Q ∈ R n × n is a nonsingular matrix. The above lemma can be derived straightforwardly basedon equation (26) and the optimization formulation (28).The condition that the time-varying observability matrix ˆ O R − , s/ − has full column rank is similar to condition 3)of Theorem 2. Lemma 6 implies that the matrices A l , A, A r can be determined up to a similarity transformation, i.e., ˆ A l = Q − A l Q, ˆ A = Q − AQ, ˆ A r = Q − A r Q. In addition, according to equation (26), the estimates ˆ C and ˆ B can be extracted respectively from ˆ O R +1 ,s/ and ˆ C R +1 ,s/ ,satisfying that ˆ C = CQ, ˆ B = Q − B. To ease the reference, the proposed local network identiﬁ-cation algorithm is summarized in Algorithm 1.

Algorithm 1 : Local identiﬁcation for 1D distributed systemsStep 1 Construct a spatially stacked state-space model (3) and itstemperally stacked equation (4) based on local observations;Step 2 Estimate T B R s from the optimization problem (12);Step 4 Estimate O R +1 ,s/ and C R +1 ,s/ usingthe method described in Subsection V-B;Step 5 Extract the estimates of C and B from the estimates of O R +1 ,s/ and C R +1 ,s/ , respectively;Step 6 Estimate A l , A, A r by solving (28). VI. N

UMERICAL SIMULATION

In this section, numerical simulations are provided todemonstrate the effectiveness of the proposed identiﬁcationmethod – Algorithm 1. In the simulation, the distributedsystem is constructed by connecting 40 identical subsystemsin a line, and the identiﬁcation for the 20-th subsystemis performed. The system matrices ( A, A l , A r , B, C ) with A, A l , A r ∈ R × , B ∈ R × and C ∈ R × are randomlygenerated such that Assumption A.1 is satisﬁed and the 1Dnetworked system is stable.To construct an augmented state-space model in (3), we set R = 5 and s = 8 . The system input and the measurement noiseare generated as white noise sequences and the data length isset to 800.To evaluate the identiﬁcation performance against the noiseeffect, the criterion signal-to-noise ratio (SNR) is adopted,which is deﬁned asSNR = 10 log (cid:18) var ( y i ( k ) − e i ( k )) var ( e i ( k )) (cid:19) . In the sequel, we shall carry out numerical simulations withthe SNR ranging from 0 dB to 95 dB.We use the criterion impulse-response ﬁtting to evaluatethe performance of the proposed identiﬁcation method. The normalized ﬁtting error of the impulse-response sequence CA i B is deﬁned by T T X j =1 P i =0 k CA i B − ˆ C j ˆ A ij ˆ B j k F P i =0 k CA i B k F , where T is the number of randomly generated networkedsystems and { ˆ C j , ˆ A j , ˆ B j } are the estimates of { C, A, B } ofthe j -th generated networked system, respectively. Similarly,we can also deﬁne the normalized ﬁtting error for the impulse-response sequences CA il B and CA ir B .Fig. 3 shows the identiﬁcation performance of the proposedmethod against the SNR. The normalized ﬁtting errors arecalculated by averaging the results of 100 randomly generatednetworked systems, and the regularization parameter λ in (8)is set to λ = 10 − . It can be observed from Fig. 3 that, usingthe proposed identiﬁcation method, the normalized ﬁtting errordecreases along with the increase of the SNR. When the SNRis larger than 50 dB, the normalized ﬁtting errors can be assmall as − . SNR (dB) -4 -3 -2 -1 N o r m a li z ed f i tt i ng e rr o r Fig. 3. Normalized ﬁtting errors under different noise levels with λ = 10 − . Fig. 4 shows the impulse-response ﬁdelity of individualsubsystems against the regularization parameter λ , whereSNR is set to SNR=40 dB. The normalized ﬁtting errors arecomputed by averaging the results of 100 randomly generatednetworked systems. We can observe from Fig. 4 that goodidentiﬁcation performance can be achieved by choosing theregularization parameter λ around − .VII. C ONCLUSION

The local identiﬁcation of 1-D large-scale distributed sys-tems has been studied. Compared with the classical systemidentiﬁcation problems, the challenging point of the localsystem identiﬁcation is that there are two unknown systeminputs which are the states of its neighboring subsystems. Byexploiting both the spatial and temporal structures of the dis-tributed system, especially the two-layer Toeplitz structure ofthe Markov-parameter matrix, a low-rank optimization prob-lem has been provided for identifying the Markov parametersof a local cluster of identical subsystems, where the associatedoptimal solution can yield (parts of) the true Markov param-eters. Moreover, the system realization of a structured state-space model is formulated as a structured low-rank matrix -8 -6 -4 -2 N o r m a li z ed f i tt i ng e rr o r Fig. 4. Normalized ﬁtting errors against the regularization parameter λ withSNR=40 dB. factorization problem, showing that the system matrices canbe determined up to a similarity transformation by enforcingthe structure of the generalized observability/controllabilitymatrix.Although we only consider the identiﬁcation of 1D homo-geneous networked systems, it can be extended to 2D homo-geneous networks using the same identiﬁcation framework,namely exploiting both the spatial and temporal structuresof the concerned distributed system. In our future work,the identiﬁcation of large heterogeneous networks will beinvestigated. A PPENDIX AP ROOF OF L EMMA

Step I : we shall show that  x ih V is,h U is,h  has full row rank underthe given assumptions in the lemma.It can be derived from (2) that  x ( k ) · · · x ( k + 1) · · · ... · · · x ( k + s − · · ·  =  I A B ... ... . . . A s − A s − B · · · B  ×  x ( k ) · · · u ( k ) · · · ... · · · u ( k + s − · · ·  . (30)Let Ξ = [ O (2 R +1) n, ( i − R − n I (2 R +1) n O (2 R +1) n, ( N − i − R ) n ] , Ω = h O n, ( i − R − n I n O n, (2 R +1) n O n,n O n, ( N − i − R − n O n, ( i − R − n O n,n O n, (2 R +1) n I n O n, ( N − i − R − n i and Π = [ O (2 R +1) m, ( i − R − m I (2 R +1) m O (2 R +1) m, ( N − i − R ) m ] be selection matrices such that  x ih V is,h U is,h  =  ΞΩΩ A Ω B ... ... . . . Ω A s − Ω A s − B · · · Ω B

00 Π ... . . .  ×  x ( k ) · · · u ( k ) · · · ... · · · u ( k + s − · · · u ( k + s − · · ·  . (31)By Lemma 10.4 in [20], under the assumption that u ( k ) ispersistently exciting of order nN + s , it can be establishedthat  x ( k ) · · · u ( k ) · · · ... · · · u ( k + s − · · ·  has full row rank. In addition, by explicitly unfolding theexpressions of Ω A j B , it can be veriﬁed that, when ( A l , B ) and ( A r , B ) have full row rank and R ≥ s − , the coefﬁcientmatrix on the right-hand side of (31) has full row rank. Forthe sake of brevity, we only show this for the case with s = 3 . The associated coefﬁcient matrix in (31) is shownin (32) with ∗ denoting unimportant entries for analyzing theconcerned rank property. Under the condition that ( A l , B ) and ( A r , B ) have full row rank, it is easy to see that the sub-matrix stacked by the ﬁrst ﬁve and last two block rows of(32) has full row rank. It can be further veriﬁed that the wholecoefﬁcient matrix in (32) has full row rank. This is a conse-quence of (cid:2) A l A l B B (cid:3) = (cid:2) A l B (cid:3) (cid:20) A l B

00 0 I (cid:21) and (cid:2) A r A r B B (cid:3) = (cid:2) A r B (cid:3) (cid:20) A r B

00 0 I (cid:21) with (cid:2) A l B (cid:3) and (cid:2) A r B (cid:3) having full row rank.Now that the coefﬁcient matrix in (31) has full row rank,Sylvesters’ inequality shows that  x ih V is,h U is,h  has full row rankas well. Step II:

Let T B R s ∈ T be the true value of Θ B R s . Next, weshall show that T B R s is an optimal solution to (9). Let ∆ s ∈ T .Under the full row rank property of  x ih V is,h U is,h  , we can derivethat rank (cid:2) Y is,h − T B R s U is,h (cid:3) = rank (cid:18)(cid:2) O s T D R s (cid:3) (cid:20) x ih V is,h (cid:21)(cid:19) = rank (cid:2) O s T D R s (cid:3) and rank (cid:2) Y is,h − (cid:0) T B R s − ∆ s (cid:1) U is,h (cid:3) = rank (cid:2) O s T D R s ∆ s (cid:3)  x ih V is,h U is,h  = rank (cid:2) O s T D R s ∆ s (cid:3) . (33)By the inequalityrank (cid:2) O s T D R s (cid:3) ≤ rank (cid:2) O s T D R s ∆ s (cid:3) , (34)we can obtain thatrank (cid:2) Y is,h − T B R s U is,h (cid:3) ≤ rank (cid:2) Y is,h − (cid:0) T B R s − ∆ s (cid:1) U is,h (cid:3) for any ∆ s ∈ T . Thus, T B R s is an optimal solution to (9). Step III:

In this step, we shall show that the optimal solutionto (9) is non-unique. Let ∆ s = T D R s G s , (35)where G s = I s ⊗ G R with G R ∈ R n × (2 R +1) n . Denote T B R s = T B R s − ∆ s . It is easy to verify that T B R s ∈ T .Since ∆ s = T D R s G s , it can be established thatrank (cid:0) [ Y ik,s,h − T B R s U ik,s,h (cid:1) = rank (cid:2) O s T D R s ∆ s (cid:3) = rank (cid:18)(cid:2) O s T D R s (cid:3) (cid:20) I I G s (cid:21)(cid:19) = rank (cid:2) O s T D R s (cid:3) = rank (cid:0) [ Y is,h − T B R s U is,h (cid:1) . (36)It can then be observed from the above equation that both T B R s and T B R s are optimal solutions to (9); therefore, the optimalsolution to (9) is non-unique.A PPENDIX BP ROOF OF T HEOREM  x ih V is,h U is,h  . Let T f denote the set of the two-layer block Toeplitzmatrices, deﬁned in Subsection IV-B. Let T B R s ∈ T f be thetrue value of the estimate of Θ B R s in (14). Since the matrix set T f belongs to the matrix set T , the minimal value of the costfunction in (9) is smaller than or equal to that in (14), i.e. min Θ BRs ∈T rank (cid:2) Y is,h − Θ B R s U is,h (cid:3) ≤ min Θ BRs ∈T f rank (cid:2) Y is,h − Θ B R s U is,h (cid:3) . As shown in Lemma 2, the true value T B R s ∈ T f is anoptimal solution to (9). Then, it can be observed from theabove inequality that T B R s ∈ T f is also an optimal solution to  | I n ∗ I n | | | | | I n | ∗ | | | | | | ∗ | I n | | | | A l A | A r ∗ | B | ∗ | | | | ∗ A l | A A r | ∗ | B | | A l ∗ ∗ | ∗ ∗ | A l B AB A r B | ∗ | B | ∗ | | ∗ ∗ | ∗ ∗ A r | ∗ | A l B AB A r B | ∗ | B | | | I (2 R +1) m | | || | | | | I (2 R +1) m |  (32) (14). As a consequence, the corresponding minimal value ofthe cost function of (14) isrank (cid:2) O s T D R s (cid:3) . Let T B R s ∈ T f be such that ∆ s = T B R s − T B R s is a perturbationwe seek of the true value such that it retains the minimal valueof the cost function in (14). Using the full row rank conditionof the matrix in (31), we obtain thatrank (cid:0) Y is,h − T B R s U is,h (cid:1) = rank (cid:2) O s x ih + T D R s V is,h + ∆ s U is,h (cid:3) = rank (cid:2) O s T D R s ∆ s (cid:3) . (37)Now, we seek that part of ∆ s ∈ T f that retains the minimalvalue of the cost function in (14) only when it is zero. For thatreason, we permute the block rows of ∆ s compatible with thezero block-row pattern of the matrix T D R s . (a) Structure of T D R s : The structure of T D R s is determinedby C R A jR D R for j = 0 , , · · · , s − . By the deﬁnitions of A R , C R and D R in (3), we can ﬁnd that C R A jR D R is a (2 R +1) × block matrix. When R ≥ ( s − , the block rows of C R A jR D R indexed from ( j + 2) to (2 R − j ) are zero. Forinstance, C R A jR D R has the block structure as shown belowwith ∗ representing non-zero block entries:  ∗ ... ∗

00 00 ∗ ... ∗  . (38) (b) Permutation of ∆ s : Let L s denote the matrix thatpermutes all zero block rows of T D R s to top such that L s T D R s = (cid:20) T (cid:21) , (39)where T is a sub-matrix of T D R s by removing the correspond-ing zero block rows, as illustrated in (38). It is noteworthy thatthe zero block rows of T D R s correspond to the second-layerblock Toeplitz part of T B R s , as shown in Lemma 3.Then, the permutation matrix L s is applied to the matrices ∆ s and O s , yielding the following special structures L s ∆ s = (cid:20) ∆ ∗ (cid:21) =  , , ∆ , ... . . . . . . . . . ∆ s − , · · · ∆ ,s − ∆ ,s − ∗  ,L s O s = (cid:20) O s ∗ (cid:21) , (40)such that O s = O R +1 ,s and ∆ j,l , for j = 0 , , · · · , s − and l = 0 , · · · , s − − j , is a [2( R − j − l ) − × [2 R + 1] blockToeplitz matrix in the form ∆ j,l =  δ j, − j δ j, − j · · · δ j,j . . . . . . . . . . . . . . . . . . δ j, − j δ j, − j · · · δ j,j  , with { δ j,k } jk = − j being block entries of appropriate sizes. Notethat ∆ j,l for l ≥ is a sub-matrix of ∆ j, , which is constructedby stacking the block rows indexed from ( l +1) to (2 R − − l ) .Now, we are ready to show for what values of ∆ j,l thefollowing rank constraint holdsrank (cid:2) O s T D R s ∆ s (cid:3) = rank (cid:2) O s T D R s (cid:3) . (41)It can be derived from the above equality that L s (cid:2) O s T D R s (cid:3) (cid:20) Q Q (cid:21) = L s ∆ s , (42)or (cid:20) O R +1 ,s ∗ T (cid:21) (cid:20) Q Q (cid:21) = (cid:20) ∆ ∗ (cid:21) , (43)where Q and Q are coefﬁcient matrices of appropriate sizes,and ∗ denotes non-zero block entries caused by the block-rowpermutation. So we focus on the top part of (43) and rewriteit as ∆ = O R +1 ,s (cid:2) Q , Q , · · · Q ,s − (cid:3) , (44)where { Q ,j } s − j =0 are sub-matrices of Q satisfying that Q =[ Q , Q , · · · Q ,s − ] .By considering the last block column of equation (44),under the assumption that O R +1 ,s − has full column rank, wecan obtain that Q ,s − = 0 . Similarly, by considering the sec-ond to the last block column, it can be derived that Q ,s − = 0 and ∆ ,s − = 0 , further ∆ ,j = 0 for j = 0 , , · · · , s − .Next, for the third to the last block column, since ∆ ,s − = 0 and O R +1 ,s − has full column rank, we can obtain that Q ,s − = 0 and ∆ ,j = 0 for j = 0 , · · · , s − . By repeatingthis procedure, it can be established that equation (44) holdsonly when ∆ = 0 and Q ,j = 0 for j = 0 , , · · · , s − . Inother words, the matrix ∆ s with a nontrivial sub-matrix ∆ cannot be linearly represented by (cid:2) O s T D R s (cid:3) .Since the block rows of ∆ correspond to the second-layer Toeplitz part of ∆ s , we can see that the rankof (cid:2) O s T D R s ∆ s (cid:3) is strictly larger than that of (cid:2) O s T D R s (cid:3) as long as the second-layer Toeplitz part of ∆ s is non-zero. Thus, the proof is completed.A PPENDIX CP ROOF OF T HEOREM { ˆ O R +1 ,s/ , ˆ C R +1 ,s/ } is oneof the optimal solution pairs to (20), it can be established that ˆ O R +1 ,s/ and O R +1 ,s/ has the same structure and the samecolumn space, which can be algebraically represented as ˆ O R +1 ,s/ = O R +1 ,s/ Q , (45)where Q ∈ R (2 R +1) n × (2 R +1) n is a (2 R + 1 , R + 1) blockmatrix which is nonsingular. In the sequel, we denote Q i,j the ( i, j ) -th block entry of Q . In order to prove the theorem, it issufﬁcient to prove that the estimate ˆ W s/ of W s/ satisﬁesthat ˆ W s/ = W s/ Q, (46)where Q ∈ R n × n is a nonsingular matrix and W s/ is deﬁnedin (19). In the sequel, we denote by ˆ W i,j , for j = − i, · · · , i ,the estimate of W i,j which is a block component of W s/ .To illustrate the structure of equation (45), it is expandedinto equation (47) by setting R = 2 and s = 4 .The following proof will be divided into three steps. Step 1.

We will show that, under the assumption that O R,s/ has full column rank, the following relations can bederived from equation (45): i = 0 : ˆ W i, − i = W i, − i Q , ,W i, − i Q ,j = 0 for j = 2 , · · · , s − i = 1 : ˆ W i, − i = W i, − i Q , , ˆ W i, − i = W i, − i Q , + W i, − i Q , , ˆ W i, − i = W i, − i Q , + W i, − i Q , + W i, − i Q , ,W i, − i Q ,j +2 + W i, − i Q ,j +1 + W i, − i Q ,j = 0 for j = 2 , · · · , s − · · · i = s/ − W i, − i = W i, − i Q , , · · · ˆ W i,s − − i = W i, − i Q ,s − + · · · + W i,s − − i Q , . (48) Let’s consider the equation corresponding to the ﬁrst block-column of (45). Since O R,s/ has full column rank, it can bederived that Q i, = 0 for i = 2 , , · · · , R + 1;ˆ W i, − i = W i, − i Q , for i = 0 , , · · · , s/ − . (49) To illustrate this, the ﬁrst block-column equation of (47) canbe equivalently written as follows:  ˆ C ˆ CA l  =  C CA l CA CA r C C C

00 0 0 0 C CA l CA CA r

00 0 CA l CA CA r   Q , Q , Q , Q , Q ,  . It is obvious that, since O , (or O R,s/ ) has full columnrank, it has Q , = Q , = Q , = Q , = 0 , ˆ C = CQ , , ˆ CA l = CA l Q , . (50)Next, by substituting the relations in (49) into the equationcorresponding to the second block-column equation of (45)and under the assumption that O R,s/ has full column rank,we can obtain that Q i, = 0 for i = 3 , · · · , R + 1; CQ , = 0;ˆ W i, − i = W i, − i Q , + W i, − i Q , for i = 1 , · · · , s/ − . (51)To illustrate this, by using the relations in (50), the secondblock-column equation of (47) can be equivalently written as  ˆ CA CQ , CA l Q ,  =  CA l CA CA r C C C C CCA l CA CA r CA l CA CA r   Q , Q , Q , Q , Q ,  . (52) We can observe that the bottom subpart of the vector on theleft-hand side equals the product of the ﬁrst block column of O , and Q , . By the assumption that O , (or O R,s/ ) hasfull column rank, we can derive that Q , = Q , = Q , = 0; CQ , = 0; Q , = Q , ;ˆ CA = CA l Q , + CAQ , . (53) By iteratively considering from the ﬁrst to the ( s − -thblock-column equation of (45) using the same strategy, wecan derive the relations in equation (48). Step 2.

Under the assumption that O R,s/ has full columnrank, the following relations can be derived from equation (45)  ˆ C ˆ C ˆ C ˆ C ˆ C ˆ CA l ˆ CA ˆ CA r ˆ CA l ˆ CA ˆ CA r ˆ CA l ˆ CA ˆ CA r  =  C C C C CCA l CA CA r CA l CA CA r CA l CA CA r   Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q , Q ,  (47) as well: i = 0 : ˆ W i,i = W i,i Q R +1 , R +1 ,W i,i Q R +1 ,j = 0 for j = 2 R + 3 − s, · · · , R ; i = 1 : ˆ W i,i = W i,i Q R +1 , R +1 , ˆ W i,i − = W i,i Q R +1 , R + W i,i − Q R +1 , R +1 , ˆ W i,i − = W i,i Q R +1 , R − + W i,i − Q R +1 , R + W i,i − Q R +1 , R +1 ,W i,i Q R +1 ,j + W i,i − Q R +1 ,j +1 + W i,i − Q R +1 ,j +2 = 0 for j = 2 R + 3 − s, · · · , R − · · · i = s/ − W i,i = W i,i Q R +1 , R +1 , · · · ˆ W i,i − s +2 = W i,i Q R +1 , R +3 − s + · · · + W i,i − s +2 Q R +1 , R +1 . (54) The relations in (54) are derived using the same strategy in

Step 1 by iteratively considering from the (2 R + 1) -th to the (2 R + 3 − s ) -th block-column equation of (45) in a reverseorder. Step 3.

We will show ˆ W s/ = W s/ Q , using the relationsin (48) and (54).Denote by ˆ O s − ,s/ (: , s − the ( s − -th block column ofthe estimate ˆ O s − ,s/ , which is a sub-matrix of ˆ O R +1 ,s/ .The relations in (48) can be compactly written as: ˆ O s − ,s/ (: , s −

1) = O s − ,s/  Q ,s − ...Q , ...  , (55)while those in (54) can be compactly written as: ˆ O s − ,s/ (: , s −

1) = O s − ,s/  ... Q R +1 , R +1 ...Q R +1 , R +3 − s  (56)To illustrate this using the speciﬁc example shown in (47), the equations (55)-(56) can be explicitly written as  C CA r ˆ CA ˆ CA l  = O ,  Q , Q , Q ,  ,  C CA r ˆ CA ˆ CA l  = O ,  Q , Q , Q ,  . (57) In view of equations (55) and (56), we can obtain that O s − ,s/  Q ,s − ...Q , Q , ...  = O s − ,s/  ... Q R +1 , R +1 Q R +1 , R ...Q R +1 , R +3 − s  By assumption 3) of the theorem, the matrix O s − ,s/ hasfull column rank. As a result, it can be established from theabove equation that Q , = Q R +1 , R +1 ; Q , = · · · = Q ,s − = 0; Q R +1 , R +3 − s = · · · = Q R +1 , R = 0 . (58)Since the block vector ˆ O s − ,s/ (: , s − contains all theblock components of ˆ W s/ , we can obtain that ˆ W s/ = W s/ Q , . Therefore, the proof is completed.R

EFERENCES[1] B. Bamieh, F. Paganini, and M. Dahleh, “Distributed control of spatiallyinvariant systems,”

Automatic Control, IEEE Transactions on , vol. 47,pp. 1091–1107, Jul 2002.[2] N. Motee and A. Jadbabaie, “Optimal control of spatially distributedsystems,”

Automatic Control, IEEE Transactions on , vol. 53, pp. 1616–1629, Aug 2008.[3] J. Rice and M. Verhaegen, “Distributed control in multiple dimensions:A structure preserving computational technique,”

Automatic Control,IEEE Transactions on , vol. 56, pp. 516–530, March 2011.[4] A. Mahajan, N. C. Martins, M. Rotkowitz, and S. Y¨uksel, “Informationstructures in optimal decentralized control.,” in

CDC , pp. 1291–1306,2012.[5] M. Wozny and W. Carpenter, “A characteristics approach to parameteridentiﬁcation in distributed systems,”

Automatic Control, IEEE Trans-actions on , vol. 14, pp. 422–422, Aug 1969.[6] D. Gorinevsky and M. Heaven, “Performance-optimized applied identiﬁ-cation of separable distributed-parameter processes,”

Automatic Control,IEEE Transactions on , vol. 46, pp. 1584–1589, Oct 2001. [7] J. Rice and M. Verhaegen, “Efﬁcient system identiﬁcation of hetero-geneous distributed systems via a structure exploiting extended kalmanﬁlter,” Automatic Control, IEEE Transactions on , vol. 56, pp. 1713–1718, July 2011.[8] T. Bewley and S. Liu, “Optimal and robust control and estimation oflinear paths to transition,”

Journal of Fluid Mechanics , vol. 365, pp. 305–349, Jun 25 1998.[9] F. Wu and S. Yildizoglu, “Distributed parameter-dependent modelingand control of ﬂexible structures,”

Journal of Dynamic Systems Mea-surement and Control-Transactions of The ASME , vol. 127, pp. 230–239,Jun 2005.[10] D. G. MacMynowski, R. Heimsten, and T. Andersen, “Distributed ForceControl of Deformable Mirrors,”

European Journal of Control , vol. 17,pp. 249–260, May-Jun 2011.[11] R. R. Pena, R. D. Fernandez, and R. J. Mantz, “Passivity control viaPower Shaping of a wind turbine in a dispersed network,”

InternationalJournal of Hydrogen Energy , vol. 39, pp. 8846–8851, May 2014.[12] A. Haber and M. Verhaegen, “Subspace identiﬁcation of large-scaleinterconnected systems,”

Automatic Control, IEEE Transactions on ,vol. 59, pp. 2754–2759, Oct 2014.[13] J. Kulkarni, R. DAndrea, and B. Brandl, “Application of distributedcontrol techniques to the adaptive secondary mirror of cornells large at-acama telescope,” in

SPIE Astronomical Telescopes and InstrumentationConf. , pp. 750–756, 2002.[14] P. Massioni and M. Verhaegen, “Subspace identiﬁcation of circulantsystems,”

Automatica , vol. 44, no. 11, pp. 2825 – 2833, 2008.[15] P. Massioni and M. Verhaegen, “Subspace identiﬁcation of distributed,decomposable systems,” in

Decision and Control, 2009 held jointlywith the 2009 28th Chinese Control Conference. CDC/CCC 2009.Proceedings of the 48th IEEE Conference on , pp. 3364–3369, Dec 2009.[16] N. Matni and A. Rantzer, “Low-rank and low-order decompositions forlocal system identiﬁcation,” arXiv:1403.7175 , 2014.[17] M. Verhaegen, “Subspace techniques in system identiﬁcation,”

Encyclo-pedia of Systems and Control , pp. 1386–1396, 2015.[18] G. Mercere, O. Prot, and J. Ramos, “Identiﬁcation of parameterizedgray-box state-space systems: From a black-box linear time-invariantrepresentation to a structured one,”

Automatic Control, IEEE Transac-tions on , vol. 59, pp. 2873–2885, Nov 2014.[19] L. Ljung,

System Identiﬁcation: Theory for the user (Second Edition) .Prentice Hall PTR, 1999.[20] M. Verhaegen and V. Verdult,

Filtering and System Identiﬁcation: ALeast Squares Approach . Cambridge University Press, 2007.[21] Z. Liu, A. Hansson, and L. Vandenberghe, “Nuclear norm systemidentiﬁcation with missing inputs and outputs,”

Systems and ControlLetters , vol. 62, no. 8, pp. 605 – 612, 2013.[22] M. Verhaegen and A. Hansson, “Nuclear norm subspace identiﬁcation(n2sid) for short data batches,”

CoRR abs/1401.4273 , 2014.[23] C. Yu and M. Verhaegen, “Subspace identiﬁcation of distributed clustersof homogeneous systems,”

IEEE Transactions on Automatic Control ,vol. 62, no. 1, pp. 463–468, 2017.[24] P. Dewilde and A.-J. Van der Veen,

Time-varying systems and compu-tations . Springer Science & Business Media, 2013.[25] K. Mohan and M. Fazel, “Reweighted nuclear norm minimization withapplication to system identiﬁcation,” in

American Control Conference(ACC), 2010 , pp. 2953–2959, IEEE, 2010.[26] L.-L. Xie and L. Ljung, “Estimate physical parameters by black boxmodeling,” in

Proceedings of the 21st Chinese Control Conference ,pp. 673–677, 2002.[27] C. Yu, M. Verhaegen, S. Kovalsky, and R. Basri, “Identiﬁcation ofstructured lti mimo state-space models,” in

Decision and Control (CDC),2015 IEEE 54th Annual Conference on , pp. 2737–2742, IEEE, 2015.[28] G. Strang, “Fast transforms: Banded matrices with banded inverses,”

Proceedings of the National Academy of Sciences of the United Statesof America , vol. 107, no. 28, pp. 12413 – 12416, 2009.[29] S. Choudhary and U. Mitra, “Identiﬁability scaling laws in bilinearinverse problems,” arXiv preprint arXiv:1402.2637arXiv preprint arXiv:1402.2637