Improved SAR Imaging Via Cross-Learning from Camera Images
Shahzad Gishkori, David Wright, Liam Daniel, Marina Gashinova, Bernard Mulgrew
Abstract—In this paper, we propose a novel concept of cross-learning, in order to improve SAR images by learning from camera images. We use a multi-level abstraction approach to materialise knowledge transfer between the two modalities. We also compare the performance of other possible approaches. We provide experimental results on real data to validate the proposed concept.
Index Terms—SAR imaging, cross-learning, multi-modal fusion, manifold learning

I. INTRODUCTION
Synthetic aperture radar (SAR) [1]–[3] can provide high-resolution images. A substantial amount of work is available to enhance SAR image quality in terms of denoising and/or super-resolution (see, e.g., [4], [5] and references therein). However, SAR image quality is still not on par with that of optical sensors, e.g., camera and lidar. Nonetheless, due to its ability to generate images even in adverse weather conditions, SAR is emerging as a new imaging mode for automotive scenarios [6]–[8]. However, most of the previous work focuses on improving the SAR image by assuming SAR to be a stand-alone sensor, without any interaction with any sensor of a different modality. In automotive (especially, autonomous driving) scenarios, a car may be equipped with multiple sensors, e.g., radar, lidar, camera, etc. [9], [10]. Therefore, it is natural to explore whether a SAR image can be improved by using images from other sensors of different modalities, i.e., by exploiting the framework of multi-modal fusion [11]–[13]. This motivation also forms the basis for our present paper.

Multi-modal fusion is a very generic concept, combining data/information from diverse modalities in order to enhance the achievement of a common objective, e.g., creating a unified sensing system, improving decision making, identifying/extracting specific features, etc. The key property is diversity, i.e., multiple modalities complementing each other in achieving a common goal in a way that cannot be achieved with a single modality [12]. Despite the unquestionable motivation for multi-modal fusion, the real challenge is how exactly to exploit this diversity.
The reasons are that different modalities may be driven by different underlying variables, or they may operate on different physical principles, etc. Therefore, finding a direct correspondence/correlation between them might not be straightforward. Some effort has been expended to devise a certain level of abstraction, e.g., [11], [14] (and references therein). However, more research needs to be done.

SAR and lidar are active sensor modalities. Some work on the fusion of hyperspectral SAR images and lidar images, in the remote sensing domain, has appeared recently, e.g., [15], [16]. A camera, in contrast, is a passive sensor modality and exploits illumination from other sources. Its ranging estimates (e.g., obtained by using the depth-maps of a stereo camera) are not as good as those of radar or lidar. However, it can provide very good image resolution. Due to the very different dynamics of SAR and camera sensors, e.g., operating principles, coordinate systems, pixel resolution, etc., it is quite hard to register and fuse their respective images. Generally, in the available work, e.g., [10], radar is primarily used as a detection sensor instead of an imaging sensor. Thus, to our knowledge, not much work is available on the fusion of SAR and camera images.

In this paper, we focus on fusing SAR and camera images at the data level without (strictly) registering the respective images of the two sensors. To emulate automotive scenarios, we consider short-range radar sensing for extended targets (instead of point scatterers). Our primary aim is to reconstruct high-resolution SAR images, i.e., to reconstruct the physical details of the extended target. In order to do this, we learn certain features of the target from camera images. Since these features have been learnt from a very different modality, we name this process cross-learning. Note, cross-learning may have some overlap with transfer learning [17]. However, the emphasis in the former is on different modalities. Cross-learning (from camera images) to improve SAR imaging is a very new concept, and it has the potential to become a new area of research given the amount of challenges and opportunities associated with it. In this paper, we present an approach to materialise this concept. Our basic premise is the observation that despite differences in resolution and viewing perspective, both modalities try to capture the same physical geometry of the target. Therefore, correspondence or correlation between the two sensors does exist in some latent- or intrinsic-dimensional representation of their respective images. This may potentially circumvent the need for strict inter-sensor mapping/registration.

S. Gishkori, D. Wright and B. Mulgrew are with the Institute for Digital Communications (IDCOM), The School of Engineering, The University of Edinburgh, United Kingdom. Emails: {s.gishkori, d.wright, bernie.mulgrew}@ed.ac.uk. L. Daniel and M. Gashinova are with the Microwave Integrated Systems Laboratory (MISL), School of Electronic, Electrical and Systems Engineering, University of Birmingham, United Kingdom. Emails: {l.y.daniel, m.s.gashinova}@bham.ac.uk. This work was supported by Jaguar Land Rover and the UK-EPSRC grants EP/N012240/1 & EP/N012372/1 as part of the jointly funded Towards Autonomy: Smart and Connected Control (TASCC) Programme.
Traditionally, manifold learning techniques (linear or non-linear) [18]–[20] have been used to retrieve low-dimensional representations of high-dimensional data for a wide range of tasks, e.g., detection, estimation, classification, visualisation, fusion, etc. [21]–[25]. However, most of these tasks are (best) carried out in the manifold domain, without the aim of reconstructing the high-dimensional data. In our case, we need to: i) create the respective manifolds of the SAR and camera images to generate the intrinsic-dimensional or manifold-domain representation; ii) learn extra features from the camera manifold and transfer them to the SAR manifold; iii) reconstruct the SAR image in its high-dimensional or image-domain representation. Since we do the learning in the manifold domain and then reconstruct the original image domain, we can only use linear manifolds, e.g., principal component analysis (PCA). Non-linear manifolds, e.g., Laplacian eigenmaps (LE), locally linear embedding (LLE), Hessian eigenmaps, diffusion maps, etc., provide efficient low-dimensional representations. However, they cannot be transformed/projected back to the image domain. Manifold alignment [24], [26], [27] has been an effective way of transferring knowledge/information between different datasets. Similarly, in the case of super-resolution of face images, building on a two-step approach of global and then local features adjustment [28], [29], some authors, e.g., [30], [31], have advocated the creation of a coherent subspace over the manifolds for efficient transfer of knowledge. Due to an extra layer of abstraction, the latter approach has the ability to work with a modest amount of training samples, as well as to compensate for choosing a linear manifold instead of a non-linear manifold (if required). Now, in the case of transferring information from a camera image to a radar image, i.e., cross-learning, there are multiple challenges, e.g., the modalities are different; the coordinate systems are different, therefore the two sensors cannot be fully registered with each other; there is substantial disparity in resolution; the choice of manifolds is limited due to the reconstruction requirement; etc. Thus, in order to circumvent these challenges, a multi-stage abstraction may be the right course of action. To this end, we follow the approach of [30]. We create PCA-based manifolds for both sensors and generate a coherent subspace by using canonical correlation analysis (CCA) [32]. Then, we use LLE [21] to learn/adjust the neighbourhood embedding of the coherent subspace from the camera to the radar, followed by recovering the improved SAR image. Note, the input SAR images are generated by our recently proposed forward-scanning SAR (FS-SAR) [6] mode for automotive scenarios, albeit the synthetic aperture considered here is circular instead of linear, i.e., a circular-scanning SAR (CiS-SAR).

Contributions. The following are the main contributions of this paper.
• We present a novel concept of improving SAR images via cross-learning from camera images.
• We show that multiple levels of abstraction can help circumvent the challenges of knowledge transfer between these different modalities.
• We consider CiS-SAR mode generated SAR images as input to the cross-learning framework.
• We present qualitative performance results based on real data obtained in our lab-controlled experimental setup.
Fig. 1: SAR and Camera System Schematic (left: SAR with the circular synthetic aperture, aperture positions l, scan range θmin to θmax, and the extended target; right: camera with left/right lenses moving along the camera trajectory around the extended target).
Organisation. Section II provides the system model, Section III elaborates on the realisation of cross-learning via a multi-level abstraction approach, Section IV provides experimental results and performance comparisons, and Section V gives the conclusions.
Notations. Matrices are in upper-case bold while column vectors are in lower-case bold, $(\cdot)^T$ denotes transpose, $[\mathbf{a}]_i$ is the $i$th element of $\mathbf{a}$ and $[\mathbf{A}]_{i,j}$ is the $(i,j)$th element of $\mathbf{A}$, $\hat{\mathbf{a}}$ is the estimate of $\mathbf{a}$, $\triangleq$ defines an entity, and the $\ell_p$-norm is denoted as $\|\mathbf{a}\|_p = \big(\sum_{i=1}^{N} |[\mathbf{a}]_i|^p\big)^{1/p}$.

II. SYSTEM MODEL
In [6], we proposed an FS-SAR mode to enhance the azimuth resolution of an automotive radar. This mode combines forward-scanning with SAR processing. FS-SAR assumes a linear aperture. In the present paper, we consider a circular aperture, i.e., a circular-scanning SAR (CiS-SAR). CiS-SAR combines the benefits of scanning and spotlight SAR, resulting in enhanced azimuth resolution. We opt for CiS-SAR as a generic SAR mode in order to exhibit the cross-learning possibilities from camera to SAR. However, future works may consider other SAR modes as well. Figure 1 shows the schematic for CiS-SAR. At each scan-step, $l \in [1, L]$, over the circular aperture, the radar scans the extended target over the angular range (field of view of the sensor), $\theta \in [\theta_{\min}, \theta_{\max}]$. Thus, the target information is obtained both over the circular aperture as well as over the angular scan per aperture position. Similar to compressed sensing-based back-projection (CBP) in [6], we first process the measurements received over the scans by using a compressed sensing-based algorithm to improve the resolution, and then back-project the reconstructed images from all the scans over the circular aperture to generate a coherent image of the extended target.

Let the radar transmit frequency modulated continuous wave (FMCW) pulses towards the target. The received signal is then deramped, low-pass filtered, deskewed, and Fourier transformed along the fast-time to obtain the range profile (see [6] for explicit expressions and subsequent details on the radar signal model). However, the received signal along the azimuth, for a scanning radar, at scan-step $l$ and range $r$, can be considered as a convolution of the radar antenna beam, $h(\theta)$, and the azimuth reflectivity function, $x_{l,r}(\theta)$, i.e.,
$$y_{l,r}(\theta) = h(\theta) \star x_{l,r}(\theta) + \nu_{l,r}(\theta) \quad (1)$$
where $\star$ denotes convolution and $\nu_{l,r}(\theta)$ represents the model/thermal noise. Collecting all of the azimuth samples over $\theta$, (1) can be written as
$$\mathbf{y}_{l,r} = \mathbf{G}\mathbf{H}\mathbf{x}_{l,r} + \boldsymbol{\nu}_{l,r} \quad (2)$$
where $\mathbf{H}$ is a block-Toeplitz convolution matrix and $\mathbf{G}$ is a selection matrix to balance the numeric relation between the $N_x \times 1$ vector $\mathbf{x}_{l,r}$ and the $N_\theta \times 1$ vector $\mathbf{y}_{l,r}$. Note, in the context of azimuth-resolution enhancement, $N_x \gg N_\theta$. Now, concatenating $\mathbf{y}_{l,r}$ over all $N_r$ range bins, we can write (2) as
$$\mathbf{y}_l = \underbrace{[\mathbf{I}_{N_r} \otimes (\mathbf{G}\mathbf{H})]}_{\triangleq\, \mathbf{A}}\, \mathbf{x}_l + \boldsymbol{\nu}_l \quad (3)$$
where $\mathbf{y}_l \triangleq [\mathbf{y}_{l,1}^T, \mathbf{y}_{l,2}^T, \cdots, \mathbf{y}_{l,N_r}^T]^T$ is an $N_\theta N_r \times 1$ vector. Similarly, $\mathbf{x}_l$ and $\boldsymbol{\nu}_l$ can be defined as $N_x N_r \times 1$ and $N_\theta N_r \times 1$ vectors, respectively. Further, $\mathbf{A}$ is the $N_\theta N_r \times N_x N_r$ measurement matrix. Now, according to CBP, $\mathbf{x}_l$ can be estimated by solving the following (fused LASSO [33]) optimisation problem:
$$\hat{\mathbf{x}}_l = \arg\min_{\mathbf{x}_l} \ \|\mathbf{y}_l - \mathbf{A}\mathbf{x}_l\|_2^2 + \lambda_e \|\mathbf{x}_l\|_1 + \lambda_f \|\mathbf{D}\mathbf{x}_l\|_1 \quad (4)$$
where $\lambda_e$ and $\lambda_f$ are positive penalty parameters controlling element-wise sparsity and fusion in $\mathbf{x}_l$, respectively, and $\mathbf{D}$ is the fusion matrix (i.e., $\mathbf{D}\mathbf{x}_l$ is a vector of differences of consecutive elements of $\mathbf{x}_l$) [33].
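For concreteness, a minimal sketch of the azimuth-enhancement step in (4) is given below, using a generic convex solver (cvxpy) rather than the authors' own implementation; the measurement matrix A, the fusion matrix D, the measurement vector y_l, and the penalty values are assumed to be supplied by the user.

```python
import numpy as np
import cvxpy as cp

def cbp_azimuth_estimate(y_l, A, D, lam_e=0.1, lam_f=0.1):
    """Fused-LASSO estimate of the azimuth reflectivity x_l, cf. (4).

    y_l   : measurement vector of length N_theta * N_r
    A     : measurement matrix (N_theta*N_r, N_x*N_r)
    D     : first-difference (fusion) matrix
    lam_e : element-wise sparsity penalty
    lam_f : fusion penalty
    """
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(
        cp.sum_squares(y_l - A @ x)       # data fidelity
        + lam_e * cp.norm1(x)             # element-wise sparsity
        + lam_f * cp.norm1(D @ x)         # fusion of consecutive elements
    )
    cp.Problem(objective).solve()
    return x.value

def difference_matrix(n):
    """One way to build a (n-1) x n first-difference matrix D."""
    return np.diff(np.eye(n), axis=0)
```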
Then, the reconstructed radar image via back-projection, at pixel $(i,j)$, can mathematically be represented as
$$\gamma_{i,j} = \sum_{l=0}^{L-1} \big[\uparrow_{\kappa,\kappa'}(\hat{\mathbf{X}}_l)\big]_{I^\theta_{i,j},\, I^r_{i,j}} \quad (5)$$
where $\uparrow_{\kappa,\kappa'}(\cdot)$ interpolates/upsamples a matrix by an order $\kappa$ and $\kappa'$ along its rows and columns, respectively, $\hat{\mathbf{X}}_l$ is the $N_x \times N_r$ reshaped matrix form of $\hat{\mathbf{x}}_l$, and $[\cdot]_{I^\theta_{i,j},\, I^r_{i,j}}$ represents the row and column indices of the matrix corresponding to the angle $\theta_{i,j}$ and range $r_{i,j}$ of the $(i,j)$th pixel, respectively. Since each scanning position over the aperture may contribute to each pixel in the reconstructed image, we can rewrite (5) as
$$\gamma_{i,j} = \sum_{l=0}^{L-1} \gamma^l_{i,j} \quad (6)$$
where $\gamma^l_{i,j} \triangleq \big[\uparrow_{\kappa,\kappa'}(\hat{\mathbf{X}}_l)\big]_{I^\theta_{i,j},\, I^r_{i,j}}$. Let all of the image pixels $\gamma^l_{i,j}$, for $i = 1, \cdots, \sqrt{N}$ and $j = 1, \cdots, \sqrt{N}$, w.r.t. contributions from the $l$th aperture position, be collected in an $N \times 1$ vector $\mathbf{r}_l$, i.e.,
$$\mathbf{r}_l \triangleq \big[\gamma^l_{1,1}, \cdots, \gamma^l_{\sqrt{N},1}, \gamma^l_{1,2}, \cdots, \gamma^l_{\sqrt{N},\sqrt{N}}\big]^T. \quad (7)$$
Now, we can collect all of the radar images generated from each aperture position, as defined in (7), into an $N \times L$ matrix $\mathbf{R}$, i.e.,
$$\mathbf{R} \triangleq [\mathbf{r}_1, \mathbf{r}_2, \cdots, \mathbf{r}_L]. \quad (8)$$
For the proposed cross-learning, we assume that the camera trajectory is the same as that of the radar synthetic aperture, i.e., the camera images of the target are also obtained from the same physical locations as those of the radar. However, the camera does not involve any scanning and takes one snapshot at each location on its trajectory. Figure 1 shows the schematic of camera image acquisition. For the sake of clarity, the schematic for the camera has been drawn separately from that of the SAR. However, in practice, both sensors may share the physical location. The camera can be mono or stereo, with different image formats, i.e., RGB, greyscale, depth-map, etc. It may also have its own requirements w.r.t. configuration, calibration, disparity/point-cloud formulation, etc. We assume that these pre-requisites have already been met. However, we do not assume any strict registration between the camera and the radar, as it is very difficult, and our approach essentially tries to circumvent its need.

Let a $\sqrt{M} \times \sqrt{M}$ generic camera image of the target at the $l$th position on its trajectory be represented as an $M \times 1$ vector $\mathbf{s}_l$ via lexicographic (column-wise) ordering. Then, we can collect all such images for the complete trajectory into an $M \times L$ matrix $\mathbf{S}$ as
$$\mathbf{S} \triangleq [\mathbf{s}_1, \mathbf{s}_2, \cdots, \mathbf{s}_L]. \quad (9)$$
The rest of the paper essentially deals with (8) and (9) in terms of proposing a cross-learning strategy.
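As an illustration of the back-projection and stacking steps in (5)-(8), the following sketch accumulates the per-aperture images into the pixel grid and collects the per-aperture contributions column-ordered into the matrix R. The per-pixel angle and range index maps depend on the CiS-SAR geometry and are assumed precomputed here; the function name and the use of scipy's zoom as the upsampling operator are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import zoom

def backproject(X_hat_list, idx_theta, idx_range, n_pix, kappa=(4, 4)):
    """Accumulate per-aperture images into an n_pix x n_pix grid, cf. (5)-(8).

    X_hat_list : list of L arrays, each N_x x N_r (reshaped CBP estimates)
    idx_theta  : list of L integer arrays (n_pix, n_pix), row (angle) indices
                 into the upsampled image, precomputed from the geometry
    idx_range  : list of L integer arrays (n_pix, n_pix), column (range) indices
    Returns the combined image gamma and the N x L matrix R of (8).
    """
    L = len(X_hat_list)
    gamma = np.zeros((n_pix, n_pix))
    R = np.zeros((n_pix * n_pix, L))
    for l in range(L):
        X_up = zoom(X_hat_list[l], kappa, order=1)   # upsampling operator of (5)
        contrib = X_up[idx_theta[l], idx_range[l]]   # gamma^l_{i,j}, cf. (6)
        gamma += contrib                             # sum over aperture positions, cf. (5)
        R[:, l] = contrib.flatten(order='F')         # column-ordered r_l, cf. (7)
    return gamma, R
```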
III. CROSS-LEARNING

In order to improve SAR images via cross-learning from camera images, we adopt a multi-level abstraction approach. From the test images, we first learn the respective manifolds. Secondly, we use CCA to generate a coherent subspace between the two modalities. Thirdly, we use LLE for neighbourhood embedding w.r.t. the test images of the radar and camera. Finally, the processed radar image is projected from the manifold domain back to the image domain. We name this approach multi-level CCA-based (ML-CCA) cross-learning.
A. Suitable Manifold
As explained earlier, after cross-learning, we need to reconstruct the SAR image from the low-dimensional space of the manifold domain to the high-dimensional space of the image domain. Therefore, non-linear manifolds cannot be used. In terms of linear manifolds, we opt for the classical PCA-based manifolds.

PCA represents data by using the directions of maximum variance. Thus, it requires computing the principal eigenvectors of the data covariance matrix. Now, assuming that the datasets in (8) and (9) are centred (i.e., the corresponding sample means have been subtracted from them), the covariance matrices of the radar and camera datasets can be defined as $\mathbf{C}_r \triangleq (1/L)\mathbf{R}\mathbf{R}^T$ and $\mathbf{C}_s \triangleq (1/L)\mathbf{S}\mathbf{S}^T$, respectively. The eigenvalue decomposition (EVD) of the covariance matrices can then be carried out as
$$\mathrm{EVD}(\mathbf{C}_r) = \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{V}_r^T \quad (10)$$
$$\mathrm{EVD}(\mathbf{C}_s) = \mathbf{U}_s \boldsymbol{\Sigma}_s \mathbf{V}_s^T \quad (11)$$
where the matrices $\mathbf{U}_r$ and $\mathbf{U}_s$ contain the left eigenvectors, the matrices $\mathbf{V}_r$ and $\mathbf{V}_s$ contain the right eigenvectors, and the matrices $\boldsymbol{\Sigma}_r$ and $\boldsymbol{\Sigma}_s$ contain the corresponding eigenvalues along their diagonals, for radar and camera, respectively. The low-dimensional data representation essentially corresponds to projecting the data onto a few significant eigenvectors. Let $n$ and $m$ represent the number of significant eigenvectors (or subsequent principal components) for the SAR and camera manifolds, respectively. Then, $\bar{\mathbf{U}}_r \triangleq [\mathbf{U}_r]_{:,1:n}$ and $\bar{\mathbf{U}}_s \triangleq [\mathbf{U}_s]_{:,1:m}$ are the $N \times n$ and $M \times m$ corresponding PCA-based projection matrices. The PCA-based projection coefficients can be obtained as
$$\mathbf{P}_r = \bar{\mathbf{U}}_r^T \mathbf{R} = [\mathbf{p}_{r_1}, \mathbf{p}_{r_2}, \cdots, \mathbf{p}_{r_L}] \quad (12)$$
$$\mathbf{P}_s = \bar{\mathbf{U}}_s^T \mathbf{S} = [\mathbf{p}_{s_1}, \mathbf{p}_{s_2}, \cdots, \mathbf{p}_{s_L}] \quad (13)$$
where $\mathbf{p}_{r_l} \triangleq \bar{\mathbf{U}}_r^T \mathbf{r}_l$, $\mathbf{p}_{s_l} \triangleq \bar{\mathbf{U}}_s^T \mathbf{s}_l$, and $\mathbf{P}_r$ and $\mathbf{P}_s$ are the $n \times L$ and $m \times L$ PCA coefficient matrices of the SAR and camera manifolds, respectively.
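A compact numerical sketch of this step, assuming the image matrices R and S from (8) and (9) are available as numpy arrays, is given below; the leading eigenvectors of the covariance are obtained here via an SVD of the centred data matrix, which is equivalent but avoids forming the covariance explicitly.

```python
import numpy as np

def pca_manifold(X, k):
    """Return the k-column PCA projection matrix U_bar and the coefficients P.

    X : data matrix with one centred, vectorised image per column (features x L).
    The left singular vectors of X are the eigenvectors of (1/L) X X^T.
    """
    U, svals, _ = np.linalg.svd(X, full_matrices=False)
    U_bar = U[:, :k]          # significant eigenvectors, cf. (10)-(11)
    P = U_bar.T @ X           # projection coefficients, cf. (12)-(13)
    return U_bar, P

# Usage with hypothetical arrays R (N x L) and S (M x L):
# R_c = R - R.mean(axis=1, keepdims=True)   # centring over the L samples
# S_c = S - S.mean(axis=1, keepdims=True)
# U_r, P_r = pca_manifold(R_c, n)
# U_s, P_s = pca_manifold(S_c, m)
```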
B. Coherent Subspace
CCA finds a low-dimensional coherent subspace between two datasets. In this paper, we consider a one-dimensional CCA subspace. Thus, in our case, CCA provides one basis vector for each dataset such that the correlation between the corresponding projection coefficients is maximised. Note, the datasets, in our case, correspond to the PCA-based manifold coefficients, i.e., (12) and (13). Mathematically, we can estimate the CCA-based subspace by solving the following optimisation problem, as in [32]:
$$(\hat{\mathbf{b}}_r, \hat{\mathbf{b}}_s) = \arg\max_{\mathbf{b}_r, \mathbf{b}_s} \ \mathbf{b}_r^T \mathbf{Q}_{rs} \mathbf{b}_s \quad (14a)$$
$$\text{s.t.} \quad \mathbf{b}_r^T \mathbf{Q}_r \mathbf{b}_r = 1, \quad \mathbf{b}_s^T \mathbf{Q}_s \mathbf{b}_s = 1 \quad (14b)$$
where $\mathbf{b}_r$ and $\mathbf{b}_s$ are the $n \times 1$ and $m \times 1$ canonical basis vectors for the radar- and camera-manifold datasets, respectively, and $\mathbf{Q}_r \triangleq (1/L)\mathbf{P}_r\mathbf{P}_r^T$, $\mathbf{Q}_s \triangleq (1/L)\mathbf{P}_s\mathbf{P}_s^T$ and $\mathbf{Q}_{rs} \triangleq (1/L)\mathbf{P}_r\mathbf{P}_s^T$ are the corresponding covariance matrices. Note, the constraints (14b) are imposed to ensure a unique solution. Now, solving (14) essentially boils down to solving the following generalised eigenvalue problem (see [32] for details):
$$\begin{bmatrix} \mathbf{0} & \mathbf{Q}_{rs} \\ \mathbf{Q}_{rs}^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{b}_r \\ \mathbf{b}_s \end{bmatrix} = \lambda \begin{bmatrix} \mathbf{Q}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_s \end{bmatrix} \begin{bmatrix} \mathbf{b}_r \\ \mathbf{b}_s \end{bmatrix} \quad (15)$$
where $\lambda$ is the generalised eigenvalue. Solving (14) is equivalent to finding the largest generalised eigenvalue in (15), i.e., $\lambda = \lambda_{\max}$, and the corresponding generalised eigenvector provides the estimate of the canonical basis vectors as $[\hat{\mathbf{b}}_r^T, \hat{\mathbf{b}}_s^T]^T$. From the basis vectors, the corresponding CCA-based coefficients can be obtained as
$$\mathbf{a}_r = \mathbf{P}_r^T \hat{\mathbf{b}}_r \quad (16)$$
$$\mathbf{a}_s = \mathbf{P}_s^T \hat{\mathbf{b}}_s \quad (17)$$
where $\mathbf{a}_r$ and $\mathbf{a}_s$ are $L \times 1$ vectors of CCA-based coefficients w.r.t. the radar and camera manifolds, respectively.
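A sketch of this stage, assuming the PCA coefficient matrices from (12) and (13) are available, forms the block matrices of (15), solves the generalised eigenvalue problem with scipy, and returns the coefficients of (16) and (17). The small ridge added to the right-hand-side block matrix is a numerical-stability choice, not part of the formulation.

```python
import numpy as np
from scipy.linalg import eigh

def coherent_subspace(P_r, P_s, reg=1e-8):
    """One-dimensional CCA between the two PCA coefficient sets, cf. (14)-(17)."""
    n, L = P_r.shape
    m = P_s.shape[0]
    Q_r = (P_r @ P_r.T) / L
    Q_s = (P_s @ P_s.T) / L
    Q_rs = (P_r @ P_s.T) / L
    # Block matrices of the generalised eigenvalue problem (15)
    A = np.block([[np.zeros((n, n)), Q_rs],
                  [Q_rs.T, np.zeros((m, m))]])
    B = np.block([[Q_r, np.zeros((n, m))],
                  [np.zeros((m, n)), Q_s]]) + reg * np.eye(n + m)
    evals, evecs = eigh(A, B)              # generalised symmetric eigenproblem
    v = evecs[:, np.argmax(evals)]         # eigenvector of the largest eigenvalue
    b_r, b_s = v[:n], v[n:]
    a_r = P_r.T @ b_r                      # CCA coefficients, eq. (16)
    a_s = P_s.T @ b_s                      # eq. (17)
    return a_r, a_s, b_r, b_s
```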
C. Neighbourhood Embedding

LLE is used to compute low-dimensional neighbourhood-preserving embeddings of high-dimensional data. It is based on a simple geometric intuition. Given that a data point and its neighbours, in the high-dimensional space, lie on a locally linear patch of the manifold, the data point can be reconstructed by a linear combination of its neighbours. Then, the data point can be mapped to a low-dimensional representation while preserving its neighbourhood characterisation (see [21] for more details). In the context of cross-learning: i) the mapping is done from the camera manifold to the radar manifold; ii) the radar and the camera manifolds have been substituted with the respective CCA-based coefficients, which are linear due to the one-dimensional CCA subspace; and iii) in terms of CCA-based coefficients, the data dimension for both radar and camera is the same, therefore, we do not need to find the low-dimensional values. Thus, LLE can be easily applied to our case for neighbourhood embedding, i.e., we need to find the linear coefficients which reconstruct a camera data point from its neighbours, and then use the same linear coefficients to reconstruct a radar data point from its neighbours. This constitutes neighbourhood embedding in the context of cross-learning.

Let $\mathbf{r}_t$ and $\mathbf{s}_t$ be the test SAR and camera images, respectively, with $\mathbf{p}_{r_t} \triangleq \bar{\mathbf{U}}_r^T \mathbf{r}_t$ and $\mathbf{p}_{s_t} \triangleq \bar{\mathbf{U}}_s^T \mathbf{s}_t$ as the corresponding data points on the manifolds. Then, the CCA-based coefficients for the test images can be obtained as
$$a_{r_t} = \mathbf{p}_{r_t}^T \hat{\mathbf{b}}_r \quad (18)$$
$$a_{s_t} = \mathbf{p}_{s_t}^T \hat{\mathbf{b}}_s \quad (19)$$
where $a_{r_t}$ and $a_{s_t}$ are scalar values. Let $\mathcal{N}^K_{r_t}$ and $\mathcal{N}^K_{s_t}$ represent the sets of $K$ nearest neighbours (K-NN) of $a_{r_t}$ and $a_{s_t}$, respectively. Now, we can write the optimisation problem of finding the linear coefficients that reconstruct $a_{s_t}$ from its neighbours in $\mathbf{a}_s$ as a constrained least-squares fitting problem, i.e.,
$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \ \|a_{s_t} - \mathbf{w}^T \bar{\mathbf{a}}_s\|_2^2 \quad (20a)$$
$$\text{s.t.} \quad \|\mathbf{w}\|_1 = 1 \quad (20b)$$
where $\bar{\mathbf{a}}_s$ is a $K \times 1$ sub-vector of $\mathbf{a}_s$, such that $[\mathbf{a}_s]_i \in \mathcal{N}^K_{s_t}$, for $i = 1, \cdots, K$. Note, (20) essentially applies two constraints: i) a sparseness constraint, i.e., the weights are non-zero only for the K-NN of $a_{s_t}$; ii) an invariance constraint, i.e., the sum of the linear coefficients equals one, as in (20b). An efficient way to minimise the error in (20a) is to solve the following system of linear equations
$$\mathbf{G}\mathbf{w} = \mathbf{1} \quad (21)$$
where $\mathbf{1}$ is the $K \times 1$ all-ones vector and $\mathbf{G} \triangleq (a_{s_t}\mathbf{1} - \bar{\mathbf{a}}_s)(a_{s_t}\mathbf{1} - \bar{\mathbf{a}}_s)^T$, and then rescale the coefficients to satisfy (20b) (more details in [21]). Now, the learnt coefficients can be used to reconstruct the radar data point from its neighbours, i.e.,
$$\hat{a}_{r_t} = \hat{\mathbf{w}}^T \bar{\mathbf{a}}_r \quad (22)$$
where $\bar{\mathbf{a}}_r$ is a $K \times 1$ sub-vector of $\mathbf{a}_r$, such that $[\mathbf{a}_r]_i \in \mathcal{N}^K_{r_t}$, for $i = 1, \cdots, K$.
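The neighbourhood-embedding step of (18)-(22) reduces to a few lines, sketched below under the assumption that the CCA coefficient vectors and the scalar test coefficients are available; the small ridge added to G is a common regularisation for nearly singular local Gram matrices and is not part of the original formulation.

```python
import numpy as np

def neighbourhood_embedding(a_st, a_rt, a_s, a_r, K=5, reg=1e-6):
    """Transfer LLE reconstruction weights from camera to radar, cf. (18)-(22)."""
    # K nearest neighbours of the test coefficients in the one-dimensional CCA subspace
    nn_s = np.argsort(np.abs(a_s - a_st))[:K]
    nn_r = np.argsort(np.abs(a_r - a_rt))[:K]
    a_s_bar, a_r_bar = a_s[nn_s], a_r[nn_r]
    # Local Gram matrix and reconstruction weights, cf. (21), rescaled to sum to one
    d = a_st - a_s_bar
    G = np.outer(d, d) + reg * np.eye(K)
    w = np.linalg.solve(G, np.ones(K))
    w /= w.sum()
    # Reconstruct the radar coefficient from its own neighbourhood, cf. (22)
    return w @ a_r_bar
```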
Fig. 2: Experimental Setup. (a) ZED and Radar (300 GHz); (b) Trolley on a Turn-Table.

TABLE I: Specifications of the 300 GHz Radar
  Modulation: FMCW
  Frequency Range: GHz
  Transmit Bandwidth (B): GHz
  Chirp Duration (T): ms
  Sampling Frequency: MHz
  Angular Step (Δθ): degrees
  Range Resolution (Δr): m
  Two-way 3 dB Beamwidth (θ): degrees

D. Image Reconstruction
After learning the CCA-based coefficient, the learnt radar image in the manifold domain can be obtained as
$$\tilde{\mathbf{p}}_{r_t} = (\hat{\mathbf{b}}_r^T)^\dagger \hat{a}_{r_t} + \mathbf{p}_{r_t} \quad (23)$$
where $(\cdot)^\dagger$ denotes the Moore-Penrose (or pseudo) inverse. Now, the radar image can be projected from the manifold domain back to the image domain as
$$\tilde{\mathbf{r}}_t = \bar{\mathbf{U}}_r \tilde{\mathbf{p}}_{r_t} \quad (24)$$
where $\tilde{\mathbf{r}}_t$ is the improved SAR image obtained via cross-learning from the camera image.
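Finally, a sketch of the reconstruction step (23)-(24) is given below; the optional restoration of the sample mean (removed when the datasets were centred) is our assumption, and the function and variable names are illustrative.

```python
import numpy as np

def reconstruct_sar_image(a_rt_hat, b_r, p_rt, U_r_bar, mean_r=None):
    """Project the learnt manifold point back to the image domain, cf. (23)-(24)."""
    # Eq. (23): pseudo-inverse of the 1 x n row vector b_r^T, scaled by the learnt coefficient
    p_rt_tilde = np.linalg.pinv(b_r[None, :]) @ np.atleast_1d(a_rt_hat) + p_rt
    # Eq. (24): back to the image domain via the PCA projection matrix
    r_t_tilde = U_r_bar @ p_rt_tilde
    if mean_r is not None:   # optionally restore the sample mean removed during centring
        r_t_tilde = r_t_tilde + mean_r
    return r_t_tilde
```

Chaining the sketches above (pca_manifold, coherent_subspace, neighbourhood_embedding and this reconstruction) reproduces, under the stated assumptions, the ML-CCA pipeline described in this section.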
IV. EXPERIMENTAL RESULTS

In this section, we validate the proposed concept of cross-learning between SAR and camera images by experimental results. To our knowledge, a public dataset of concurrent SAR and camera measurements is not available. Therefore, as part of this research, we have carried out such measurements in a laboratory-controlled environment. Our experimental setup (see Figure 2) mainly consists of a ZED stereo camera (see [34] for specifications),
a 300 GHz FMCW radar (see Table I for specifications), and a trolley on a turntable. Figure 3 shows the measurement schematic of the experiment. The trolley is placed on the turntable at a fixed distance from the joint sensor (ZED and radar) platform. Measurements from the sensors are taken for every 5° angular turn (counter clock-wise) of the trolley. This emulates a circular motion of the sensors around the target. Thus, $L = 72$ (synthetic) aperture samples are obtained all around the trolley. Note, in this paper, we consider a full 360° circular aperture for the purpose of illustration only. However, in practice, the target may be seen by a partial aperture, resulting in corresponding gains via cross-learning. For every aperture sample, we consider only the left-lens RGB image from the ZED. We convert the RGB image to a greyscale image. Figure 4 shows the ZED images of the trolley at different positions (on the turntable), $l = 1\,(0°)$, $19\,(90°)$, $37\,(180°)$, $55\,(270°)$.

Fig. 3: Measurement Schematic (turntable with trolley, trolley handle and turn direction, reference corner reflector, ZED with the radar underneath, radar scan range θmin to θmax; distances of 3.65 m and 7 m are marked in the schematic).

Fig. 4: ZED Greyscale Images: (a) l = 1 (0°); (b) l = 19 (90°); (c) l = 37 (180°); (d) l = 55 (270°).

At every aperture position, the radar scans the target scene over its angular field of view, $\theta \in [\theta_{\min}, \theta_{\max}]$, at angular intervals $\Delta\theta$ (see Table I). Then, using (4) and (7), a SAR image is created for every $l$th aperture position. Finally, a combined image of the target is obtained by using (5). Figure 5 shows the SAR images for $l = 1\,(0°)$, $19\,(90°)$, $37\,(180°)$, $55\,(270°)$, and the combined CiS-SAR image of the trolley. Note, all SAR images have been normalised so that the maximum intensity is unity. We can see that the SAR images of the individual apertures, Figures 5a–5d, capture viewing-angle-dependent information of the target.

Fig. 5: SAR Images: (a) l = 1 (0°); (b) l = 19 (90°); (c) l = 37 (180°); (d) l = 55 (270°); (e) CiS-SAR.
Fig. 6: PCA-based Manifolds: (a) Camera Manifold; (b) SAR Manifold.

Nonetheless, the combined image, Figure 5e, provides a very good imaging result in capturing the complete outline of the target, which re-affirms the enhanced performance of the CBP reconstruction algorithm as proposed in [6]. However, we can see that the handle of the trolley is not very prominent.

Now, in order to improve the SAR imaging results, we use the cross-learning concept, employing the ML-CCA approach as explained in Section III. Using (12) and (13), for $n = m = 15$ principal components, we obtain the PCA-based manifolds of the images of the two sensors. Note, we chose this number of principal components as it seemed to provide good results from a qualitative perspective. Figure 6 shows these manifolds for the first two principal components. We can see that the ZED manifold is more elaborate than the SAR manifold (which is quite concentrated). This shows that the ZED images are more distinguishable than the SAR images. Thus, the SAR images have a big margin of learning from the camera images.

Fig. 7: Cross-Learning: (a) ML-CCA; (b) ML-CCA+; (c) MFA; (d) MFA+; (e) ML-MFA; (f) ML-MFA+.

Figure 7a shows the SAR image using the ML-CCA approach. Note, we essentially use all of the training images as the test images, i.e., $t = 1, \cdots, L$, for both sensors. We can see that, in comparison to CiS-SAR (Figure 5e), the ML-CCA image has captured some new information of the target. The walls of the trolley are more prominent. However, the most interesting aspect is the visibility of the trolley handle. Nonetheless, we can see that the target information in CiS-SAR and ML-CCA does not seem to overlap for every pixel. Thus, a natural course of action is to combine the two images. We name the combined CiS-SAR and ML-CCA image ML-CCA+. Figure 7b shows the ML-CCA+ image. We can see that it is a much improved image compared to the CiS-SAR image in Figure 5e.

In order to compare the performance of ML-CCA and ML-CCA+, we also provide imaging results with another possible cross-learning approach. In this approach, instead of two levels of abstraction as in ML-CCA (note, here we do not count the manifold creation as a level of abstraction), we use a single level of abstraction. It is the manifold alignment (MFA) approach. In this approach, the PCA-based manifolds of the two sensors are essentially aligned using Procrustes analysis as in [27]. The basic idea is that, given pairwise correspondence between the two datasets (assumed centred), a mapping is obtained to align the test data points. Using the terminology developed earlier in this paper, the following singular value decomposition (SVD) is performed as a first step.
$$\mathrm{SVD}(\mathbf{P}_r\mathbf{P}_s^T) = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T \quad (25)$$
assuming $n = m$. Then, the improved image via MFA can be obtained as
$$\tilde{\mathbf{r}}_t = k\,\mathbf{r}_t\,\mathbf{Q} \quad (26)$$
where $k \triangleq \mathrm{trace}(\boldsymbol{\Sigma})/\mathrm{trace}(\mathbf{P}_r\mathbf{P}_r^T)$ and $\mathbf{Q} \triangleq \mathbf{U}\mathbf{V}^T$. Figure 7c shows the imaging result of the MFA approach. Similar to ML-CCA+, we also provide the image result of MFA+ in Figure 7d. We can see that MFA does capture some extra information in comparison to CiS-SAR (Figure 5e). However, its performance is inferior to that of ML-CCA (Figure 7a). Similarly, ML-CCA+ (Figure 7b) provides a better result than MFA+.

We also compare the performance of ML-CCA and ML-CCA+ with another possible multi-level cross-learning approach. In this approach, we extend MFA by neighbourhood embedding via LLE (similar to Section III-C). We name this approach multi-level MFA (ML-MFA). Figure 7e shows the imaging result of ML-MFA. Similar to ML-CCA+, we also provide the image result of ML-MFA+ in Figure 7f. We can see that ML-MFA shows some features of the trolley handle. Therefore, ML-MFA+ shows an improved image. It is better than MFA+ (Figure 7d). However, ML-CCA+ (Figure 7b) still outperforms ML-MFA+. Nonetheless, we can say that multi-level abstraction approaches have superior performance in realising the concept of cross-learning to improve SAR images by using the camera images.
V. CONCLUSIONS

In this paper, we have proposed a novel concept of cross-learning, in order to improve SAR images by learning from camera images. Despite the fact that the two sensors are of very different modalities, we have used a multi-level abstraction approach to achieve knowledge transfer between them. We have shown that a realisation of multi-level abstraction, in the form of the creation of a coherent subspace over the SAR and camera manifolds followed by neighbourhood embedding, provides very good results. We have also proposed other possible approaches to materialise the concept of cross-learning, namely, a manifold alignment approach and a multi-level manifold alignment (which includes neighbourhood embedding) approach. Overall, we have observed that multi-level abstraction approaches provide better performance results. In order to validate the proposed concept, we have provided experimental results on real data obtained in a controlled lab environment. Cross-learning is an ongoing research area, and this paper highlights some early achievements in this field.
REFERENCES
[1] W. Carrara, R. Goodman, and R. Majewski, Spotlight Synthetic Aperture Radar. Boston: Artech House, 1995.
[2] C. Jakowatz, D. Wahl, P. Eichel, D. Ghiglia, and P. Thompson, Spotlight-Mode Synthetic Aperture Radar: A Signal Processing Approach. MA, USA: Kluwer Academic Publishers, 1996.
[3] M. Soumekh, Synthetic Aperture Radar Signal Processing with MATLAB Algorithms. NY, USA: John Wiley & Sons, Inc., 1999.
[4] M. Cetin, I. Stojanovic, O. Onhon, K. Varshney, S. Samadi, W. C. Karl, and A. S. Willsky, "Sparsity-driven synthetic aperture radar imaging: Reconstruction, autofocusing, moving targets, and compressed sensing," IEEE Signal Processing Magazine, vol. 31, no. 4, pp. 27–40, July 2014.
[5] S. Gishkori and B. Mulgrew, "Graph signal processing-based imaging for synthetic aperture radar," IEEE Geoscience and Remote Sensing Letters, to appear, 2019.
[6] S. Gishkori, L. Daniel, M. Gashinova, and B. Mulgrew, "Imaging for a forward scanning automotive synthetic aperture radar," IEEE Transactions on Aerospace and Electronic Systems, vol. 55, no. 3, pp. 1420–1434, June 2019.
[7] S. Gishkori, D. Wright, L. Daniel, M. Gashinova, and B. Mulgrew, "Imaging moving targets for a forward scanning automotive SAR," IEEE Transactions on Aerospace and Electronic Systems, to appear, 2019.
[8] I. Bilik, O. Longman, S. Villeval, and J. Tabrikian, "The rise of radar for autonomous vehicles: Signal processing solutions and future research directions," IEEE Signal Processing Magazine, vol. 36, no. 5, pp. 20–31, Sep. 2019.
[9] E. Guizzo, "How Google's self-driving car works," Oct. 2011. [Online]. Available: https://spectrum.ieee.org/automaton/robotics/artificial-intelligence/how-google-self-driving-car-works
[10] H. Cho, Y. Seo, B. V. K. V. Kumar, and R. R. Rajkumar, "A multi-sensor fusion system for moving object detection and tracking in urban driving environments," in Proc. IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 1836–1843.
[11] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi, "Multisensor data fusion: A review of the state-of-the-art," Inf. Fusion, vol. 14, no. 1, pp. 28–44, Jan. 2013.
[12] D. Lahat, T. Adali, and C. Jutten, "Multimodal data fusion: An overview of methods, challenges, and prospects," Proceedings of the IEEE, vol. 103, no. 9, pp. 1449–1477, Sep. 2015.
[13] D. Ramachandram and G. W. Taylor, "Deep multimodal learning: A survey on recent advances and trends," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 96–108, Nov. 2017.
[14] L. Sorber, M. Van Barel, and L. De Lathauwer, "Structured data fusion," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 586–600, June 2015.
[15] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pižurica, S. Gautama, W. Philips, S. Prasad, Q. Du, and F. Pacifici, "Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2405–2418, June 2014.
[16] M. Dalla Mura, S. Prasad, F. Pacifici, P. Gamba, J. Chanussot, and J. A. Benediktsson, "Challenges and opportunities of multimodality and data fusion in remote sensing," Proceedings of the IEEE, vol. 103, no. 9, pp. 1585–1601, Sep. 2015.
[17] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[18] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.
[19] X. Huo and A. Smith, "A survey of manifold-based learning methods," Recent Advances in Data Mining of Enterprise Data, pp. 691–745, 2008.
[20] L. van der Maaten, E. Postma, and J. van den Herik, "Dimensionality reduction: A comparative review," Tilburg University, Tech. Rep., Oct. 2009.
[21] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323–2326, 2000.
[22] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput., vol. 15, no. 6, pp. 1373–1396, Jun. 2003.
[23] M. B. Wakin, D. L. Donoho, H. Choi, and R. G. Baraniuk, "The multiscale structure of non-differentiable image manifolds," in SPIE Conference Series, 2005, pp. 413–429.
[24] S. Lafon, Y. Keller, and R. R. Coifman, "Data fusion and multicue data matching by diffusion maps," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1784–1797, Nov. 2006.
[25] M. A. Davenport, C. Hegde, M. F. Duarte, and R. G. Baraniuk, "Joint manifolds for data fusion," IEEE Transactions on Image Processing, vol. 19, no. 10, pp. 2580–2594, Oct. 2010.
[26] J. Ham, D. Lee, and L. Saul, "Semisupervised alignment of manifolds," Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence, vol. 10, pp. 120–127, Jan. 2005.
[27] C. Wang and S. Mahadevan, "Manifold alignment using Procrustes analysis," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1120–1127.
[28] C. Liu, H.-Y. Shum, and W. T. Freeman, "Face hallucination: Theory and practice," International Journal of Computer Vision, vol. 75, no. 1, pp. 115–134, Oct. 2007.
[29] H. Chang, D.-Y. Yeung, and Y. Xiong, "Super-resolution through neighbor embedding," in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 1, June 2004, pp. I–I.
[30] H. Huang, H. He, X. Fan, and J. Zhang, "Super-resolution of human face image using canonical correlation analysis," Pattern Recogn., vol. 43, no. 7, pp. 2532–2543, Jul. 2010.
[31] H. Huang and H. He, "Super-resolution method for face recognition using nonlinear mappings on coherent features," IEEE Transactions on Neural Networks, vol. 22, no. 1, pp. 121–130, Jan. 2011.
[32] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: An overview with application to learning methods," Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.
[33] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused LASSO,"