Structured Group Local Sparse Tracker
Mohammadreza Javanmardi, Xiaojun Qi
Department of Computer Science, Utah State University, Logan, UT 84322-4205, USA
ABSTRACT
Sparse representation is considered a viable solution to visual tracking. In this paper, we propose a structured group local sparse tracker (SGLST), which exploits local patches inside target candidates in the particle filter framework. Unlike conventional local sparse trackers, the proposed optimization model in SGLST not only adopts local and spatial information of the target candidates but also attains the spatial layout structure among them by employing a group-sparsity regularization term. To solve the optimization model, we propose an efficient numerical algorithm consisting of two subproblems with closed-form solutions. Both qualitative and quantitative evaluations on benchmarks of challenging image sequences demonstrate the superior performance of the proposed tracker against several state-of-the-art trackers.
1. INTRODUCTION
Visual tracking is the process of estimating the states of a moving object in a dynamic frame sequence. It has been considered one of the most paramount and challenging topics in computer vision, with various applications in human motion analysis, surveillance, smart vehicle transportation, navigation, etc. Although numerous tracking methods [1, 2, 3, 4, 5, 6] have been introduced in recent years, developing a robust algorithm that can handle different challenges such as occlusion, illumination variations, deformation, fast motion, camera motion, and background clutter still remains unsolved.

Visual tracking algorithms can be roughly classified into discriminative and generative categories. Discriminative approaches cast the tracking problem as binary classification and formulate a decision boundary to separate the target from the background. Representative discriminative approaches include the ensemble tracker [7], online boosting [8, 9], multiple instance learning [10], PN-learning [11], and correlation filter-based trackers [12, 13, 14]. In contrast, generative approaches adopt a model to represent the target and cast tracking as a search procedure to find the region most similar to the target model. Representative generative tracking methods include eigen-tracking [15], mean-shift [16], FragTrack [17], incremental learning [18], visual tracking decomposition [19], and adaptive color tracking [20].

Sparse representation based trackers (sparse trackers) are considered generative tracking methods since they sparsely express the target candidates using a few templates (bases). Generally, sparse representation has played a dominant role in computer vision applications such as face recognition [21], image denoising and restoration [22], image segmentation [23, 24], image pansharpening [25], etc. Most sparse trackers utilize a convex optimization model to represent the global appearance of target candidates in the particle filter framework.
As one of the pioneering works, Mei et al. [26] represent the global information of target candidates by a set of templates using ℓ1 minimization. Bao et al. [27] present an accelerated proximal gradient descent method to increase the efficiency of solving the ℓ1 minimization. To attain the relationship among target candidates, Zhang et al. [28] propose to jointly learn the global information of all target candidates. Later, Hong et al. [29] cast tracking as a multi-task multi-view sparse learning problem in terms of least squares (LS). To handle data possibly contaminated by outliers and noise, Mei et al. [30] use the least absolute deviation (LAD) in their optimization model. In general, these global sparse trackers achieve good performance. However, they model each target region as a single entity and may fail when targets undergo heavy occlusions in a frame sequence.

Unlike global sparse trackers, local sparse trackers represent local patches inside target candidates together with local patches inside each template set. Liu et al. [31] introduce a local sparse tracker, which adopts a histogram of sparse coefficients and a sparse constrained regularized mean-shift algorithm, to robustly track the object. This method is based on a static local sparse dictionary and therefore fails when similar objects appear in the scene. Jia et al. [32] exploit both partial and spatial information of target candidates and represent them in a dynamic local sparse dictionary. More recently, Jia et al. [33] propose to extract coarse and fine local image patches inside each target candidate. Despite favorable performance, these local sparse trackers [32, 33] do not consider the spatial layout structure among local patches inside a target candidate.
As a result, the sparse vectors of local patches exhibit a random pattern rather than a similar structure on the non-zero elements.

[This paper is a preprint of a paper submitted to IET Image Processing. If accepted, the copy of record will be available at the IET Digital Library.]

To further improve the tracking performance, recent sparse trackers consider both global and local information of all target candidates in their optimization models. Zhang et al. [5] represent local patches inside all target candidates along with the global information using a mixed-norm regularization on the sparse representative matrix. They assume that the same local patches of all target candidates are similar. However, this assumption does not hold in practice due to outlier candidates and occlusion in tracking. To address this shortcoming, Zhang et al. [34] take both factors into account to design an optimal target region searching method. These recent sparse trackers achieve improved performance. However, considering the relationship of all target candidates degrades the performance when drifting occurs. In addition, using the mixed-norm regularization in the optimization model to integrate both local and global information of target candidates lessens the tracking accuracy in the cases of heavy occlusions.

In this paper, we propose a structured group local sparse tracker (SGLST), which exploits local patches inside a target candidate and represents them in a novel convex optimization model. The proposed optimization model not only adopts local and spatial information of the target candidates but also attains the spatial layout structure among them by employing a group-sparsity regularization term. The main contributions of the proposed work are summarized as follows:

• Proposing a local sparse tracker, which employs local and spatial information of a target candidate and attains the spatial structure among different local patches inside a target candidate.
• Developing a convex optimization model, which introduces a group-sparsity regularization term to motivate the tracker to select the corresponding local patches of the same small subset of templates to represent the local patches of each target candidate.

• Designing a fast and parallel numerical algorithm based on the alternating direction method of multipliers (ADMM), which consists of two subproblems with closed-form solutions.

The remainder of this paper is organized as follows: Section 2 introduces the notations. Section 3 presents SGLST together with its novel convex optimization model, solved by the proposed ADMM-based numerical algorithm. Section 4 demonstrates the experimental results on 16 publicly available challenging image sequences and the OTB50 and OTB100 tracking benchmarks and compares SGLST with several state-of-the-art trackers. Section 5 draws the conclusions.
2. NOTATIONS
Throughout this paper, matrices, vectors, and scalars are denoted by boldface uppercase, boldface lowercase, and italic lowercase letters, respectively. For a given matrix X, X_{i,j} denotes the element at the i-th row and j-th column, ‖X‖_F indicates the Frobenius norm, ‖X‖_{p,q} is the ℓp norm of the ℓq norms of the rows of X, and X(:) is the vectorized form of X. For a given column vector x, diag(x) and x_i denote a diagonal matrix formed by the elements of x and the i-th element of x, respectively. The symbol tr(·) stands for the trace operator, X ⊗ Y is the Kronecker product of two matrices X and Y of arbitrary sizes, 1_l is a column vector of all ones with dimension l, and I_k denotes a k × k identity matrix.
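As a concrete illustration of the mixed-norm notation, the following NumPy sketch (ours, not part of the paper; the function name mixed_norm is an assumption) computes ‖X‖_{p,q} as the ℓp norm of the row-wise ℓq norms, and X(:) as column-stacking vectorization:

```python
import numpy as np

def mixed_norm(X, p, q):
    """||X||_{p,q}: the l_p norm of the vector of l_q norms of the rows of X."""
    row_norms = np.linalg.norm(X, ord=q, axis=1)  # l_q norm of each row
    return np.linalg.norm(row_norms, ord=p)       # l_p norm of those values

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
# Rows have l_2 norms 5, 0, and 13; their l_1 norm is 18.
x_vec = X.flatten(order="F")  # X(:), MATLAB-style column-major vectorization
```

For the example above, mixed_norm(X, 1, 2) sums the row norms, while mixed_norm(X, np.inf, 2) picks the largest one, which is the structure the ‖·‖_{1,∞} regularizer in Section 3 exploits.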
3. PROPOSED METHOD
This section provides detailed information about the proposed structured group local sparse tracker (SGLST). Specifically, subsection 3.1 formulates the local sparse appearance model in SGLST and explains how its convex optimization model addresses the drawbacks of conventional local sparse trackers [32, 33]. Subsection 3.2 presents an efficient numerical algorithm to solve the convex optimization problem presented in subsection 3.1.
3.1. Local Sparse Appearance Model

The proposed SGLST utilizes both local and spatial information in the particle filter framework and employs a new optimization model, which addresses the drawback of conventional local sparse trackers by attaining the spatial layout structure among different local patches inside a target candidate.

Conventional local sparse trackers [32, 33] individually represent local patches without considering their spatial layout structure. For instance, local patches in [32] are separately represented by solving the Lasso problem. As a consequence, local patches inside the j-th target candidate may be sparsely represented by the corresponding local patches inside different dictionary templates, as illustrated in Figure 1(a), where two local patches of the j-th target candidate, shown in the red and blue bounding boxes, may be represented by the corresponding local patches in different dictionary templates.

In this paper, we propose a novel SGLST that adopts both local and spatial information of the target candidates for tracking. The proposed tracker employs a novel optimization model to solve the aforementioned issues associated with conventional local sparse trackers [32, 33]. Specifically, SGLST formulates an optimization problem that imposes a structure on the achieved sparse vectors for different local patches inside each target candidate and attains the spatial layout structure among the local patches. To solve the proposed model, we develop an efficient numerical algorithm consisting of two subproblems with closed-form solutions by adopting the alternating direction method of multipliers (ADMM) within each target candidate in the optimization function. To maintain the spatial layout structure among local patches, we jointly represent all the local patches of a target candidate in a new convex optimization model.
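For context, the particle filter framework referenced above can be sketched as follows. This is a generic illustration only; the state parameterization, standard deviations, and function names are our assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidates(prev_state, n=400, sigmas=(4.0, 4.0, 0.01)):
    """Draw n candidate states (x, y, scale) around the previous estimate."""
    return prev_state + rng.normal(0.0, 1.0, size=(n, 3)) * np.asarray(sigmas)

def estimate(states, likelihoods):
    """Select the candidate with the highest likelihood as the tracking result."""
    return states[int(np.argmax(likelihoods))]
```

In SGLST, the likelihood of each candidate is derived from its sparse representation over the template dictionary, as described in the remainder of this subsection.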
In other words, if the r-th local patch of the j-th target candidate is best represented by the r-th local patch of the q-th template, the s-th local patch of the j-th target candidate should also be best represented by the s-th local patch of the q-th template. As shown in Figure 1(b), we aim to represent both local patches of the j-th target candidate, shown in the red and blue bounding boxes, by their corresponding patches in the same dictionary templates (e.g., the first and the tenth templates).

Fig. 1: Illustration of the sparse representation of two sample local patches of the j-th target candidate in: (a) Conventional local sparse trackers [32, 33]. One local patch of the j-th target candidate, shown in the red bounding box, is represented by its corresponding patch in the first and the tenth templates, while another local patch of this candidate, shown in the blue bounding box, is represented by its corresponding patch in two different templates (e.g., the second and the ninth templates). (b) The proposed SGLST. Both local patches of the j-th target candidate, shown in the red and blue bounding boxes, are represented by their corresponding patches in the same templates (e.g., the first and the tenth templates).

To do so, we first use k target templates and extract l overlapping d-dimensional local patches inside each template to construct the dictionary D. Such a representation generates the local dictionary matrix D = [D_1, ..., D_k] ∈ R^{d×(lk)}, where D_i ∈ R^{d×l}. Then, we construct a matrix X = [X_1, ..., X_n] ∈ R^{d×(ln)}, which contains the local patches of all the target candidates, where n is the number of particles. Next, we define the sparse coefficient matrix C corresponding to the j-th target candidate as C ≜ [C_1 ⋯ C_k]^T ∈ R^{(lk)×l}, where each C_q, q = 1, ..., k, is an l × l matrix indicating the group sparse representation of the l local patches of the j-th target candidate using the l local patches of the q-th template. Finally, we formulate the following convex model:

  minimize_{C ∈ R^{(lk)×l}}  ‖X_j − DC‖_F² + λ ‖[C_1(:) ... C_k(:)]^T‖_{1,∞}   (1a)
  subject to  C ≥ 0,   (1b)
              1_{lk}^T C = 1_l^T,   (1c)

where the first term corresponds to the total cost of representing the feature matrix X_j using the dictionary matrix D, and the second term is a group-sparsity regularization term, which penalizes the objective function in proportion to the number of selected templates (dictionary words). Moreover, the group-sparsity regularization term imposes all the local patches to jointly select the same few templates by establishing the ‖·‖_{1,∞} minimization on matrix C. The regularization parameter λ > 0 balances the trade-off between the two terms. The constraint (1c) ensures that each local patch in X_j is expressed by at least one selected local patch of the dictionary D and that the sum of the linear combination coefficients is constrained.

For each target candidate, we find the sparse matrix C using the numerical algorithm presented in subsection 3.2. We then perform an averaging process along with the alignment pooling strategy of [32] to find a representative vector. Finally, we calculate the summation of this representative vector as the likelihood value. The candidate with the highest likelihood value is selected as the tracking result. We also update the templates throughout the sequence using the same strategy as proposed in [32] to handle the appearance variations of the target region.

3.2. Numerical Algorithm

This subsection presents a numerical algorithm based on the ADMM [35] to efficiently solve the proposed model (1). The idea of the ADMM is to utilize auxiliary variables to convert a complicated convex problem into smaller subproblems, each of which is efficiently solvable via an explicit formula. The ADMM iteratively solves the subproblems until convergence.
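To make the structure of model (1) concrete, the following sketch (our illustration, not the authors' code) evaluates the objective (1a), computing the ℓ1,∞ group term blockwise over the k template groups:

```python
import numpy as np

def sglst_objective(X_j, D, C, lam, l, k):
    """Objective (1a): Frobenius fitting cost plus l_{1,inf} group sparsity.

    D is d x (l*k), X_j is d x l, and C stacks k blocks C_q of size l x l.
    """
    fit = np.linalg.norm(X_j - D @ C, "fro") ** 2
    blocks = C.reshape(k, l, l)                   # rows 0..l-1 form C_1, etc.
    group = sum(np.abs(b).max() for b in blocks)  # sum of per-group max entries
    return fit + lam * group
```

Because each group contributes its largest coefficient, shrinking this term drives entire template blocks to zero at once, which is exactly what forces all patches of a candidate to share the same few templates.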
To do so, we first define a vector m ∈ R^k such that m_i = max(C_i(:)) and rewrite (1) as:

  minimize_{C ∈ R^{(lk)×l}, m ∈ R^k}  ‖X_j − DC‖_F² + λ 1_k^T m   (2a)
  subject to  C ≥ 0,   (2b)
              1_{lk}^T C = 1_l^T,   (2c)
              m ⊗ 1_l 1_l^T ≥ C.   (2d)

It should be noted that constraint (2d) is imposed in the above reformulation to ensure the equivalence between (1) and (2). This inequality constraint can be transformed into an equality one by introducing a non-negative slack matrix U ∈ R^{(lk)×l}, which compensates for the difference between m ⊗ 1_l 1_l^T and C. Using the resultant equality constraint, 1_k^T m can be equivalently written as (1/l²) 1_{lk}^T (C + U) 1_l. Moreover, this equality constraint implies that the columns of C + U are regulated to be identical. Hence, one can simply replace it by a linear constraint independent of m, as presented in (3d). Therefore, we rewrite (2) independently of m as:

  minimize_{C, U ∈ R^{(lk)×l}}  ‖X_j − DC‖_F² + (λ/l²) 1_{lk}^T (C + U) 1_l   (3a)
  subject to  C ≥ 0,   (3b)
              1_{lk}^T C = 1_l^T,   (3c)
              E (C + U) = (I_k ⊗ (1_l 1_l^T / l)) (C + U),   (3d)
              U ≥ 0,   (3e)

where the matrix E serves as the right circular shift operator on the rows of C + U. To construct the ADMM formulation, whose subproblems possess closed-form solutions, we define auxiliary variables Ĉ, Û ∈ R^{(lk)×l} and reformulate (3) as:

  minimize_{C, Ĉ, U, Û ∈ R^{(lk)×l}}  ‖X_j − DC‖_F² + (λ/l²) 1_{lk}^T (C + U) 1_l + (μ₁/2) ‖C − Ĉ‖_F² + (μ₂/2) ‖U − Û‖_F²   (4a)
  subject to  Ĉ ≥ 0,   (4b)
              1_{lk}^T Ĉ = 1_l^T,   (4c)
              E (C + U) = (I_k ⊗ (1_l 1_l^T / l)) (C + U),   (4d)
              Û ≥ 0,   (4e)
              C = Ĉ,  U = Û,   (4f)

where μ₁, μ₂ > 0 are the augmented Lagrangian parameters. Without loss of generality, we assume μ₁ = μ₂ = μ [35].
The last two terms in the objective function (4a) then vanish for any feasible solution, which implies that (3) and (4) are equivalent. We further form the augmented Lagrangian function to solve (4) as follows:

  L_μ(C, U, Ĉ, Û, Λ₁, Λ₂) = ‖X_j − DC‖_F² + (λ/l²) 1_{lk}^T (C + U) 1_l + (μ/2) ‖C − Ĉ + Λ₁/μ‖_F² + (μ/2) ‖U − Û + Λ₂/μ‖_F²,   (5)

where Λ₁, Λ₂ ∈ R^{(lk)×l} are the Lagrangian multipliers corresponding to the equations in (4f). Given an initialization for Ĉ, Û, Λ₁, and Λ₂ at iteration t = 0 (e.g., Ĉ⁰, Û⁰, Λ₁⁰, Λ₂⁰), (5) is solved through the ADMM iterations. At the next iteration, C and U are updated by minimizing (5) under the constraint (4d). To do so, we first define {z_i}, i = 1, ..., lk, where z_i is obtained by stacking the i-th rows of C and U. We then divide this minimization problem into lk equality-constrained quadratic programs, where each program has an analytical solution. Using the updated C and U, we compute Ĉ and Û by minimizing (5) with the constraints (4b), (4c), and (4e). To this end, we split the problem into two separate subproblems with closed-form solutions over Ĉ and Û, where the first subproblem consists of l independent Euclidean norm projections onto the probability simplex constraints and the second subproblem consists of l independent Euclidean norm projections onto the non-negative orthant. Finally, we update Λ₁ and Λ₂ by performing l parallel updates over their respective columns. All these iterative updates can be performed quickly due to the closed-form solutions.
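The probability-simplex projection used in the Ĉ subproblem has a well-known sort-based closed-form solution. The sketch below (our illustration, not the authors' code) projects a vector onto {x : x ≥ 0, 1ᵀx = 1}; in the tracker it would be applied independently to each of the l columns, matching constraints (4b) and (4c):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-and-threshold rule."""
    u = np.sort(v)[::-1]                                # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]  # last feasible index
    theta = (1.0 - css[rho]) / (rho + 1.0)              # shift that hits sum=1
    return np.maximum(v + theta, 0.0)
```

The non-negative-orthant projection for the Û subproblem is simply np.maximum(·, 0), which is why both subproblems are closed-form and cheap.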
4. EXPERIMENTAL RESULTS
In this section, we evaluate the performance of the proposed SGLST on 16 publicly available frame sequences and the OTB50 [36] and OTB100 [37] tracking benchmarks.

We resize each target region to 32 × 32 pixels and extract overlapping local patches of 16 × 16 pixels inside the target region using a step size of 8 pixels. This leads to l = 9 local patches. For each local patch, we extract two sets of features, namely gray-level intensity features and histogram of oriented gradients (HOG) features, to represent its characteristics from two perspectives. Both features have shown promising tracking results in different trackers, and HOG features [38] have demonstrated significant improvement in visual tracking [39, 30, 34]. The proposed SGLST therefore has two variants: SGLST_Color and SGLST_HOG. For the HOG features, we resize the target candidates to 64 × 64 pixels and exploit 196-dimensional HOG features for each of the 32 × 32 local patches to capture relatively high-resolution edge information. For all the experiments, we fix λ and μ₁ = μ₂ = μ, set the number of particles to n = 400, and set the number of target templates to k = 10. We adopt the same setting as used in [32] to update the templates.

We conduct extensive experiments on 16 challenging frame sequences and compare SGLST_Color and SGLST_HOG with 11 state-of-the-art trackers, namely L1T [26], Struck [40], IVT [18], MTT [41], MIL [42], VTD [19], Frag [17], ASLA [32], KCF [43], MEEM [44], and RSST_HOG [34]. To ensure a fair comparison, we use the available source code or binary code together with the optimal parameters provided by the respective authors to produce the tracking results.

Fig. 2: Comparison of the tracking results of 11 state-of-the-art trackers and the two variants of the proposed SGLST on the boy, faceocc1, faceocc2, girl, kitesurf, and surfer image sequences. Frame indices are shown at the top left corner of representative frames. Results are best viewed on high-resolution displays. (— L1T, — Struck, — IVT, — MTT, — MIL, — VTD, — Frag, — ASLA, - - - KCF, - - - MEEM, — RSST_HOG, — SGLST_Color, — SGLST_HOG)

Figure 2, Figure 3, and Figure 4 demonstrate the tracking results of the 13 aforementioned compared methods on three representative frames of each of the 16 sequences. Here, we briefly analyze the tracking performance of each compared tracker under different challenging scenarios. The L1T tracker fails when the target undergoes fast motion and rotation, as shown in the kitesurf and surfer sequences, occlusion, as shown in the jogging1 sequence, or scale variation, as shown in the board sequence. Struck cannot track the target when occlusion (jogging1 and box) or fast motion (surfer) occurs. IVT drifts from the target in the frame sequences containing the out-of-view challenge (girl and jogging), fast motion (boy), or scale variation (human5). MTT loses the target having large motions between consecutive frames (board and crossing). MIL fails to track the target when scale variation (car4 and car2) or occlusion (walking2) happens. VTD and Frag lead to drift of the target under fast motion and deformation circumstances, as shown in the crossing and human7 sequences. In addition, they cannot adequately handle scale variation, as shown in the box sequence. ASLA does not yield good performance in the cases of heavy occlusions (faceocc1, jogging, and walking2). KCF is incapable of dealing with scale variation (car4 and walking2), occlusion (jogging1), or out-of-view challenges (box). MEEM achieves good overall performance. However, it drifts from the target when scale varies (car4) and does not sufficiently address the challenge of partial occlusion (walking2 and box). RSST_HOG performs well in most sequences, but it drifts away in the sequences with scale variations (doll, board, and box). SGLST_Color also demonstrates favorable performance in most of the sequences. However, it encounters problems when illumination changes happen (kitesurf and box). Among all the compared methods, SGLST_HOG performs well in tracking human faces, human bodies, objects, and vehicles in the 16 challenging sequences. The favorable performance of the proposed SGLST reflects the advantages of adopting local patches within the target and keeping the spatial structure among local patches.
In addition, using HOG features in SGLST helps to improve the tracking performance yielded by using intensity features.

For quantitative comparison, we compute the average overlap score across all frames of each image sequence for each compared method. It is worth mentioning that the overlap score between the tracked bounding box r_t and the ground-truth bounding box r_g is defined as S = |r_t ∩ r_g| / |r_t ∪ r_g|, where |·| is the number of pixels in a bounding box, ∩ represents the intersection of the two bounding boxes, and ∪ represents their union. Table 1 summarizes the average overlap scores across all frames of each of the 16 sequences for the compared methods. It is clear that the two proposed trackers, SGLST_Color and SGLST_HOG, achieve overall favorable tracking performance on the tested sequences. On average, SGLST_Color drastically improves the average overlap scores of L1T, IVT, MTT, MIL, VTD, and Frag by 24.49%, 45.24%, 32.61%, 60.53%, 38.64%, and 56.41%, respectively. It also outperforms Struck, ASLA, KCF, and MEEM by improving their average overlap scores by 10.91%, 19.61%, 15.09%, and 3.39%, respectively. RSST_HOG is the only tracker that outperforms SGLST_Color, by 8.2%, mainly due to the use of HOG features. The proposed SGLST_HOG achieves the best average overlap score and significantly outperforms SGLST_Color and RSST_HOG by 21.31% and 12.12%, respectively.

Fig. 3: Comparison of the tracking results of 11 state-of-the-art trackers and the two variants of the proposed SGLST on the jogging1, crossing, human5, human7, walking2, and doll image sequences. Frame indices are shown at the top left corner of representative frames. Results are best viewed on high-resolution displays. (— L1T, — Struck, — IVT, — MTT, — MIL, — VTD, — Frag, — ASLA, - - - KCF, - - - MEEM, — RSST_HOG, — SGLST_Color, — SGLST_HOG)

In summary, the qualitative results shown in Figure 2, Figure 3, and Figure 4 and the quantitative results shown in Table 1 demonstrate that SGLST_HOG achieves the best tracking performance and SGLST_Color achieves the third best tracking performance, inferior to RSST_HOG, which uses HOG features instead of intensity features. Both variants of the proposed SGLST can successfully track the targets in a majority of frames in all 16 tested sequences with different challenging conditions such as fast motion, rotation and scale variations, occlusions, and illumination changes.
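The overlap score defined above can be transcribed directly for axis-aligned boxes; the (x, y, w, h) parameterization below is our assumption for illustration:

```python
def overlap_score(rt, rg):
    """Overlap score S = |rt ∩ rg| / |rt ∪ rg| for boxes given as (x, y, w, h)."""
    xt, yt, wt, ht = rt
    xg, yg, wg, hg = rg
    iw = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))  # intersection width
    ih = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))  # intersection height
    inter = iw * ih
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 × 2 boxes offset by one pixel in each direction share a 1 × 1 intersection and a 7-pixel union, giving a score of 1/7; identical boxes score 1.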
We conduct experiments on the OTB50 tracking benchmark [36] to evaluate the overall performance of the proposed SGLST_Color and SGLST_HOG under different challenges. This benchmark consists of 50 annotated sequences, where 49 sequences have one annotated target and one sequence (jogging) has two annotated targets. Each sequence is also labeled with attributes specifying the presence of different challenges, including illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR). The sequences are categorized based on the attributes, and 11 challenge subsets are generated. These subsets are utilized to evaluate the performance of trackers in different challenge categories.

For this benchmark data set, tracking results for 29 trackers are available online [36]. In addition, we include the tracking results of 12 additional recent trackers, namely MTMVTLS [29], MTMVTLAD [30], MSLA-4 [33] (the recent version of ASLA [32]), SST [5], SMTMVT [45], CNT [46], TGPR [47], DSST [12], PCOM [48], KCF [43], MEEM [44], and RSST [34]. Following the protocol proposed in [36], we use the same parameters for SGLST_Color and SGLST_HOG on all the sequences to obtain the one-pass evaluation (OPE) results, which are conventionally used to evaluate trackers by initializing them with the ground-truth location in the first frame. We present the overall OPE success plot and the OPE success plots for the BC, DEF, FM, IPR, and OPR challenge subsets in Figure 5 and the OPE success plots for the IV, LR, MB, OCC, OV, and SV challenge subsets in Figure 6. These success plots show the percentage of successful frames at overlap thresholds ranging from 0 to 1, where the successful frames are those whose overlap scores are larger than a given threshold.
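The success-plot construction described above can be sketched as follows (our illustration; the 21-point threshold grid is a common OTB convention, and the AUC is approximated by the mean success rate over the thresholds):

```python
import numpy as np

THRESHOLDS = np.linspace(0.0, 1.0, 21)

def success_curve(overlaps, thresholds=THRESHOLDS):
    """Fraction of frames whose overlap score exceeds each threshold."""
    overlaps = np.asarray(overlaps)
    return np.array([(overlaps > t).mean() for t in thresholds])

def auc_score(overlaps):
    """Area under the success curve, used to rank trackers."""
    return float(success_curve(overlaps).mean())
```

Ranking trackers by this AUC summarizes performance over all thresholds at once, rather than privileging a single cut-off such as 0.5.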
For a fair comparison, we use the area under the curve (AUC) of each success plot to rank the trackers. For the convenience of the reader, we only include the top 10 of the 43 compared trackers in each plot. The values in the parentheses alongside the legends are AUC scores. The values in the parentheses alongside the titles of the 11 challenge subsets are the numbers of video sequences in the respective subsets.

It is clear from the overall success plot in Figure 5 that SGLST_HOG (i.e., incorporating HOG features in SGLST) improves the tracking performance of SGLST_Color (i.e., incorporating intensity features in SGLST). Similar improvement trends are also observed in [39, 34]. Among the 29 baseline trackers employed in [36], SCM achieves the most favorable performance. SGLST_HOG outperforms SCM by 11.42% in terms of the AUC score. Compared with the 12 additional recent trackers, SGLST_HOG outperforms MSLA-4, SMTMVT, KCF, TGPR, RSST_HOG, and CNT by 9.88%, 9.66%, 8.17%, 5.10%, 2.39%, and 2.02%, respectively. It achieves a performance comparable to that of DSST and MEEM. It should be mentioned that the variant of RSST with intensity features (i.e., RSST_Color) reports an AUC score of 0.520, while the proposed SGLST_Color achieves an AUC score of 0.523. This slight improvement indicates that the proposed optimization model is better than its counterpart in RSST_Color.

The proposed SGLST_HOG performs significantly better than traditional sparse trackers such as L1APG [27], LRST [28], ASLA [32], MTT [41], and MTMVTLS [29]. It outperforms most recent sparse trackers such as MTMVTLAD [30], SST [5], MSLA-4 [33], SMTMVT [45], and RSST_HOG [34]. SGLST_HOG, which yields an AUC score of 0.556, also achieves better performance than some correlation filter (CF) based methods such as KCF (AUC score of 0.514) and DSST (AUC score of 0.554). Moreover, it outperforms some deep learning-based methods such as CNT (AUC score of 0.545) and GOTURN (AUC score of 0.444) [49]. However, the proposed SGLST_HOG yields lower performance than some deep learning-based methods such as FCNT [50] (AUC score of 0.599), DLSSVM [51] (AUC score of 0.589), and RSST_Deep [34] (AUC score of 0.590). We believe that SGLST can be further improved by incorporating deep features, as similar improvement trends are clearly shown in RSST [34].

We further evaluate the performance of SGLST on the 11 challenge subsets. As demonstrated in Figure 5 and Figure 6, SGLST_HOG ranks as one of the top three trackers in the 5 subsets with DEF, OPR, LR, MB, and SV challenges, and SGLST_Color ranks as one of the top three trackers in the 2 subsets with IV and LR challenges.

Fig. 4: Comparison of the tracking results of 11 state-of-the-art trackers and the two variants of the proposed SGLST on the board, box, car4, and car2 image sequences. Frame indices are shown at the top left corner of representative frames. Results are best viewed on high-resolution displays. (— L1T, — Struck, — IVT, — MTT, — MIL, — VTD, — Frag, — ASLA, - - - KCF, - - - MEEM, — RSST_HOG, — SGLST_Color, — SGLST_HOG)

Table 1: Summary of the average overlap scores of the 13 compared methods on the 16 sequences. The bold numbers in blue indicate the best performance, while the numbers in red indicate the second best.
Seq: L1T, Struck, IVT, MTT, MIL, VTD, Frag, ASLA, KCF, MEEM, RSST_HOG, SGLST_Color, SGLST_HOG
boy: 0.73 0.76 0.26 0.49 0.49 0.62 0.38 0.36 0.77 0.79 0.76
faceocc2: 0.68 0.76 0.72 0.74 0.67 0.71 0.65 0.65 0.74
human5: 0.38 0.35 0.18 0.45 0.21 0.28 0.03 0.68 0.21 0.28 0.51 0.35
human7: 0.51 0.48 0.23 0.28 0.48 0.28 0.27 0.29 0.29 0.48 0.58 0.40
walking2: 0.75 0.51 0.79 0.78 0.29 0.40 0.35 0.37 0.39 0.31 0.76 0.80
doll: 0.46 0.55 0.43 0.39 0.42 0.66 0.61 0.78 0.59 0.60 0.48
box: 0.55 0.21 0.51 0.21 0.27 0.42 0.46 0.34 0.35 0.31 0.33 0.21
car4: 0.72 0.48 0.82 0.75 0.25 0.36 0.19 0.75 0.48 0.45
SGLST_HOG achieves the fourth rank in the 2 subsets with IPR and OCC challenges and the fifth rank in the 2 subsets with FM and IV challenges.
[Fig. 5: OTB50 overall OPE success plot and the OPE success plots for the BC, DEF, FM, IPR, and OPR challenge subsets. The value in each title is the number of sequences in the subset; the values in the legends are the AUC scores. Only the top 10 trackers are presented, while the results of the other trackers can be found in [36].]

SGLST Color achieves the fifth rank on one subset with the SV challenge. However, SGLST is not among the top 10 trackers for the subset with the OV challenge. Overall, the proposed SGLST ranks as one of the top 5 trackers on 9 out of 11 subsets (i.e., 22 out of 50 image sequences) with the DEF, OPR, LR, MB, SV, IV, IPR, OCC, and FM challenges. SGLST HOG significantly improves the tracking performance (i.e., the AUC score) of its variant and the third-ranked tracker, SGLST Color, by 23.31% due to the incorporation of the HOG features instead of the intensity features. It improves the tracking performance of the second-ranked tracker, RSST HOG, by 10.81%, mainly due to its novel optimization model, which employs a group-sparsity regularization term to adopt the local and spatial information of the target candidates and attain the spatial layout structure among them.
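The average overlap scores and success-plot AUC values compared above can be reproduced from per-frame bounding boxes. A minimal sketch, assuming (x, y, w, h) boxes and a 101-point overlap-threshold grid (the function names and grid are our own choices, not from the paper):

```python
import numpy as np

def overlap(a, b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(overlaps, thresholds=np.linspace(0, 1, 101)):
    """Mean success rate over overlap thresholds (the AUC score)."""
    ov = np.asarray(overlaps)
    return float(np.mean([(ov > t).mean() for t in thresholds]))
```

A tracker's success rate at threshold t is the fraction of frames whose overlap with the ground truth exceeds t; averaging these rates over the threshold grid yields the single AUC number shown in each legend.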
We conduct the experiments on the OTB100 tracking benchmark [37] to evaluate the overall performance of the proposed SGLST Color and SGLST HOG under different challenges. This benchmark is an extension of OTB50 [36] and consists of 100 annotated sequences. Each sequence is labeled with attributes specifying the presence of different challenges. Two sequences, jogging and Skating, have two annotated targets; the remaining 98 sequences each have one annotated target. We evaluate the proposed SGLST against the 29 baseline trackers used in [37] and seven recent trackers including DSST [12], PCOM [48], KCF [43], MEEM [44], TGPR [47], and RSST [34]. The other 6 trackers compared in the OTB50 benchmark do not provide their results on the OTB100 benchmark and are therefore excluded from this experiment.

Figure 7 presents the overall OPE success plot and the OPE success plots for the BC, DEF, FM, IPR, and OPR challenge subsets, and Figure 8 provides the OPE success plots for the IV, LR, MB, OCC, OV, and SV challenge subsets. The top 10 trackers are included in each plot. The overall success plot in Figure 7 clearly demonstrates that the best tracker, MEEM, a multi-expert tracker employing an online linear SVM and an explicit feature mapping method, has a slightly better AUC score than the second-best tracker, the proposed SGLST HOG; the difference in AUC score is only 0.006. SGLST HOG improves its variant, SGLST Color, by 16.67% due to the use of HOG features over intensity features. It also improves the fourth-ranked tracker, RSST HOG, the most recent sparse tracker, by 1.95% due to its novel optimization model. Compared to the third-ranked tracker, DSST, a discriminative CF-based tracker, it improves the AUC score of DSST by 1.16%. Similar to the tracking results obtained on the OTB50 tracking benchmark, the proposed SGLST HOG performs significantly better than traditional sparse trackers such as
[Fig. 6: OTB50 OPE success plots for the IV, LR, MB, OCC, OV, and SV challenge subsets. The value in each title is the number of sequences in the subset; the values in the legends are the AUC scores. Only the top 10 trackers are presented, while the results of the other trackers can be found in [36].]

L1APG [27], LRST [28], ASLA [32], and MTT [41]. It also outperforms RSST HOG [34], one of the most recent sparse trackers that provides results on the OTB100 tracking benchmark. SGLST HOG, which yields an AUC score of 0.524, also achieves better performance than some CF and deep learning based methods such as KCF (AUC score of 0.478), DSST (AUC score of 0.518), and GOTURN (AUC score of 0.427) [49]. However, it yields lower performance than some deep learning-based methods such as CNN-SVM (AUC score of 0.554), CF2 (AUC score of 0.562) [52], and RSST Deep (AUC score of 0.583). We believe that incorporating deep features in SGLST can further improve its tracking performance to be more comparable with the other deep learning-based trackers.

We further evaluate the performance of SGLST on the 11 challenge subsets in the OTB100 benchmark. As demonstrated in Figure 7 and Figure 8, SGLST HOG ranks as one of the top three trackers in all 11 subsets except the one with the BC challenge, and SGLST Color ranks as one of the top three trackers in the subset with the LR challenge. SGLST HOG achieves the fifth rank on the subset with the BC challenge. It achieves better performance than the best tracker, MEEM, in three subsets with the IV, LR, and SV challenges. Overall, the proposed SGLST ranks as one of the top 3 trackers on 10 out of 11 subsets (i.e., 69 out of 100 image sequences) with the DEF, FM, IPR, OPR, IV, SV, LR, MB, OCC, and OV challenges.
5. CONCLUSIONS
In this paper, we propose a novel tracker, called the structured group local sparse tracker (SGLST), which exploits local patches within target candidates in the particle filter framework. Unlike conventional local sparse trackers, SGLST employs a new convex optimization model to preserve the spatial layout structure among the local patches. To solve the proposed optimization model, we develop an efficient numerical algorithm consisting of two subproblems with closed-form solutions based on ADMM. We test the performance of the proposed tracker with two types of features: gray-level intensity features and HOG features. The qualitative and quantitative results on 16 publicly available image sequences demonstrate that SGLST HOG outperforms all compared state-of-the-art trackers. The experimental results on the OTB50 and OTB100 tracking benchmarks demonstrate that SGLST HOG outperforms all compared state-of-the-art trackers except the MEEM tracker in terms of the average AUC score.
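The two-subproblem ADMM scheme mentioned above follows the standard pattern of alternating closed-form updates [35]. As an illustrative sketch only (we use a generic lasso objective here, not the SGLST model itself, and the function name is ours), each iteration solves a smooth subproblem in closed form, a separable subproblem by soft thresholding, and then updates the scaled dual variable:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||z||_1 s.t. x = z via ADMM."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # Factor once: the x-subproblem admits the closed-form solution below.
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))  # smooth subproblem (closed form)
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft threshold
        u = u + x - z  # dual update
    return z

# Example: with A = I, the lasso solution is soft thresholding of b,
# so admm_lasso(np.eye(3), np.array([3.0, 0.5, -2.0]), 1.0) converges to [2, 0, -1].
```

Because both subproblems have closed-form solutions, the per-iteration cost stays low, which is the property the proposed algorithm exploits.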
[Fig. 7: OTB100 overall OPE success plot and the OPE success plots for the BC, DEF, FM, IPR, and OPR challenge subsets. The value in each title is the number of sequences in the subset; the values in the legends are the AUC scores. Only the top 10 trackers are presented, while the results of the other trackers can be found in [37].]
6. REFERENCES

[1] Alper Yilmaz, Omar Javed, and Mubarak Shah, "Object tracking: A survey," ACM Computing Surveys (CSUR), vol. 38, no. 4, pp. 13, 2006.
[2] Samuele Salti, Andrea Cavallaro, and Luigi Di Stefano, "Adaptive appearance modeling for video tracking: Survey and evaluation," IEEE Transactions on Image Processing, vol. 21, no. 10, pp. 4334–4348, 2012.
[3] Yu Pang and Haibin Ling, "Finding the best from the second bests - inhibiting subjective bias in evaluation of visual tracking algorithms," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2784–2791.
[4] Matej Kristan, Jiri Matas, Ales Leonardis, Michael Felsberg, Luka Cehovin, Gustavo Fernandez, Tomas Vojir, Gustav Hager, Georg Nebehay, and Roman Pflugfelder, "The visual object tracking VOT2015 challenge results," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2015, pp. 1–23.
[5] Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, and Ming-Hsuan Yang, "Structural sparse tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 150–158.
[6] Tianzhu Zhang, Adel Bibi, and Bernard Ghanem, "In defense of sparse tracking: Circulant sparse tracker," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3880–3888.
[7] Shai Avidan, "Ensemble tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, 2007.
[8] Helmut Grabner, Michael Grabner, and Horst Bischof, "Real-time tracking via on-line boosting," in BMVC, 2006, vol. 1, p. 6.
[9] Helmut Grabner, Christian Leistner, and Horst Bischof, "Semi-supervised on-line boosting for robust tracking," in European Conference on Computer Vision. Springer, 2008, pp. 234–247.
[10] Boris Babenko, Ming-Hsuan Yang, and Serge Belongie, "Visual tracking with online multiple instance learning," in Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on. IEEE, 2009, pp. 983–990.
[Fig. 8: OTB100 OPE success plots for the IV, LR, MB, OCC, OV, and SV challenge subsets. The value in each title is the number of sequences in the subset; the values in the legends are the AUC scores. Only the top 10 trackers are presented, while the results of the other trackers can be found in [37].]

[11] Zdenek Kalal, Jiri Matas, and Krystian Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 49–56.
[12] Martin Danelljan, Gustav Häger, Fahad Khan, and Michael Felsberg, "Accurate scale estimation for robust visual tracking," in British Machine Vision Conference, Nottingham, September 1–5, 2014. BMVA Press, 2014.
[13] Chao Ma, Xiaokang Yang, Chongyang Zhang, and Ming-Hsuan Yang, "Long-term correlation tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5388–5396.
[14] Qingyong Hu, Y. Guo, Yunjin Chen, and Jingjing Xiao, "Correlation filter tracking: Beyond an open-loop system," in British Machine Vision Conference (BMVC), 2017.
[15] Michael J. Black and Allan D. Jepson, "Eigentracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.
[16] Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564–577, 2003.
[17] Amit Adam, Ehud Rivlin, and Ilan Shimshoni, "Robust fragments-based tracking using the integral histogram," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. IEEE, 2006, vol. 1, pp. 798–805.
[18] David A. Ross, Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1–3, pp. 125–141, 2008.
[19] Junseok Kwon and Kyoung Mu Lee, "Visual tracking decomposition," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 1269–1276.
[20] Giorgio Roffo and Simone Melzi, "Online feature selection for visual tracking," 2016.
[21] Yu-Wei Chao, Yi-Ren Yeh, Yu-Wen Chen, Yuh-Jye Lee, and Yu-Chiang Frank Wang, "Locality-constrained group sparse representation for robust face recognition," in Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011, pp. 761–764.
[22] Shutao Li, Haitao Yin, and Leyuan Fang, "Group-sparse representation with dictionary learning for medical image denoising and fusion," IEEE Transactions on Biomedical Engineering, vol. 59, no. 12, pp. 3450–3459, 2012.
[23] Fariba Zohrizadeh, Mohsen Kheirandishfard, and Farhad Kamangar, "Image segmentation using sparse subset selection," in
Applications of Computer Vision (WACV), 2018 IEEE Winter Conference on. IEEE, 2018, pp. 1470–1479.
[24] Fariba Zohrizadeh, Mohsen Kheirandishfard, Kamran Ghasedidizaji, and Farhad Kamangar, "Reliability-based local features aggregation for image segmentation," in International Symposium on Visual Computing. Springer, 2016, pp. 193–202.
[25] Songze Tang, Nan Zhou, and Liang Xiao, "Pansharpening via locality-constrained sparse representation," in BMVC, 2017.
[26] Xue Mei and Haibin Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259–2272, 2011.
[27] Chenglong Bao, Yi Wu, Haibin Ling, and Hui Ji, "Real time robust l1 tracker using accelerated proximal gradient approach," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 1830–1837.
[28] Tianzhu Zhang, Bernard Ghanem, Si Liu, and Narendra Ahuja, "Low-rank sparse learning for robust visual tracking," in European Conference on Computer Vision. Springer, 2012, pp. 470–484.
[29] Zhibin Hong, Xue Mei, Danil Prokhorov, and Dacheng Tao, "Tracking via robust multi-task multi-view joint sparse representation," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 649–656.
[30] Xue Mei, Zhibin Hong, Danil Prokhorov, and Dacheng Tao, "Robust multitask multiview tracking in videos," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 11, pp. 2874–2890, 2015.
[31] Baiyang Liu, Junzhou Huang, Casimir Kulikowski, and Lin Yang, "Robust visual tracking using local sparse appearance model and k-selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2968–2981, 2013.
[32] Xu Jia, Huchuan Lu, and Ming-Hsuan Yang, "Visual tracking via adaptive structural local sparse appearance model," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 1822–1829.
[33] Xu Jia, Huchuan Lu, and Ming-Hsuan Yang, "Visual tracking via coarse and fine structural local sparse appearance models," IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4555–4564, 2016.
[34] Tianzhu Zhang, Changsheng Xu, and Ming-Hsuan Yang, "Robust structural sparse tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[35] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[36] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang, "Online object tracking: A benchmark," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2411–2418.
[37] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang, "Object tracking benchmark,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1834–1848, 2015.
[38] Navneet Dalal and Bill Triggs, "Histograms of oriented gradients for human detection," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, vol. 1, pp. 886–893.
[39] João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in European Conference on Computer Vision. Springer, 2012, pp. 702–715.
[40] Sam Hare, Stuart Golodetz, Amir Saffari, Vibhav Vineet, Ming-Ming Cheng, Stephen L. Hicks, and Philip H. S. Torr, "Struck: Structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096–2109, 2016.
[41] Tianzhu Zhang, Bernard Ghanem, Si Liu, and Narendra Ahuja, "Robust visual tracking via multi-task sparse learning," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2042–2049.
[42] Boris Babenko, Ming-Hsuan Yang, and Serge Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619–1632, 2011.
[43] João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.
[44] Jianming Zhang, Shugao Ma, and Stan Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proc. of the European Conference on Computer Vision (ECCV), 2014.
[45] M. Javanmardi and X. Qi, "Robust structured multi-task multi-view sparse tracking," in , July 2018, pp. 1–6.
[46] Kaihua Zhang, Qingshan Liu, Yi Wu, and Ming-Hsuan Yang, "Robust visual tracking via convolutional networks without training," IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1779–1792, 2016.
[47] Jin Gao, Haibin Ling, Weiming Hu, and Junliang Xing, "Transfer learning based visual tracking with gaussian processes regression," in European Conference on Computer Vision. Springer, 2014, pp. 188–203.
[48] Dong Wang and Huchuan Lu, "Visual tracking via probability continuous outlier model," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3478–3485.
[49] David Held, Sebastian Thrun, and Silvio Savarese, "Learning to track at 100 fps with deep regression networks," in European Conference on Computer Vision. Springer, 2016, pp. 749–765.
[50] Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu, "Visual tracking with fully convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3119–3127.
[51] Jifeng Ning, Jimei Yang, Shaojie Jiang, Lei Zhang, and Ming-Hsuan Yang, "Object tracking via dual linear structured svm and explicit feature map," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4266–4274.
[52] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang, "Hierarchical convolutional features for visual tracking," in