Measuring the Temporal Behavior of Real-World Person Re-Identification
Meng Zheng,
Student Member, IEEE,
Srikrishna Karanam,
Member, IEEE, and Richard J. Radke,
Senior Member, IEEE
Abstract—Designing real-world person re-identification (re-id) systems requires attention to operational aspects not typically considered in academic research. Typically, the probe image or image sequence is matched to a gallery set with a fixed candidate list. On the other hand, in real-world applications of re-id, we would search for a person of interest in a gallery set that is continuously populated by new candidates over time. A key question of interest for the operator of such a system is: how long is a correct match to a probe likely to remain in a rank-k shortlist of candidates? In this paper, we propose to distill this information into what we call a Rank Persistence Curve (RPC), which, unlike a conventional cumulative match characteristic (CMC) curve, helps directly compare the temporal performance of different re-id algorithms. To carefully illustrate the concept, we collected a new multi-shot person re-id dataset called RPIfield. The RPIfield dataset is constructed using a network of 12 cameras with 112 explicitly time-stamped actor paths among about 4000 distractors. We then evaluate the temporal performance of different re-id algorithms using the proposed RPCs on single and pairwise camera videos from RPIfield, and discuss considerations for future research.
1 INTRODUCTION
Research in the area of automatic person re-identification, or re-id, has exploded in the past ten years. The re-id problem is generally stated as: given images of a person of interest as seen in a "probe" camera view, how can we find the same person among a set of candidate people seen in a "gallery" camera view? Re-id research to date typically falls into one or more of the following categories:

• Appearance modeling, in which the goal is to design or learn feature representations invariant to challenges like viewpoint and illumination variation (e.g., [1–5]).
• Metric learning, in which the goal is to learn, in a supervised fashion, a distance metric that is used to search for the person of interest in the gallery set (e.g., [6–10]).
• Multi-shot re-id, in which both the probe and gallery candidates are represented as short image sequences/video clips instead of single frames (e.g., [11–15]).

We refer the reader to recent experimental and algorithmic surveys by Karanam et al. [16] and Zheng et al. [17]. While these issues are all critical for designing successful re-identification algorithms, research in these areas generally oversimplifies the problem that would face a real-world user of a deployed re-id system.

• M. Zheng and R.J. Radke are with the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180 USA (e-mail: [email protected], [email protected]).
• S. Karanam is with Siemens Corporate Technology, Princeton, NJ 08540 USA (e-mail: [email protected]).
• Corresponding author: M. Zheng. This material is based upon work supported by the U.S. Department of Homeland Security under Award Number 2013-ST-061-ED0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.
In particular, the temporal aspect of the re-id problem is ignored in most academic re-id research. That is, in the real world, candidates would be constantly added to the gallery as new subjects are automatically tracked, as opposed to presented to an algorithm all at once. Even if a correct match to the probe appears in a rank-ordered shortlist shortly after they appear in a gallery camera, this isn't helpful to a user if the candidate is immediately pushed off the list after a few minutes by a new wave of incoming candidates. Figure 1 illustrates an example in which the person of interest first appears at T = 2647 s at rank 2, but the rank drops to 6 at T = 8561 s as more candidates are added to the gallery over time. To the user, a natural question is: how long can a correct match be expected to stay in the shortlist under typical circumstances? To this end, we take a broader view of how re-id algorithms perform, judging them on this notion of persistence in time and not just raw batch performance as typically presented in a Cumulative Match Characteristic (CMC) curve. This is particularly appealing in areas where re-id finds the most immediate and practical application, such as crime detection or prevention. The perpetrator of a crime may re-appear after a significantly long time interval (e.g., several days or even months), and it is important to understand how existing re-id algorithms perform at this time scale.

In this paper, we study several temporal aspects of re-id and propose new evaluation methodologies that allow different re-id algorithms to be compared based on a concept we call rank persistence. We discuss strategies for evaluating algorithms in circumstances when the same person of interest can appear multiple times in the gallery, as well as when the performance on multiple persons of interest should be aggregated.
The key concept we propose is called the Rank Persistence Curve (RPC), which quantifies the percentage of candidates that remain at a certain rank for a given duration. RPCs for different algorithms can be directly compared on the same camera gallery to allow a user to make informed choices about expected performance in real-world deployments.

Fig. 1: An example showing the temporal evolution of a gallery set captured in our RPIfield dataset, and the corresponding time-varying rank for a given query achieved by a re-id algorithm. The candidate arrives at t = 2647 s at rank 2, and drops in rank over time as more candidates arrive in the target camera.

To help evaluate the temporal performance of re-id algorithms under the considerations discussed above, we need datasets that come with explicitly time-stamped images. Current re-id benchmarking datasets such as VIPeR [18], iLIDSVID [19], and MARS [20] lack the kind of time stamps for the gallery sets needed to understand the full potential of the RPC concept, rendering them inappropriate for our use here. While it might be possible to repurpose them for temporal re-id research, adding artificial time stamps seems suboptimal. To this end, we propose a new multi-shot multi-camera dataset, named RPIfield, that includes multiple reappearances of 112 "actors" walking along specified paths, among almost 4000 distractor pedestrians. We illustrate RPCs with examples and experiments drawn from our dataset and then compare the temporal performance of several re-id algorithms. We conclude the paper with further discussion of the temporal implications of re-id, and suggestions for future research in this area. The RPIfield dataset and RPC code will be made publicly available upon acceptance of this paper. A preliminary version of this work appeared in Karanam et al. [21].
2 RELATED WORK
Re-id is a widely studied field; we refer the reader to the recent experimental survey by Karanam et al. [16] and the algorithmic survey by Zheng et al. [17] for an extensive discussion. Here we only discuss work directly related to the problems we address in this paper: temporally rich datasets and performance evaluation. Existing re-id algorithms are typically evaluated on academic re-id datasets such as VIPeR [18], Market-1501 [22], CAVIAR [23], PRID [24], 3DPeS [25], WARD [26], iLIDSVID [19] and RAiD [27]. These datasets are specifically hand-curated to only have sets of bounding boxes for the probes and the corresponding matching candidates in the gallery. On the other hand, re-id is typically only a small module in a larger end-to-end surveillance system that tracks people in camera networks [28]. Consequently, it is important to construct datasets that help evaluate this critical module from a real-world operational perspective, and as described in the previous section, the temporal aspect is at the forefront. In the past, there have been some efforts to this end.

Figueira et al. proposed the HDA+ [29] dataset as a testbed for evaluating an automatic re-id system. Fully labeled frames for 30-minute-long videos from 13 disjoint cameras were provided as part of this dataset. Ristani et al. presented a much larger-scale multi-camera dataset, called DukeMTMC4ReID [30], where images corresponding to 1852 unique people were captured from a disjoint 8-camera network. While these, and other relevant datasets described further in Karanam et al. [16], are multi-shot and multi-camera by design, they lack the crucial temporal aspect we describe in this work. This is a key difference between the proposed RPIfield dataset and others. Specifically, RPIfield comes with explicit time-stamp information about when a particular person re-appeared.
Most people captured in our dataset have multiple re-appearances, and each re-appearance is time-stamped.

A very popular evaluation metric in the field of re-id is the cumulative match characteristic (CMC) curve, which is a plot of the re-id rate at rank-k [16]. However, the CMC curve completely disregards the time at which a particular person appeared in the gallery set, instead using the entire gallery set as a batch to compute the re-id performance. Some recent work also uses the mean average precision (mAP) [22] to evaluate algorithms on multi-shot datasets with multiple ground-truth correct matches of the person of interest in the gallery. However, mAP has the same fundamental problem as the CMC; the assumption is that the entire gallery set is available at the same time, which is not how a gallery set would be structured in the real world. To bridge this gap between academic re-id evaluation and usability for real-world systems, we present the concept of rank persistence, where we explicitly ask the question: how long will a person of interest stay in the top-k retrieved list as more and more candidates are populated in the gallery over time?

3 THE RPIFIELD DATASET
As discussed in Section 1, time-stamped datasets are critical for evaluating real-world re-id systems that detect and track people automatically. Here, in order to simulate such real-world requirements, we present a new multi-shot multi-camera re-id dataset, called RPIfield.

RPIfield is constructed from 12 synchronized cameras placed around an outdoor field on the campus of Rensselaer Polytechnic Institute. Figure 2 shows the camera network layout. Six poles were positioned around the field, one each at points A through F. Each red arrow represents a camera with its corresponding viewing direction.

Fig. 2: Overhead view of the RPIfield camera locations and orientations, superimposed on a map of the RPI '86 Field.

The total length of videos from the collected camera feeds is about 30 hours, with each individual video being about 150 minutes in duration. The videos were recorded at full high-definition (1080p) resolution, with each camera's frame rate being 30 fps. We collected all data between 11:30 AM and 2 PM on a weekday to capture the typically heavy traffic associated with this particular time period. This was also important to capture a large number of distractors to simulate a realistic re-id scenario. Cameras that have opposite viewing directions along the same path (e.g., camera pairs 1 and 8, or 5 and 6) have some overlap in their camera views. All other cases involve non-overlapping views. The images captured comprise an eclectic mix of challenges including viewpoint variations, intra- and inter-camera illumination variations, and occlusions.

In our dataset, there are 112 known participants (actors) in total, with each showing up in at least 3 camera views. The participants were asked to walk along specific, pre-defined paths (provided by us) between different points (A through F) around the field shown in Figure 2. In order to ensure re-appearances of the actors in all camera views, each assigned path for a participant contains at least 3 different points.
For example, if a participant is walking along the path A → B → C → F → B → A, s/he will appear in camera views 2, 3, 5, 6, 7, 11, 4, 3 and 2. Each participant was assigned a different walking path. During this process, we also captured images of all other pedestrians (distractors) walking along all the paths, which form the candidate set in each of the 12 galleries. An illustrative example of multiple re-appearances of a certain participant in multiple camera views is shown in Figure 3. Each subfigure in Figure 3 shows the appearance of the participant in a different camera at a different time.

To automatically collect image sequences for each person, we used an off-the-shelf person detector based on the aggregated channel features (ACF) algorithm of Dollár et al. [31, 32] to crop individual images in each frame. After collecting all the bounding boxes of all the detected people (including participants and distractors), we first use the intersection over union (IoU) of the bounding boxes to get a collection of raw tracklets. Specifically, for each cropped bounding box, we calculate the IoU of the current bounding box with all bounding boxes collected in the previous 90 frames, with a threshold of 0.4. We then assign the current bounding box the identity of the previous bounding box achieving the largest IoU value. For bounding boxes with no surviving IoUs, we assign a new label. Since there will be false detections and broken tracklets (e.g., multiple tracklets for the same person) due to illumination variations and occlusions, we manually corrected all these errors to ensure each unique image sequence corresponds to one appearance of a person. The time stamp information for each image is preserved through the corresponding frame number. Sample images of Participant 1 in multiple cameras at different times are shown to the right of each subfigure in Figure 3.

A statistical summary of RPIfield is shown in Table 1.
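The greedy IoU-based association described above can be sketched in a few lines. This is an illustrative reconstruction under the stated parameters (90-frame window, IoU threshold 0.4); the function and variable names are our own, not the exact code used to build RPIfield.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def associate(detections, window=90, thresh=0.4):
    """Greedy IoU association into raw tracklets.

    detections: list of (frame_number, box), sorted by frame_number.
    Returns one tracklet label per detection; a detection inherits the
    label of the best-overlapping box in the previous `window` frames,
    or starts a new tracklet if no overlap survives the threshold.
    """
    labels, next_id, history = [], 0, []  # history: (frame, box, label)
    for frame, box in detections:
        recent = [(iou(box, b), lab)
                  for f, b, lab in history if frame - f <= window]
        best_iou, best_lab = max(recent, default=(0.0, None))
        if best_iou >= thresh:
            lab = best_lab
        else:
            lab, next_id = next_id, next_id + 1
        labels.append(lab)
        history.append((frame, box, lab))
    return labels
```

In practice (as noted above), the raw tracklets produced this way still contain false detections and identity splits, which the authors corrected manually.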
In Table 2, we compare RPIfield with existing benchmark multi-shot and multi-camera re-id datasets, noting that the new dataset has the largest number of cameras. In the right column, we summarize the attributes of each dataset as multi-shot (MS) or multi-camera (MC). We emphasize that the probes in our dataset correspond to known actors, who were provided specific walking and re-appearance instructions to aid in the kind of temporal research we discuss in this work. To our knowledge, other re-id datasets were not planned in this way. While most of the multi-camera or multi-shot re-id datasets have no distractors (except DukeMTMC4ReID [30], Market-1501 [22] and MARS [20]), our dataset preserves all image sequences of all detected distractors. Most importantly, actors could reappear in one or multiple cameras multiple times in our dataset, which allows us to study re-id algorithms in a more general way than usual.

Fig. 3: Illustration of the reappearances of Participant 1 in different camera views: (a) Cam 3, T=668.1s; (b) Cam 2, T=738.1s; (c) Cam 1, T=767.7s; (d) Cam 4, T=1133.1s; (e) Cam 5, T=1330.9s; (f) Cam 6, T=1343.8s; (g) Cam 7, T=1365.7s; (h) Cam 10, T=1441.0s; (i) Cam 8, T=1523.7s. T is the time for each reappearance. Figures (a) to (i) are placed in time order of the appearance of Participant 1. The right column of each subfigure shows sample automatically extracted bounding boxes used as input to the re-id algorithms.

TABLE 2: Comparison of existing multi-shot, multi-camera re-id datasets.
Dataset              | # Images  | # IDs | # Distractors | # Cameras | Attributes
---------------------|-----------|-------|---------------|-----------|-----------
RPIfield             | —         | 112   | ~4,000        | 12        | MS, MC
DukeMTMC4ReID [30]   | 46,261    | 1,852 | 21,551        | 8         | MS, MC
Market-1501 [22]     | 32,668    | 1,501 | 2,793         | 6         | MS, MC
MARS [20]            | 1,067,516 | 1,261 | 3,248         | 6         | MS, MC
SAVIT-Softbio [33]   | 64,472    | 152   | 0             | 8         | MS, MC
PRID [24]            | 24,541    | 178   | 0             | 2         | MS
iLIDSVID [19]        | 42,495    | 300   | 0             | 2         | MS
3DPeS [25]           | 1,011     | 192   | 0             | 8         | MC
WARD [26]            | 4,786     | 70    | 0             | 3         | MC
CUHK02 [34]          | 7,264     | 1,816 | 0             | 10        | MC
CUHK03 [35]          | 13,164    | 1,360 | 0             | 10        | MC
RAiD [27]            | 6,920     | 43    | 0             | 4         | MC
HDA+ [29]            | 2,976     | 74    | 0             | 12        | MC
4 RANK PERSISTENCE
In this section, we present the concept of rank persistence in steps, working up to evaluating situations in which multiple persons of interest each reappear one or multiple times within a video. We introduce the
Rank Persistence Curve (RPC), which encapsulates the behavior of correct matches in a temporally evolving and increasing gallery.

For the purposes of illustration, all plots presented in this section are generated by using the unsupervised GOG algorithm [36] to describe person images, followed by Euclidean distance to rank candidates. We denote this by "GOG+ℓ2" in the plots. These preliminary plots are meant only to motivate the RPC idea, not to propose a new or definitive re-id algorithm. Further implementation details and performance evaluations are provided in Section 5.

We first consider the case of a single probe/query that has exactly one re-appearance over the course of a video sequence. As noted previously, different from existing re-id evaluation schemes where the probe is matched to candidates from a fixed gallery, we consider a temporally evolving gallery set whose dynamically varying size depends on the flow of people over time. In other words, the number of candidates is continuously increasing as more people appear in the video sequence. In this scenario, suppose the person of interest first reappears at time T at rank k computed by a certain re-id algorithm. Let N_T be the size of the gallery at this time (i.e., N_T is the total number of people seen in the gallery camera from time t = 0 to t = T). Given these considerations, the rank of the person of interest can only increase as the gallery size increases after time t = T. This notion of the instantaneous rank of the person of interest as a function of time in an ever-increasing gallery can be easily captured, as illustrated in Figure 4.

In each example of Figure 4, the horizontal axis is real time (seconds from the beginning of the video sequence) and the vertical axis is the instantaneous rank of the person of interest. Each temporal rank curve does not start from T = 0 s, since the person of interest generally re-appears midway through the video.
In this example, each plot ends at T = 9840 s, which is the duration of the entire video sequence. For example, in Figure 4(c), the person of interest first re-appears at rank k = 42, then drops constantly from T = 1939 s to the end of the video. This rate of drop in rank basically reflects the robustness of the re-id algorithm to the temporal aspect of matching a probe candidate to an ever-expanding gallery.

Figure 4 compares these temporal rank curves for three different probes with respect to the same gallery, where each of the probes re-appears only once in the entire video. For the first probe, the rank persists at a relatively low value from the time of its first re-appearance, mostly due to strong appearance similarity between the re-appearance image and the original probe image. In the second case, the re-appearance image is slightly different from the probe image, which results in a steeper drop in the instantaneous rank. Finally, in the third case, the difference between the re-appearance and the probe image is quite obvious, resulting in a more dramatic drop in the instantaneous rank.

One aspect that has a direct impact on these temporal rank curves is the video flow density, defined as the instantaneous number of people seen by the gallery camera per unit time. Specifically, if N people appear in the time range t = T to t = T + Δt in the video sequence, we calculate the video flow density V for this range as:

V(T : T + Δt) = N / Δt    (1)

To understand how this might impact rank persistence, consider two non-overlapping time blocks with the same duration Δt. In the first block, the gallery camera sees x people walking by. In the second block, the gallery camera sees y people walking by.
If x ≫ y, we add many more people to the gallery in the first time block, likely resulting in a steeper drop in the probe's rank compared to the second time block.

We considered two ways to visualize and understand the influence of video flow density on our temporal rank curves, shown in Figures 5(b)-(c). Figure 5(b) shows the temporal rank curve with a color bar at the bottom indicating the instantaneous video flow density (the unit time Δt in (1) is chosen in seconds). The purple color in the color bar indicates a large video flow density, whereas blue indicates a lower video flow density. In line with what we discussed above, periods of high video flow density cause steep drops in the rank of the probe.

Figure 5(c) shows an alternate way of visualizing this aspect; the rank of the probe is plotted against the number of gallery candidates. Here, the person of interest first re-appears at rank 24 when the number of candidates in the gallery is 155. Subsequently, the rank curve drops as the number of candidates in the gallery increases. Though video flow density has a direct influence on the per-probe temporal rank curves shown in Figure 4, its definition is based on absolute real time. Since the arrival time of each probe belongs to a different time block, each per-probe plot goes through different time periods with various video flow densities. However, the representations in Figures 5(b) and (c) would pose significant difficulty when we integrate these curves across multiple probes in Section 4.3. Thus, going forward we only consider real time as the horizontal axis for temporal rank curves.

Generally, a person of interest might re-appear multiple times during the entire course of a video (i.e., separate reappearances spaced apart in time, not different samples from a tracklet as in multi-shot re-id).
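Equation (1) can be evaluated directly from the first-appearance times of the gallery candidates. The following is a minimal sketch of our own (the function name and argument layout are assumptions, not code from the paper):

```python
def flow_density(arrival_times, T, dt):
    """Video flow density V(T : T + dt) from Eq. (1): the number N of
    people whose first appearance falls in [T, T + dt), divided by dt.

    arrival_times: iterable of first-appearance times (seconds) of all
    people seen by the gallery camera.
    """
    n = sum(1 for t in arrival_times if T <= t < T + dt)
    return n / dt
```

Sliding this over consecutive windows of width Δt yields the density profile that the color bar in Figure 5(b) visualizes.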
To plot a temporal rank curve as in Section 4.1, we modify the vertical axis to show the lowest instantaneous rank of any re-appearance of a probe.

As in the first case, the instantaneous rank of the probe can only increase after the first re-appearance as more candidates are added to the gallery. However, the rank of the second re-appearance might achieve a lower value than the first re-appearance depending on its similarity to the probe image. After the second re-appearance, the rank curve will again be non-decreasing until we hit the third re-appearance, and so on. Thus, the temporal rank curve is piecewise monotonically non-decreasing.

Figure 6 shows temporal rank curves for two different probes, each of which re-appears multiple times in the same video. For the first probe in Figure 6, the curve starts from a large rank (154) at the time of her first re-appearance, since the front and back views differ substantially from each other. At the time of her second re-appearance (T = 6994 s), the rank decreases to 1 since the second re-appearance is very similar to the probe image. This low rank dominates the third re-appearance of the probe. For the second example, the rank curve starts at rank 30 and keeps increasing after the first reappearance. The second reappearance of the probe has no obvious influence on the overall rank curve, despite its higher similarity to the probe image.

Finally, we arrive at the most general situation, in which we evaluate the performance of a given re-id algorithm across multiple probes, each of which may reappear one or multiple times over the entire course of the video. In this case, we define the
Rank Persistence Curve based on the per-probe temporal rank curves introduced in Sections 4.1 and 4.2. The RPC as presented here can be used to evaluate the performance of algorithms on entire datasets that have data structured and time-stamped as described in this work. In the following, we also present a theoretical and empirical comparison with the traditionally used CMC curve to highlight the differences and advantages of the proposed RPC.

Fig. 4: A comparison of temporal rank curves for 3 different probes that each reappear only once. For each case, the top image is a sample image selected from the probe image sequence, and the bottom image is a sample selected from the known reappearance of the probe in the gallery.

Fig. 5: The rank of a person of interest changes in an ever-expanding gallery. (a) illustrates our usual method for creating temporal rank curves. (b) is an alternate visualization in which a colorbar indicates the instantaneous video flow density. (c) is an alternate visualization in which the horizontal axis indicates the number of candidates, not the number of seconds.

Fig. 6: A comparison of temporal rank curves for 2 different probes that reappear multiple times. For each case, the top image is a sample image selected from the probe image sequence, and the bottom images are samples selected from the separate reappearances of the probe in the gallery. Each red '×' indicates one reappearance of the probe.

For each probe that has at least one re-appearance in the video sequence, we plot the temporal rank curve as described in Sections 4.1 and 4.2. We then define the Rank Persistence Curve (RPC) as follows: the horizontal axis of the RPC is the duration d in real units (e.g., seconds). For a fixed rank r and each duration d, we plot the percentage of probes that appear continuously at or below rank r for at least d units.
Based on this definition, the RPC for a fixed rank r is monotonically non-increasing, and RPCs at higher ranks dominate those at lower ranks.

Figure 7(a) shows the RPC across a dataset of 46 probes in the view of Camera 1 of RPIfield. In the plot, we show RPCs for five different ranks r. In contrast, the traditional CMC for the same set of 46 probes and using the same algorithm is shown in Figure 7(b). We can see that the two types of curves are qualitatively different.

Fig. 7: Comparison of (a) the proposed Rank Persistence Curve (RPC) and (b) the traditional Cumulative Match Characteristic (CMC) curve for Camera 1 of RPIfield, containing 46 probes.

Since we want to capture the temporal aspect of rank in the RPC, the horizontal axis is no longer rank but duration, and we need a third "axis" (in this case color) to indicate rank. To understand the RPC, let us consider the RPC for r = 1, shown in blue. This captures our objective of visualizing how likely, and for how long, a candidate is to stay at rank 1 across a long video sequence. The RPC starts at 57% and stays there for 241 seconds, meaning 57% of the probes had a re-appearance at rank 1 that persisted for 241 seconds. The RPC is a monotonically non-increasing curve with respect to the duration, which is intuitive since the lowest instantaneous rank of any re-appearance of a probe can either stay the same or increase as we add more candidates over time to the gallery. Of course, rank 1 is a stringent requirement, and candidates are more likely to persist for longer durations at higher ranks. This is evident from the RPCs for r > 1 in the plot, where we see much higher percentages of probes that stay within rank r for a given duration d.

As noted above, the RPC is both qualitatively and measurably different from a CMC curve. Unlike the CMC, the RPC provides valuable information on how long correct matches to the probes persist within the rank-k shortlist in a time-varying gallery.
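The definition above translates directly into code. The sketch below is our own illustration of the computation, assuming each per-probe temporal rank curve is given as uniformly sampled (time, rank) pairs covering the interval from the probe's first reappearance to the end of the video:

```python
import numpy as np

def rank_persistence_curve(rank_curves, r, durations):
    """RPC for a fixed rank r.

    rank_curves: list of per-probe curves, each a sequence of (time, rank)
    samples. For each d in `durations`, returns the fraction of probes
    whose rank stays <= r over some contiguous stretch of at least d
    time units.
    """
    persist = []
    for curve in rank_curves:
        t = np.asarray([p[0] for p in curve], dtype=float)
        ok = np.asarray([p[1] for p in curve], dtype=float) <= r
        # Longest contiguous stretch (in time units) with rank <= r.
        best, start = 0.0, None
        for i, flag in enumerate(ok):
            if flag and start is None:
                start = t[i]
            elif not flag and start is not None:
                best = max(best, t[i - 1] - start)
                start = None
        if start is not None:
            best = max(best, t[-1] - start)
        persist.append(best)
    persist = np.asarray(persist)
    return np.array([np.mean(persist >= d) for d in durations])
```

Note that because each per-probe curve is piecewise non-decreasing, the qualifying stretch in practice begins at a reappearance of the probe; the code handles the general case anyway.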
These considerations are important both in terms of the length of the shortlist (real-world end users, typically not computer vision experts, would not want to scroll through pages and pages of candidates to find the person of interest) and the duration of persistence (in real-world scenarios, end users may only get around to checking the output of a re-id surveillance system a few times an hour). With this theory we can now compare different re-id algorithms and analyze performance more deeply.
5 EXPERIMENTS AND RESULTS ON RPIFIELD
In this section, based on our discussion in Section 4, we quantify the temporal performance of several re-id algorithms using RPCs. We first present single-camera and pairwise-camera results on the RPIfield dataset using a specific re-id algorithm. Subsequently, we compare the performance of various re-id algorithms to demonstrate the suitability of RPCs to evaluate and contrast their temporal performance. Again, our goal in this paper is not to promote a particular re-id algorithm, but a method for algorithm performance evaluation.

We first briefly describe the basic re-id algorithm for our initial experiments. We use the recently proposed Gaussian of Gaussian (GOG) descriptor [36] to extract features from person images. Given an image sequence containing n images I_1, I_2, ..., I_n for each person, we compute a single feature vector describing the sequence as the average of the n feature vectors f_1, f_2, ..., f_n computed with GOG. We present re-id results using the Euclidean (ℓ2) distance, which we refer to as the "baseline" algorithm, as well as the XQDA [7] distance metric. To train the XQDA distance metric, we randomly select 20 identities from the 112 in RPIfield and use image sequences from all 12 camera views. To form positive training pairs, we take all possible combinations of the tracks of the same person from all camera views; for instance, if a participant appears 3 times in Camera 1, 2 times in Camera 2 and 3 times in Camera 6, we form positive pairs from all pairwise combinations of these tracks. At each evaluation time t = T, we consider all candidates that ever appeared between t = 0 and t = T as the gallery with which to compare the probe. This assumes at least one re-appearance of the probe in this time duration, which is known from ground truth. In the results that follow, we present the performance of GOG+XQDA (GOG as the feature and XQDA as the distance metric) and compare it with the baseline GOG+ℓ2 (GOG as the feature and ℓ2 as the distance metric).
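The baseline evaluation loop described above can be summarized in a short sketch: average the per-frame descriptors into one sequence-level feature, then, at each evaluation time T, rank only the candidates seen up to T by ℓ2 distance. This is our own illustration with stand-in feature vectors; a real run would use GOG descriptors, and GOG+XQDA would instead compute distances in the learned XQDA subspace.

```python
import numpy as np

def sequence_feature(frame_feats):
    """Sequence-level descriptor: the average of the per-frame feature
    vectors f_1, ..., f_n (stand-ins here for GOG descriptors)."""
    return np.mean(np.asarray(frame_feats, dtype=float), axis=0)

def instantaneous_rank(probe_feat, gallery, T, match_id):
    """Rank of the correct match at time T under the baseline l2 metric.

    gallery: list of (arrival_time, person_id, feature); only candidates
    whose arrival time is <= T are compared against the probe. Returns
    the best (lowest) rank of any gallery entry with id `match_id`, or
    None if the match has not yet appeared.
    """
    visible = [(pid, f) for t, pid, f in gallery if t <= T]
    dists = sorted((np.linalg.norm(probe_feat - f), pid)
                   for pid, f in visible)
    ranks = [i + 1 for i, (_, pid) in enumerate(dists) if pid == match_id]
    return min(ranks) if ranks else None
```

Sweeping T over the video and recording `instantaneous_rank` produces exactly the per-probe temporal rank curves of Section 4.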
We begin by performing experiments on single-camera videos from RPIfield, where we use the two algorithms discussed above to analyze temporal characteristics of both the dataset and the algorithms.
In this section, we present single-probe evaluation results on the two best-performing camera views (1 and 4). Figure 8(a)-(b) shows the temporal rank curves for two participants who each re-appear once in the view of Camera 1. For each rank curve in Figure 8, we compare the instantaneous rank of GOG+XQDA with the baseline GOG+ℓ2. Figure 8(c)-(d) gives similar plots for participants who re-appear multiple times in Camera 4. Some reappearances of the probes, for instance the second reappearance of Participant 98 (Figure 8(d)), cause abrupt decreases in the instantaneous rank, due to their strong similarity to the probe image.

The varying patterns of the temporal rank curves in Figure 8 shed light on several characteristics of the probe and gallery images. For instance, the low drop rate of GOG+XQDA indicates relative temporal robustness of the re-id algorithm, i.e., robustness of the matching result to a temporally evolving and expanding gallery, while the ℓ2 metric performs relatively poorly.

More generally, multiple probes need to be considered for re-id performance evaluation. As discussed in Section 4.3, we can incorporate per-probe temporal rank curves into Rank Persistence Curves for single camera re-id, as shown in Figure 9. Each RPC represents a particular algorithm's performance on the dataset comprised of the probes considered as part of evaluation.

Figures 9(a) and (b) show temporal evaluation results for video from Cameras 1 and 4, while Figures 9(c) and (d) show the corresponding CMC curves. For each camera, we aggregate performance over all participants who reappeared at least once during the course of the video. These curves help illustrate important insights from the RPC that the CMC curve ignores. For instance, consider the rank-10 RPC of GOG+XQDA (dotted blue) in Figure 9(a). The RPC tells us that the performance drops from 95% to 3% across the entire duration of the video.
This means only 3% of the probes in our dataset "persist" at rank 10 when we consider the duration of the entire video. On the other hand, the CMC curve for GOG+XQDA in Figure 9(c) only tells us that the rank-10 performance was 90%, meaning 90% of the probes were re-identified within rank 10 in a gallery of candidates all considered at the same time; all temporal information is lost. The RPCs give us more detailed temporal performance evaluation while being easy to present and interpret.

Comparing RPCs of the same color and linestyle in Figures 9(a) and (b), we can see that both the GOG+XQDA and baseline algorithms generally perform better in Camera 1 than in Camera 4. As illustrated in Figure 10, this is likely caused by a higher fraction of difficult examples (e.g., participants wearing backpacks), along with a higher number of distractors and more shadow/illumination issues.
In Section 5.1, we evaluated single-camera videos from RPIfield using RPCs. In more realistic situations, we need to consider the case in which probe and gallery candidates come from different camera views. In cross-view re-id, illumination conditions and viewpoint variations significantly influence the performance of re-id algorithms, which will also be reflected in our temporal evaluation. In this section, we evaluate and compare the performance of re-id algorithms using RPCs for different camera pairs.
For a person of interest who appears in two different camera views at different times, we consider the image sequence of the person of interest in one of the camera views as the probe and plot the temporal rank curve in the other camera view's time-varying gallery; an example is shown in Figure 11. Similar to the results in Section 5.1.1, these per-probe rank curves show different patterns. Focusing on the GOG+XQDA (blue) curve in Figures 11(a) and (c), the rank stays at a fairly low value due to the strong similarity between the feature vectors of the probe and its reappearances in the projected feature space. In Figures 11(b) and (d), the gallery reappearance(s)

(a) Camera 1, Person 5. (b) Camera 1, Person 69. (c) Camera 4, Person 97. (d) Camera 4, Person 98.
Fig. 8: Comparison of temporal rank curves of different probes from Cameras 1 and 4. GOG+XQDA (blue curve) uses GOG for feature extraction and XQDA for metric learning; GOG+ℓ2 (red curve) uses GOG for feature extraction and the ℓ2 distance for ranking (baseline). Panels (a)-(b) show temporal rank curves for participants who reappear once in Camera 1. Panels (c)-(d) show rank curves for participants who reappear multiple times in Camera 4.

(a) RPC for Camera 1. (b) RPC for Camera 4. (c) CMC curve for Camera 1. (d) CMC curve for Camera 4.

Fig. 9: (a)-(b) RPCs and (c)-(d) CMC curves for Cameras 1 and 4, with fixed ranks r = 1 and r = 10.

Fig. 10: Four example image pairs from Cameras 1 and 4. For each pair, the left is the probe image and the right is the reappearance image.

(a) Person 1. (b) Person 110. (c) Person 10. (d) Person 55.

Fig. 11: Comparison of temporal rank curves of different probes for camera pair (1, 2). Probes are taken from Camera 1; candidates are from Camera 2. Panels (a)-(b) show rank curves for participants who reappear once in Camera 2; panels (c)-(d) show rank curves for participants who reappear multiple times in Camera 2.

(a) Rank Persistence Curves. (b) Cumulative Match Characteristic curves.

Fig. 12: RPCs and CMC curves of multiple probes for camera pair (1, 2). For RPCs, GOG+XQDA is compared to the baseline algorithm with fixed ranks r = 1 and r = 10.

We now move to the general case of multiple probes for pairwise re-id evaluation. Using the per-probe rank curves from above, we plot the RPCs for this experiment, shown in Figure 12. For the RPC in Figure 12(a), we consider all participants (63 in total) who appear in Cameras 1 and 2. The corresponding CMC curves for this experiment are shown in Figure 12(b). Compared to the RPCs in Figure 9 for single camera views, we can see a larger performance difference between GOG+XQDA (blue lines) and the baseline (red lines) algorithm in Figure 12 for pairwise re-id.
This indicates that XQDA learning improves cross-view re-id more significantly than single-view re-id. We notice a similar trend between the CMC curves in Figure 12(b) and Figures 9(c) and (d). From the rank-1 RPCs in Figure 12(a), however, we observe that the percentage difference between GOG+XQDA and the baseline curve drops from 35% to 0% when we consider the entire duration of the video, while the CMC curves only tell us that the rank-1 performance difference between GOG+XQDA and the baseline is 30% when candidates in the gallery are all considered at the same time. From the RPCs, we know that even though GOG+XQDA performs much better than the baseline algorithm within the short duration after a probe's reappearance, this performance advantage continuously decreases as more candidates are added to the gallery. This information would be important to users of real-world re-id systems when comparing various re-id algorithms.

When evaluating the temporal performance of algorithms across multiple probes for cross-view re-identification, the same algorithm may produce different performance if the probe and gallery cameras are interchanged. For different camera pairs, performance will also vary due to cross-view differences. In Figure 13, we present RPCs for two camera pairs, considering all participants who ever appeared in both cameras of each pair. For Figure 13(a), the probes are taken from Camera 1, whereas the candidates are taken from Camera 5, and vice versa for Figure 13(b).
Figures 13(c) and (d) are plotted in a similar way for Cameras 1 and 3.

First, let us focus on the RPCs for the same camera pair, using camera pair (1, 5) for illustration purposes. While Figures 13(a) and (b) involve exactly the same set of participants, the same re-id algorithm produces different performance in the two different time-varying galleries. From Table 1, we can see a larger number of distractors in the gallery set of Camera 5, which likely degrades the performance shown in Figure 13(a) compared to (b). Furthermore, for the camera pair (1, 3), the graphs with the same color and linestyle in Figure 13(d) are also higher than in (c), indicating higher performance when candidates come from the gallery of Camera 1. This indicates that a person of interest will more likely persist in a rank-1 or rank-10 shortlist for a longer time in Camera 1 than in Camera 3 or 5. The RPC can thus also be used to draw conclusions regarding which camera views in a camera network are more likely to "hold on" to the person of interest, an aspect of crucial importance in sensitive surveillance applications.

More significantly, we can see an obvious performance difference between Figures 13(b) and (d), even though the same gallery set is considered. One cause for the difference is that different sets of probe images are considered in Figures 13(b) and (d). Another reason is the viewpoint variation across the camera views of these camera pairs. As we discussed at the beginning of Section 5, all positive pairs from different camera views are used simultaneously to train the XQDA metric. Due to different illumination conditions or viewpoint changes, some camera pairs could be easier for the trained metric, reflected in better cross-view re-id performance. Due to the unique location of Camera 5, there are substantially more shadows and illumination variation patterns compared to Cameras 1 and 3, resulting in the performance degradation shown in Figure 13(b).
All these factors result in worse temporal performance of the re-id algorithm, reflected in the lower value of the RPCs for a fixed duration, shown in Figure 13(b).

(a) Probe Camera 1, gallery Camera 5. (b) Probe Camera 5, gallery Camera 1. (c) Probe Camera 1, gallery Camera 3. (d) Probe Camera 3, gallery Camera 1.

Fig. 13: RPCs of multiple probes for camera pairs (1, 5) and (1, 3). GOG+XQDA is compared to the baseline with fixed ranks r = 1 and r = 10.

With the proposed RPCs, we can now easily compare the temporal performance of competing re-id algorithms. In this section, we compare the RPCs of different re-id algorithms for cross-view re-id on a specific camera pair: camera pair (10, 12) in our dataset. Additional results on other pairs are provided as part of the supplementary material. There are 50 participants who appear in both camera views, of which we randomly pick 25 for metric learning.

For feature extraction, we consider GOG [36], LOMO [7], WHOS [13], HistLBP [37], IDE-VGGNet [38], IDE-ResNet [39], and IDE-CaffeNet [40]. In IDE-CaffeNet, IDE-ResNet, and IDE-VGGNet, we use the idea presented by Zheng et al. [17], in which every person is treated as a separate class and a convolutional neural network is trained with a classification objective. The AlexNet [40], ResNet [39], and VGGNet [38] architectures are employed in IDE-CaffeNet, IDE-ResNet, and IDE-VGGNet, respectively. In each case, we start with a model pre-trained on the ImageNet dataset and fine-tune it using training data from 14 existing benchmark datasets, as described more fully in Karanam et al. [16]. For metric learning, we consider KISSME [6], XQDA [7], NFST (linear kernel) [9], and kLFDA (linear kernel) [8, 37], algorithms that were shown to perform well in the study of Karanam et al. [16]. The RPCs at rank r = 1 for various combinations of these re-id algorithms are shown in Figure 14(a)-(c).
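At test time, all of the feature/metric combinations above reduce to ranking gallery candidates by a (possibly learned) distance between feature vectors. The following is a minimal sketch under our own assumptions, not any particular method's implementation: a learned positive semi-definite matrix M stands in for the output of a KISSME- or XQDA-style learner, and M = None recovers the Euclidean (ℓ2) baseline.

```python
import numpy as np

def rank_gallery(probe_feat, gallery_feats, M=None):
    """Rank gallery candidates by the Mahalanobis distance
    d(x, y) = (x - y)^T M (x - y), where M is a learned positive
    semi-definite matrix (e.g., from a KISSME- or XQDA-style learner).
    M = None falls back to the identity, i.e., the squared Euclidean
    (l2) baseline."""
    x = np.asarray(probe_feat, dtype=float)
    G = np.asarray(gallery_feats, dtype=float)
    if M is None:
        M = np.eye(G.shape[1])
    diffs = G - x
    # dists[i] = diffs[i]^T M diffs[i], computed for all candidates at once
    dists = np.einsum('ij,jk,ik->i', diffs, M, diffs)
    return np.argsort(dists)  # gallery indices, best match first
```

Because only M changes between metric learning methods, the same ranking (and hence the same RPC machinery) applies uniformly across all of the combinations compared in Figure 14.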
The corresponding CMC curves are plotted in Figure 14(d)-(f).

In Figure 14(a), for a fixed metric learning method, kLFDA, we can easily compare the temporal performance of various feature extraction methods. We note that IDE-ResNet achieves the best performance, with IDE-CaffeNet, GOG, and IDE-VGGNet not far behind, and all of these methods performing better than HistLBP. In Figure 14(b), for a given feature extraction method, GOG, we compare the temporal performance of competing metric learning methods. While KISSME produces better results for short durations, kLFDA and NFST (linear kernel) generate better results at longer durations, with all of them performing much better than the baseline Euclidean distance (ℓ2) method. Figure 14(c) gives a general comparison of re-id algorithms with different combinations of feature extraction and metric learning methods; for illustration purposes, we only select the top few. We can see that IDE-ResNet+KISSME, IDE-VGGNet+KISSME, and GOG+kLFDA achieve comparable results at mid-to-long durations, with IDE-ResNet+KISSME performing better at shorter durations. As we illustrated in previous sections, the CMC curves in Figure 14(d)-(f) can directly compare multiple algorithms as well, but are missing any operationally useful temporal comparison.

Fig. 14: Comparing rank-1 RPCs and CMC curves of different re-id algorithm combinations applied to camera pair (10, 12). Camera 10 is the probe camera and Camera 12 is the gallery camera. (a) Rank-1 RPCs for kLFDA metric learning with different feature extraction methods. (b) Rank-1 RPCs for GOG feature extraction with different metric learning methods. (c) Rank-1 RPCs for well-performing feature/metric combinations. (d)-(f) CMC curves corresponding to the experiments in (a)-(c).

DISCUSSION AND FUTURE WORK
Now that the underlying computer vision and machine learning technologies for re-id have matured, we propose that re-id researchers should begin to take a broader view of how re-id algorithms should integrate into functional real-world systems. For example, as noted in Li et al. [41] and Camps et al. [28], the problem of comparing one candidate rectangle of pixels to another is only a small part of a fully automated re-id system. Instead, we must take into account that the candidate rectangles are likely generated by an automatic (and possibly inaccurate) human detection and tracking subsystem, and that the overall system needs to operate in real time. Instead of benchmarking datasets in which the gallery images are acquired only a few moments after the probe images, we should consider applications such as crime prevention, in which a perpetrator may return to the scene of the crime days after their initial detection. In such cases, the gallery of candidates is ever-expanding, and for certain periods of time may not contain the person of interest at all.

In this paper, we take a step towards this objective. To simulate a fully automated re-id system with human detection and tracking functionality, we collected a new multi-camera dataset with all time stamps preserved and all person images generated by the ACF human detector. The dataset contains multiple planned reappearances of participants and many distractors in order to simulate a real-world re-identification application such as crime prevention. Temporal analysis is then applied to this dataset to help us evaluate the performance of multiple re-id algorithms functioning in a simulated re-id system.
While our experiments show interesting results, they also pose several challenges we wish to tackle in the future:

• As we stated, in real-world re-id applications such as crime prevention, a perpetrator may return to the crime scene after a long period of time, e.g., a day or more, during which the gallery would be filled with a huge number of distractors. This poses a significant challenge to re-id algorithms in terms of correctly matching the same person against a huge list of candidates. In our current dataset, however, the time spread of participants' reappearances is relatively short, typically within 2 hours, which is only an approximate simulation of such real situations. Extremely long duration data collections with known time-stamped reappearances would be very useful.

• A criminal might naturally change his or her appearance when returning to the same scene, for example wearing a hat or mask, which will also aggravate the difficulty of re-identification. In our dataset, some of the participants took off/carried their backpacks when entering different camera views, but more types of appearance change would be preferred in future studies.

• Based on our proposed Rank Persistence Curve, the performance of competing re-id algorithms for a single camera pair can be visualized through an easy-to-read graph. More generally, we need to consider how to evaluate re-id algorithm performance for an entire multi-camera network. This raises interesting but challenging questions, such as how to consider multiple galleries from different camera views at the same time, and how to construct a consistent evaluation scheme that integrates all these different galleries.

• Temporal considerations, in terms of both the ever-increasing gallery size and multiple probe appearances, lead to natural challenges from a feature and metric learning point of view.
These can potentially be addressed by using temporally incremental approaches to learning re-id models [42, 43], where the model can be adapted over time using either automated or human-in-the-loop feedback.

• In our experience with integrating academic re-id algorithms into operational surveillance command centers, we found the issue of user interfaces to be extremely important. The similarity between a rank-k shortlist and a police lineup was an effective analogy. However, the potentially very long time scales for crime prevention applications require the re-evaluation of what constitutes an operationally meaningful shortlist. Should candidates "age out" of the ranked list using some sort of forgetting factor or re-ranking scheme [44, 45]? Should extremely promising candidates from long ago be kept alongside less-certain but more timely recent candidates? Should the time-varying gallery contain all candidates ever seen, or only those from the last N minutes? These considerations require close consultation with the potential users of the system to understand and set expectations and the corresponding interface choices.

REFERENCES

[1] L. Bazzani, M. Cristani, and V. Murino, "Symmetry-driven accumulation of local features for human characterization and re-identification,"
Computer Vision and Image Understanding, vol. 117, no. 2, 2013.
[2] N. McLaughlin, J. M. D. Rincon, and P. Miller, "Recurrent convolutional network for video-based person re-identification," in CVPR, 2016.
[3] R. Satta, "Appearance descriptors for person re-identification: A comprehensive review," arXiv e-prints, 2013.
[4] D. Li, X. Chen, Z. Zhang, and K. Huang, "Learning deep context-aware features over body and latent parts for person re-identification," in CVPR, 2017.
[5] H. Zhao, M. Tian, S. Sun, J. Shao, J. Yan, S. Yi, X. Wang, and X. Tang, "Spindle Net: Person re-identification with human body region guided feature decomposition and fusion," in CVPR, 2017.
[6] M. Kostinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, "Large scale metric learning from equivalence constraints," in CVPR, 2012.
[7] S. Liao, Y. Hu, X. Zhu, and S. Z. Li, "Person re-identification by local maximal occurrence representation and metric learning," in CVPR, 2015.
[8] S. Pedagadi, J. Orwell, S. Velastin, and B. Boghossian, "Local Fisher discriminant analysis for pedestrian re-identification," in CVPR, 2013.
[9] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in CVPR, 2016.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: A deep quadruplet network for person re-identification," in CVPR, 2017.
[11] S. Karanam, Y. Li, and R. J. Radke, "Person re-identification with block sparse recovery," Image and Vision Computing, 2017.
[12] Y. Li, Z. Wu, S. Karanam, and R. J. Radke, "Multi-shot human re-identification using adaptive Fisher discriminant analysis," in BMVC, 2015.
[13] G. Lisanti, I. Masi, A. D. Bagdanov, and A. D. Bimbo, "Person re-identification by iterative re-weighted sparse ranking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, 2015.
[14] J. You, A. Wu, X. Li, and W.-S. Zheng, "Top-push video-based person re-identification," in CVPR, 2016.
[15] Z. Zhou, Y. Huang, W. Wang, L. Wang, and T. Tan, "See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification," in CVPR, 2017.
[16] S. Karanam, M. Gou, Z. Wu, A. Rates-Borras, O. Camps, and R. J. Radke, "A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[17] L. Zheng, Y. Yang, and A. G. Hauptmann, "Person re-identification: Past, present and future," arXiv:1610.02984, 2016.
[18] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in ECCV, 2008.
[19] T. Wang, S. Gong, X. Zhu, and S. Wang, "Person re-identification by video ranking," in ECCV, 2014.
[20] L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian, "MARS: A video benchmark for large-scale person re-identification," in ECCV, 2016.
[21] S. Karanam, E. Lam, and R. J. Radke, "Rank persistence: Assessing the temporal performance of real-world person re-identification," in ICDSC, 2017.
[22] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, "Scalable person re-identification: A benchmark," in ICCV, 2015.
[23] D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino, "Custom pictorial structures for re-identification," in BMVC, 2011.
[24] M. Hirzer, C. Beleznai, P. M. Roth, and H. Bischof, "Person re-identification by descriptive and discriminative classification," in SCIA, vol. 6688, 2011.
[25] D. Baltieri, R. Vezzani, and R. Cucchiara, "3DPeS: 3D people dataset for surveillance and forensics," in Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding, 2011.
[26] N. Martinel, C. Micheloni, and C. Piciarelli, "Distributed signature fusion for person re-identification," in ICDSC, 2012.
[27] A. Das, A. Chakraborty, and A. K. Roy-Chowdhury, "Consistent re-identification in a camera network," in ECCV, 2014.
[28] O. Camps, M. Gou, T. Hebble, S. Karanam, O. Lehmann, Y. Li, R. J. Radke, Z. Wu, and F. Xiong, "From the lab to the real world: Re-identification in an airport camera network," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 3, 2017.
[29] D. Figueira, M. Taiana, A. Nambiar, J. Nascimento, and A. Bernardino, "The HDA+ data set for research on fully automated re-identification systems," in ECCV, 2014.
[30] M. Gou, S. Karanam, W. Liu, O. Camps, and R. J. Radke, "DukeMTMC4ReID: A large-scale multi-camera person re-identification dataset," in CVPR Workshops, 2017.
[31] P. Dollar, S. Belongie, and P. Perona, "The fastest pedestrian detector in the west," in BMVC, 2010.
[32] P. Dollar, R. Appel, S. Belongie, and P. Perona, "Fast feature pyramids for object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, 2014.
[33] A. Bialkowski, S. Denman, S. Sridharan, C. Fookes, and P. Lucey, "A database for person re-identification in multi-camera surveillance networks," in DICTA, 2012.
[34] W. Li and X. Wang, "Locally aligned feature transforms across views," in CVPR, 2013.
[35] W. Li, R. Zhao, T. Xiao, and X. Wang, "DeepReID: Deep filter pairing neural network for person re-identification," in CVPR, 2014.
[36] T. Matsukawa, T. Okabe, E. Suzuki, and Y. Sato, "Hierarchical Gaussian descriptor for person re-identification," in CVPR, 2016.
[37] F. Xiong, M. Gou, O. Camps, and M. Sznaier, "Person re-identification using kernel-based metric learning methods," in ECCV, 2014.
[38] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[39] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, 2016.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012.
[41] Y. Li, Z. Wu, S. Karanam, and R. J. Radke, "Real-world re-identification in an airport camera network," in ICDSC, 2014.
[42] N. Martinel, A. Das, C. Micheloni, and A. K. Roy-Chowdhury, "Temporal model adaptation for person re-identification," in ECCV, 2016.
[43] H. Wang, S. Gong, X. Zhu, and T. Xiang, "Human-in-the-loop person re-identification," in ECCV, 2016.
[44] L. Zheng, S. Wang, L. Tian, F. He, Z. Liu, and Q. Tian, "Query-adaptive late fusion for image search and person re-identification," in CVPR, 2015.
[45] J. Garcia, N. Martinel, C. Micheloni, and A. Gardel, "Person re-identification ranking optimisation by discriminant context information analysis," in ICCV, 2015.
Meng Zheng
Meng Zheng is a Ph.D. student in Electrical, Computer & Systems Engineering at Rensselaer Polytechnic Institute. She received M.S. and B.Eng. degrees from the School of Information and Electronics at Beijing Institute of Technology, China. Her research interests include computer vision and machine learning, with a focus on person re-identification.
Srikrishna Karanam
Srikrishna Karanam is a Research Scientist in the Vision Technologies and Solutions group at Siemens Corporate Technology, Princeton, NJ. He has a Ph.D. degree in Computer & Systems Engineering from Rensselaer Polytechnic Institute. His research interests include computer vision and machine learning, with a focus on all aspects of image indexing, search, and retrieval for object recognition applications.
Richard J. Radke
Richard J. Radke joined the Electrical, Computer, and Systems Engineering department at Rensselaer Polytechnic Institute in 2001, where he is now a Full Professor. He has B.A. and M.A. degrees in computational and applied mathematics from Rice University, and M.A. and Ph.D. degrees in electrical engineering from Princeton University. His current research interests involve computer vision problems related to human-scale, occupant-aware environments, such as person tracking and re-identification with cameras and range sensors. Dr. Radke is affiliated with the NSF Engineering Research Center for Lighting Enabled Systems and Applications (LESA), the DHS Center of Excellence on Explosives Detection, Mitigation and Response (ALERT), and Rensselaer's Experimental Media and Performing Arts Center (EMPAC). He received an NSF CAREER award in March 2003 and was a member of the 2007 DARPA Computer Science Study Group. Dr. Radke is a Senior Member of the IEEE and a Senior Area Editor of
IEEE Transactions on Image Processing. His textbook