Methods of the Vehicle Re-identification
Mohamed Nafzi, Michael Brauckmann, and Tobias Glasmachers
Facial & Video Analytics, IDEMIA Identity & Security Germany AG
[email protected], [email protected]
Institute for Neural Computation, Ruhr-University Bochum, Germany
[email protected]
Abstract.
Most researchers approach vehicle re-identification via classification, which always requires updates with the new vehicle models in the market. In this paper, two types of vehicle re-identification are presented. The first is the standard method, which needs an image of the search vehicle. It produces a feature vector that is used to re-identify the search vehicle. The VRIC and VehicleID data sets are suitable for training this module. We explain in detail how to improve the performance of this method by starting from a trained network that was designed for classification. The second method takes as input a representative image of the search vehicle with similar make/model, released year, and colour. It is very useful when an image of the search vehicle itself is not available. It produces shape and colour features as output, which can be matched across a database to re-identify vehicles that look similar to the search vehicle. To obtain a robust re-identification module, a fine-grained classification has been trained whose classes consist of four elements: the make of a vehicle, i.e. its manufacturer, e.g. Mercedes-Benz; the model, i.e. the type of model within that manufacturer's portfolio, e.g. C Class; the year, i.e. the iteration of the model, which may receive progressive alterations and upgrades from its manufacturer; and the perspective of the vehicle. Thus, all four elements describe the vehicle at an increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. The colour classification has been trained separately. After training, the classification layer is no longer used. With both methods, it is possible to re-identify any vehicle, even when no training data is available for some makes/models/released years/perspectives or for some colours. The results of vehicle re-identification are shown.
Using a developed tool, the re-identification of vehicles on video images and on a controlled data set using a search image is demonstrated. The results of a proposed mix-mode, which is the combination of shape matching and colour classification, are presented. This work was partially funded under the Victoria and Florida grants.
Keywords:
Vehicle Re-identification, Mix-Mode, CNN, Shape and Colour classification
The objective of the vehicle re-identification module is to recognize a vehicle within a large image or video data set. Two different methods are trained and tested.
– First, the standard vehicle re-identification. The known data sets VRIC and VehicleID have been used separately for training and testing. The VRIC data set contains 2811 vehicle IDs with 54808 images, and VehicleID contains 13164 vehicle IDs with 113346 images for training. A multiple loss and a merged data set have also been used to train on both data sets, which can increase the robustness of the module. Starting the training from a network that has been trained on shape classification using about eight million images can significantly improve the results. The results of the fusion are also presented.
– The second method requires just a representative image that looks similar to the search vehicle, for the case that a sample image of it is not available. For its training, a fine-grained vehicle classification has been used, which leads to a feature representation with small intra-class variance. The modules have been trained using CNN networks. The combination of the shape and the colour feature vectors leads to a robust re-identification of vehicles.
• Training: Typically, a fine-grained class consists of four elements: the make of a vehicle, i.e. its manufacturer, e.g. Mercedes-Benz; the model, i.e. the type of model within that manufacturer's portfolio, e.g. C Class; the year, i.e. the iteration of the model, which may receive progressive alterations and upgrades from its manufacturer; and the perspective of the vehicle. Thus, all four elements describe the vehicle at an increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. We trained our vehicle shape network on 11906 classes using about eight million images.
We trained the colour classification separately on 10 classes using about two million images.
• Application: In the application of our trained CNN network, the classification layer is not used. Our module supports searches using an image sample or a representative image of the search vehicle, which is sent to the template creation component. The search engine performs the template matching across a video database using shape and colour features and returns the search results to the user. This method does not require training on all vehicle classes. To trigger an alarm, the make, the model, the released year, the perspective, and the colour of the probe and of the gallery images should be similar.
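The alarm criterion above can be made concrete with a small sketch. This is illustrative only (not the paper's code): the class and helper names are hypothetical, and the fine-grained shape class is modelled simply as the tuple of make, model, released year, and perspective.

```python
# Illustrative sketch (hypothetical names, not the authors' implementation):
# a fine-grained shape class combines make, model, released year, and
# perspective; an alarm additionally requires matching colour.
from dataclasses import dataclass

@dataclass(frozen=True)
class ShapeClass:
    make: str         # e.g. "Mercedes-Benz"
    model: str        # e.g. "C Class"
    year: str         # model iteration, e.g. "2017"
    perspective: str  # e.g. "front", "rear", "side"

    def label(self) -> str:
        # One fine-grained class label for the shape classifier.
        return f"{self.make}/{self.model}/{self.year}/{self.perspective}"

def alarm(p: ShapeClass, g: ShapeClass, p_colour: str, g_colour: str) -> bool:
    # Alarm only if all four shape elements and the colour agree.
    return p == g and p_colour == g_colour

probe = ShapeClass("Mercedes-Benz", "C Class", "2017", "front")
gallery = ShapeClass("Mercedes-Benz", "C Class", "2017", "front")
print(alarm(probe, gallery, "black", "black"))  # True
print(alarm(probe, gallery, "black", "white"))  # False
```

In the actual system the decision is of course made on matching scores of CNN features, not on symbolic labels; this sketch only fixes the semantics of the four-element class.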
Some research has been performed on make/model classification to re-identify a search vehicle. Most of it operated on a small number of makes/models because it is difficult to get a labeled data set spanning all existing makes/models. Manual annotation is almost impossible because one needs an expert for each make who is able to recognize all its models, and it is a very tedious and time-consuming process. [9] developed a make/model classification based on feature representation for rigid structure recognition using 77 different classes. Two distances have been tested, the dot product and the Euclidean distance. [7] tested different methods for make/model classification of 86 different classes on images with side view. The best one was HoG-RBF-SVM. [10] used 3D boxes of the image with its rasterized low-resolution shape and information about the 3D vehicle orientation as CNN input to classify 126 different makes/models. The module of [8] is based on 3D object representations using linear SVM classifiers and was trained on 196 classes. In a real video scene, all existing makes/models could occur. Considering that there are worldwide more than 2000 models, a make/model classification trained on just a few classes will not succeed in practical applications. [6] increased the number of trained classes. Their module is based on a CNN and trained on 59 different vehicle makes as well as on 818 different models; this solution seems closer to commercial use. Our module developed in previous work [1] was trained on 1447 different classes and could recognize 137 different vehicle makes as well as 1447 different models of the released years between 2016 and 2018. Other research has been conducted on the known standard vehicle re-identification. Space-time contextual knowledge has been exploited for vehicle re-id subject to structured scenes. [3] incorporated spatio-temporal path information of vehicles.
This method improves the re-id performance on the VeRi-776 data set, but it may not generalize to complex scene structures when the number of visual spatio-temporal path proposals is very large with only weak contextual knowledge available to facilitate model decisions. [4] considered 20 vehicle key points for learning and aligning local regions of a vehicle for re-identification. Clearly, this approach comes with the extra cost of exhaustively labeling these key points in a large number of vehicle images, and the implicit assumption of having sufficient image resolution/detail for computing these key points. [5] worked on the VehicleID data set, which includes multiple images of the same vehicle captured by different real-world cameras in a city. This data set is challenging in terms of separating similar vehicles with few differences, but it covers only constrained test scenarios due to the rather artificial assumption of having high-quality images of constant resolution. This makes it limited for testing the true robustness of re-id matching algorithms in typically unconstrained wide-view traffic scene imaging conditions. [2] introduced the VRIC data set to address the limitations of other vehicle re-identification benchmarks; it provides conditions giving rise to changes in resolution, motion blur, weather, illumination, and occlusion. In this paper, we show two methods of vehicle re-identification, which can re-identify vehicles even if their classes are not included in the training. The first method is the standard vehicle re-identification, which requires a probe image of the search vehicle. This module has been trained using a merged data set of VRIC and VehicleID. Its training has been started from the trained make/model network used in the second method, which is trained on classification using 11906 classes with about eight million images for the shape and 10 classes with about two million images for the colour.
The second method uses shape and colour feature vectors for the re-id. It works even if a probe image of the search vehicle is not available. A representative image with similar make, model, released year, and colour to the search vehicle is enough for the re-identification; it could be downloaded e.g. from the web. Experimental results show that the first method outperforms all state-of-the-art approaches on the VRIC and VehicleID data sets. Here, the comparison has been done only against the best published results. The second method helps to improve the performance of the first method, and it provides a solution in case a probe image of the search vehicle is not available. Here, there is no defined data set we could use to compare the results to other research. Tests have been evaluated on an internal data set.
Neural networks have been used in computer vision for a long time, but with the progress in hardware capabilities and the growth of available training data over the last few years, deep neural networks have become the most successful methods for many computer vision tasks. In some visual recognition tasks, even human-level accuracy can be surpassed. We used CNN networks based on the ResNet architecture. Their coding time is 20 ms (CPU, 1 core, i7-4790, 3.6 GHz). In figures 1 and 2, we show our way to extract the feature vector, which is used in the matching step of the vehicle re-identification. Figure 1 shows the trained CNN for the vehicle re-identification based on shape and colour classification (method 2), and figure 2 shows the trained CNN for the standard vehicle re-identification (trained on gray images, method 1). Here, we started from the trained CNN of method 2, which has been trained on 11906 classes with about eight million images. This CNN is an expert at separating vehicles with different makes, models, or released years. In this way, the training focuses on separating different vehicles with similar makes, models, and released years, without forgetting to separate vehicles with different makes, models, or released years. Here, two CNN nets have been trained. In the first training, all parameters are trainable. In the second CNN net, the convolution block is not trainable; the training tunes just the IP layer for the separation between the classes. The fusion shows the best results on VRIC and VehicleID.
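The two fine-tuning variants (all parameters trainable vs. convolution block frozen, IP layer tuned) can be sketched abstractly. This is a minimal, framework-free sketch of the idea, not the paper's training code; the parameter shapes and the SGD step are placeholders.

```python
import numpy as np

# Minimal sketch (hypothetical, not the paper's implementation) of the two
# fine-tuning variants: variant 1 updates all parameters, variant 2 freezes
# the convolution block and tunes only the IP (fully connected) layer.
rng = np.random.default_rng(0)
params = {
    "conv_block": rng.normal(size=(4, 4)),  # stands in for all conv weights
    "ip_layer":   rng.normal(size=(4, 2)),  # final classification/feature layer
}

def sgd_step(params, grads, lr=0.1, frozen=()):
    """Apply one SGD update, skipping any frozen parameter groups."""
    return {
        name: w if name in frozen else w - lr * grads[name]
        for name, w in params.items()
    }

grads = {name: np.ones_like(w) for name, w in params.items()}

# Variant 1: everything trainable.
p1 = sgd_step(params, grads)
# Variant 2: convolution block frozen, only the IP layer is tuned.
p2 = sgd_step(params, grads, frozen=("conv_block",))

assert not np.allclose(p1["conv_block"], params["conv_block"])  # updated
assert np.allclose(p2["conv_block"], params["conv_block"])      # unchanged
assert not np.allclose(p2["ip_layer"], params["ip_layer"])      # tuned
```

In a deep-learning framework the same effect is achieved by marking the convolution block's parameters as non-trainable before fine-tuning; fusing the two resulting nets gives the reported best results.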
Fig. 1.
Feature vector extraction. The network CNN1 for the vehicle re-identification based on shape and colour classification (method 2), trained on 11906 classes for shape and on 10 classes for colour.
Fig. 2.
Feature vector extraction. The network CNN2 for the standard vehicle re-identification (method 1), starting from a trained CNN (the trained CNN1 from method 2). Blue indicates trainable parameters; green shows non-trainable parameters. Both CNNs are trained on a merged data set of VRIC and VehicleID. CNN2 is the fusion of the two CNN nets and shows the best results.
Our feature vectors (templates) are normalized to unit length. The matching as such is performed by calculating the dot product between two feature vectors, which is the cosine of the angle between both vectors. In method 2, the matching scores of the colour and the shape feature vectors have different distributions. The fusion uses a weighted sum of the match scores of both modalities; optimal weights have been determined on a predefined set of data. In method 1, the fusion score is the sum of the match scores, which have similar distributions.
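The matching and fusion steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the fusion weights are placeholders (the paper determines optimal weights on a predefined data set), and the two-dimensional vectors stand in for real templates.

```python
import numpy as np

# Sketch: templates are L2-normalised, so the dot product equals the cosine
# of the angle between them. For method 2, shape and colour scores come from
# different distributions and are fused with a weighted sum.
def normalise(v):
    return v / np.linalg.norm(v)

def match(t1, t2):
    # Cosine similarity of two templates.
    return float(np.dot(normalise(t1), normalise(t2)))

def fuse(shape_score, colour_score, w_shape=0.7, w_colour=0.3):
    # Placeholder weights; the paper tunes them on a predefined data set.
    # For method 1, the fusion is a plain (equal-weight) sum instead.
    return w_shape * shape_score + w_colour * colour_score

a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(round(match(a, b), 3))  # 0.707 (cosine of 45 degrees)
```

Because the templates are unit length, the dot product is bounded in [-1, 1] and identical templates score exactly 1.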
For our method 2 of vehicle re-identification based on shape and colour features, a vehicle search needs a respective search image of a certain make/model, released year, and colour. The make and model of the search image do not need to be part of the make/model categories used during training. In practice, we may have an image with the same shape but not the same colour as the search vehicle, e.g. downloaded from a manufacturer's internet homepage. In this case, we can apply our developed Mixed-Mode, which is the appropriate solution for this problem. In this mode, we combine shape matching with colour classification. We use the shape feature vector for matching. As a result, we get all vehicles that have the same shape as the searched vehicle, but potentially with different colours. After that, we apply the colour classification to filter the results by the selected colour. This mode is intended specifically for investigational scenarios.
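The Mixed-Mode pipeline (shape matching first, colour filtering second) can be sketched as below. All names, the toy matcher, and the threshold are hypothetical; the real system matches CNN shape templates and classifies colour with a CNN.

```python
# Hypothetical sketch of the Mixed-Mode: rank the gallery by shape score,
# then keep only hits whose classified colour matches the selected colour.
def mixed_mode_search(probe_shape, gallery, shape_match, colour_of,
                      wanted_colour, threshold=0.5):
    hits = []
    for item in gallery:
        score = shape_match(probe_shape, item["shape"])
        # Shape must match well AND the classified colour must be the wanted one.
        if score >= threshold and colour_of(item) == wanted_colour:
            hits.append((score, item["id"]))
    return sorted(hits, reverse=True)

gallery = [
    {"id": "v1", "shape": "hummer2", "colour": "white"},
    {"id": "v2", "shape": "hummer2", "colour": "orange"},
    {"id": "v3", "shape": "ka",      "colour": "white"},
]
shape_match = lambda p, g: 1.0 if p == g else 0.0  # toy stand-in matcher
colour_of = lambda item: item["colour"]            # stand-in colour classifier

print(mixed_mode_search("hummer2", gallery, shape_match, colour_of, "white"))
# → [(1.0, 'v1')]
```

This mirrors the investigational scenario: a probe of the right shape but wrong colour still retrieves all same-shape vehicles, and the colour filter then isolates the searched colour.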
In total, 406 best-shots and 85,130 detections were computed from Cam2, and 621 best-shots with 199,963 detections from Cam4. Additionally, 33 controlled images were acquired from the web (Google) for subsequent experiments. Based on these VICTORIA data sets, we performed a number of tests using the shape feature, the colour feature, and the fusion of both. Multiple probe images for shape matching have also been tested. Here, we have a set of images of the search vehicle with different views. By matching against a gallery image, we get a set of scores; their maximum is the final match score. This reduces the dependency on the perspective during matching. Tests have been evaluated on video data against still images. Figure 3 shows sample images from the video data sets Cam2 and Cam4. Results are shown in figures 5 and 6. As shown, we got some high impostor scores when matching colour templates, leading to a fall of the ROC curves. The reason for this is that the colour silver is currently not included in the classes used for the training, thus we labelled it as grey. Due to the sunlight conditions, however, the silver colour was mapped onto white. Figure 4 shows two sample images illustrating this effect.
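The multiple-probe rule (maximum over per-view scores) is simple to write down. The similarity function below is a toy stand-in for the template matcher, used only to make the max-rule concrete.

```python
# Sketch: with several probe views of the same vehicle, the final score of a
# gallery item is the maximum over the per-view match scores, which reduces
# the dependency on the perspective.
def multi_probe_score(probe_templates, gallery_template, match):
    return max(match(p, gallery_template) for p in probe_templates)

match = lambda p, g: 1.0 - abs(p - g)  # toy similarity on scalar "templates"
probes = [0.1, 0.5, 0.9]               # e.g. front, side, rear views
print(round(multi_probe_score(probes, 0.85, match), 2))  # 0.95
```

The gallery item only needs to match one of the views well, so a rear-view gallery image is no longer penalised for a front-view probe.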
Fig. 3.
The image on the left side shows a sample of a best-shot computed from the VICTORIA data set (Cam2). The image on the right side depicts a best-shot from Cam4.
Fig. 4.
The color silver is not included in our training of color classification. The right vehicle is labeled as gray, but with sunlight it looks close to white. It produces higher impostor scores with white vehicles like the vehicle on the left; this leads to a reduction of the verification rate, as depicted by the black ROC curves in figures 5 and 6.
Fig. 5.
This figure shows ROC curves of shape, color, the fusion of color and shape, and multiple probe images for shape. Computation was done by matching controlled single images from the internet against the video data set Cam2 from the project VICTORIA.
Color : matching using color template (black curve).
Shape : matching using shape template (blue solid curve).
Fusion Shape&Color : Fusion of shape and color matching scores (red solid curve).
Shape Multiple : matching using shape template and using multiple probe images (blue dashed curve).
Fusion Shape&Color Multiple : Fusion of shape using multiple probe images and color matching scores (red dashed curve).
FAR : False Acceptance Rate. VR : Verification Rate.
Fig. 6.
This figure shows ROC curves of shape, color, the fusion of color and shape, and multiple probe images for shape. Computation was done by matching controlled single images from the internet against the video data set Cam4 from the project VICTORIA.
Color : matching using color template (black curve).
Shape : matching using shape template (blue solid curve).
Fusion Shape&Color : Fusion of shape and color matching scores (red solid curve).
Shape Multiple : matching using shape template and using multiple probe images (blue dashed curve).
Fusion Shape&Color Multiple : Fusion of shape using multiple probe images and color matching scores (red dashed curve).
FAR : False Acceptance Rate. VR : Verification Rate.
For evaluation, we utilised the two most popular vehicle re-identification benchmarks. The VehicleID data set [5] provides a training set with 113,346 images from 13,164 IDs and a test set with 17,377 probe images and 2,400 gallery images from 2,400 identities. It adopts the single-shot re-id setting, with only one true match for each probe. The VRIC data set [2] has 54,808 images from 2,811 IDs in the training set. The probe and the gallery of the testing data each contain 2,811 images with 2,811 vehicle IDs. The data split statistics are summarised in table 1.
Evaluation. Table 2 compares our method 1 (CNN2), explained in the sections before, with state-of-the-art methods on the two benchmarks. Our method outperforms all other competitors by large margins. It surpasses the best competitor in Rank-1 rate by 8.53% (i.e. 16.0% error reduction) and in Rank-5 by 9.55% on VRIC, and in Rank-1 rate by 2.8% (i.e. 7.6% error reduction) and in Rank-5 by 4.2% on VehicleID.
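The Rank-1/Rank-5 rates used in this comparison can be computed as sketched below. This is the standard CMC-style rank-k computation, written with illustrative names; it is not taken from the paper's evaluation code.

```python
import numpy as np

# Sketch of how Rank-k rates are computed in single-shot re-id: for each
# probe, gallery items are sorted by match score, and the probe counts as a
# Rank-k hit if its true identity appears among the top k gallery entries.
def rank_k_rate(scores, probe_ids, gallery_ids, k):
    hits = 0
    for row, pid in zip(scores, probe_ids):
        order = np.argsort(row)[::-1]             # best score first
        top_k = [gallery_ids[j] for j in order[:k]]
        hits += pid in top_k
    return hits / len(probe_ids)

# Toy score matrix: rows are probes, columns are gallery items.
scores = np.array([[0.9, 0.2, 0.1],
                   [0.3, 0.8, 0.4],
                   [0.2, 0.7, 0.6]])
probe_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]
print(round(rank_k_rate(scores, probe_ids, gallery_ids, 1), 3))  # 0.667
```

In the toy example the third probe's true match is only the second-best score, so it misses at Rank-1 but hits at Rank-2 and above.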
Table 1.
Data split of the standard vehicle re-identification data sets evaluated in our experiments.

Dataset        Training IDs / Images   Probe IDs / Images   Gallery IDs / Images
VehicleID [5]  13,164 / 113,346        2,400 / 17,377       2,400 / 2,400
VRIC [2]       2,811 / 54,808          2,811 / 2,811        2,811 / 2,811
Table 2.
Comparison of standard vehicle re-identification results on the two benchmarking data sets.

Method                     VehicleID [5]        VRIC [2]
                           Rank-1   Rank-5      Rank-1   Rank-5
OIFE (Single Branch) [4]   32.86    52.75       24.62    50.98
Siamese-Visual [3]         36.83    57.97       30.55    57.30
MSVF [2]                   63.02    73.05       46.61    65.58
Our method 1 (CNN2)        65.82    77.25       55.14    75.13
Besides the statistical experiments from the section before, we performed manual tests on the second method, trained on shape and colour features, with the vehicle re-identification tool. We also tested the mix-mode defined in this work. Figure 7 shows, as an example, the search for a green Ford Ka. The left side of the figure depicts the selected search image, the middle part shows the best-shots of the matches against the VICTORIA data (Cam3 video sequence), and the right side presents all detections belonging to the selected best-shot. The subsequent figure 8 shows an example of the Mixed Mode. In this scenario, the user searches for a white Hummer 2. If a sample image of that Hummer 2 is available, but with a different color (here orange), the user can nevertheless run the search, which provides all occurrences of that Hummer 2 in any color. In a follow-up step, color classification is applied to filter the result images with the searched color, here white.
Fig. 7.
This figure shows the vehicle re-identification based on shape and color features.
Fig. 8.
This figure shows the vehicle re-identification using the Mix-Mode based on the shape feature and color classification.
– Both vehicle re-identification methods work on classes even if they are not included in the training. They do not have to be immediately updated with newly released models.
– The perspectives of the probe and the gallery samples of mates should be similar to trigger an alarm. Using multiple probe images with different views makes the re-identification independent of the perspective.
– Vehicle re-identification based on shape and colour classification works even if an image of the search vehicle is not available; a representative image is sufficient. It re-identifies all vehicles with similar makes, models, released years, and colours.
– An image of the search vehicle is required for the standard re-identification, which can re-identify exactly the same vehicle.
– The training of the vehicle re-identification based on shape classification helps the training of the standard re-identification because the training data of the first training is much larger than that of the second training. Its results beat the best published methods, as shown in table 2.
– We are working on the classification of the perspective of the vehicle based on an image or a template.
– We plan to augment the training data for the standard vehicle re-identification.
– We are working on different methods to improve the vehicle shape classification.
– Victoria: funded by the European Commission (H2020), Grant Agreement number 740754, for Video analysis for Investigation of Criminal and Terrorist Activities.
– Florida: funded by the German Ministry of Education and Research (BMBF).