Methods of the Vehicle Re-identification
Mohamed Nafzi, Michael Brauckmann, and Tobias Glasmachers
Facial & Video Analytics, IDEMIA Identity & Security Germany AG
[email protected], [email protected]
Institute for Neural Computation, Ruhr-University Bochum, Germany
[email protected]
Abstract.
Most researchers approach vehicle re-identification via classification, which always requires updates with the new vehicle models in the market. In this paper, two types of vehicle re-identification are presented. The first is the standard method, which needs an image of the search vehicle. It produces a feature vector that is used to re-identify the search vehicle. The VRIC and VehicleID data sets are suitable for training this module. We explain in detail how to improve the performance of this method by starting from a trained network that was designed for classification. The second method takes as input a representative image of the search vehicle with similar make/model, released year, and colour. It is very useful when an image of the search vehicle itself is not available. It produces shape and colour features as output, which can be matched across a database to re-identify vehicles that look similar to the search vehicle. To obtain a robust re-identification module, a fine-grained classification has been trained whose classes consist of four elements: the make of a vehicle, i.e. its manufacturer, e.g. Mercedes-Benz; the model, i.e. the type of model within that manufacturer's portfolio, e.g. C Class; the year, i.e. the iteration of the model, which may receive progressive alterations and upgrades from its manufacturer; and the perspective of the vehicle. Thus, all four elements describe the vehicle at an increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. The colour classification has been trained separately. After training, the classification layer is no longer used. With both methods, it is possible to re-identify any vehicle, even when no training data is available for some makes/models/released years/perspectives or for some colours. The results of vehicle re-identification are shown.
Using a developed tool, the re-identification of vehicles on video images and on a controlled data set using a search image is demonstrated. The results of a proposed mix-mode, which is the combination of shape matching and colour classification, are presented. This work was partially funded under the Victoria and Florida grants.
Keywords:
Vehicle Re-identification, Mix-Mode, CNN, Shape and Colour classification
The objective of the vehicle re-identification module is to recognize a vehicle within a large image or video data set. Two different methods are trained and tested.
– First, the standard vehicle re-identification. The known data sets VRIC and VehicleID have been used separately for training and testing. The VRIC data set contains 2811 vehicle IDs with 54808 images, and VehicleID contains 13164 vehicle IDs with 113346 images for training. A multiple loss and a merged data set have also been used to train on both data sets, which can increase the robustness of the module. Starting the training from a network that has been trained on shape classification using about eight million images can significantly improve the results. The results of the fusion are also presented.
– The second method requires just a representative image that looks similar to the search vehicle, for the case that a sample image of it is not available. For its training, a fine-grained vehicle classification has been used, which leads to a feature representation with small intra-class variance. The modules have been trained using CNN networks. The combination of the shape and the colour feature vectors leads to a robust re-identification of vehicles.
• Training: Typically, a fine-grained class consists of four elements: the make of a vehicle, i.e. its manufacturer, e.g. Mercedes-Benz; the model, i.e. the type of model within that manufacturer's portfolio, e.g. C Class; the year, i.e. the iteration of the model, which may receive progressive alterations and upgrades from its manufacturer; and the perspective of the vehicle. Thus, all four elements describe the vehicle at an increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. We trained our vehicle shape network on 11906 classes using about eight million images.
We trained the colour classification separately on 10 classes using about two million images.
• Application: In the application of our trained CNN network, the classification layer is not used. Our module supports searches using an image sample or a representative image of the search vehicle, which is sent to the template creation component. The search engine performs the template matching across a video database using shape and colour features and returns the search results to the user. This method does not require training on all vehicle classes. To trigger an alarm, the make, the model, the released year, the perspective, and the colour of the probe and of the gallery images should be similar.
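The alarm criterion above can be made concrete with a small sketch. This is illustrative only (not the paper's code): the class and helper names are hypothetical, and the fine-grained shape class is modelled simply as the tuple of make, model, released year, and perspective.

```python
# Illustrative sketch (hypothetical names, not the authors' implementation):
# a fine-grained shape class combines make, model, released year, and
# perspective; an alarm additionally requires matching colour.
from dataclasses import dataclass

@dataclass(frozen=True)
class ShapeClass:
    make: str         # e.g. "Mercedes-Benz"
    model: str        # e.g. "C Class"
    year: str         # model iteration, e.g. "2017"
    perspective: str  # e.g. "front", "rear", "side"

    def label(self) -> str:
        # One fine-grained class label for the shape classifier.
        return f"{self.make}/{self.model}/{self.year}/{self.perspective}"

def alarm(p: ShapeClass, g: ShapeClass, p_colour: str, g_colour: str) -> bool:
    # Alarm only if all four shape elements and the colour agree.
    return p == g and p_colour == g_colour

probe = ShapeClass("Mercedes-Benz", "C Class", "2017", "front")
gallery = ShapeClass("Mercedes-Benz", "C Class", "2017", "front")
print(alarm(probe, gallery, "black", "black"))  # True
print(alarm(probe, gallery, "black", "white"))  # False
```

In the actual system the decision is of course made on matching scores of CNN features, not on symbolic labels; this sketch only fixes the semantics of the four-element class.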
Some research has been performed on make/model classification to re-identify a search vehicle. Most of it operated on a small number of makes/models because it is difficult to get a labeled data set spanning all existing makes/models. Manual annotation is almost impossible because one needs an expert for each make who is able to recognize all its models, and it is a very tedious and time-consuming process. [9] developed a make/model classification based on feature representation for rigid structure recognition using 77 different classes. Two distances have been tested, the dot product and the Euclidean distance. [7] tested different methods for make/model classification of 86 different classes on images with side view. The best one was HoG-RBF-SVM. [10] used 3D boxes of the image with its rasterized low-resolution shape and information about the 3D vehicle orientation as CNN input to classify 126 different makes/models. The module of [8] is based on 3D object representations using linear SVM classifiers and was trained on 196 classes. In a real video scene, all existing makes/models could occur. Considering that there are worldwide more than 2000 models, a make/model classification trained on just a few classes will not succeed in practical applications. [6] increased the number of trained classes. Their module is based on a CNN and trained on 59 different vehicle makes as well as on 818 different models; this solution seems closer to commercial use. Our module developed in previous work [1] was trained on 1447 different classes and could recognize 137 different vehicle makes as well as 1447 different models of the released years between 2016 and 2018. Other research has been conducted on the known standard vehicle re-identification. Space-time contextual knowledge has been exploited for vehicle re-id subject to structured scenes. [3] incorporated spatio-temporal path information of vehicles.
This method improves the re-id performance on the VeRi-776 data set, but it may not generalize to complex scene structures when the number of visual spatio-temporal path proposals is very large with only weak contextual knowledge available to facilitate model decisions. [4] considered 20 vehicle key points for learning and aligning local regions of a vehicle for re-identification. Clearly, this approach comes with the extra cost of exhaustively labeling these key points in a large number of vehicle images, and the implicit assumption of having sufficient image resolution/detail for computing these key points. [5] worked on the VehicleID data set, which includes multiple images of the same vehicle captured by different real-world cameras in a city. This data set is challenging in terms of separating similar vehicles with few differences, but it covers only constrained test scenarios due to the rather artificial assumption of having high-quality images of constant resolution. This makes it limited for testing the true robustness of re-id matching algorithms in typically unconstrained wide-view traffic scene imaging conditions. [2] introduced the VRIC data set to address the limitations of other vehicle re-identification benchmarks; it provides conditions giving rise to changes in resolution, motion blur, weather, illumination, and occlusion. In this paper, we show two methods of vehicle re-identification, which can re-identify vehicles even if their classes are not included in the training. The first method is the standard vehicle re-identification, which requires a probe image of the search vehicle. This module has been trained using a merged data set of VRIC and VehicleID. Its training has been started from the trained make/model network used in the second method, which is trained on classification using 11906 classes with about eight million images for the shape and 10 classes with about two million images for the colour.
The second method uses shape and colour feature vectors for the re-id. It works even if a probe image of the search vehicle is not available. A representative image with similar make, model, released year, and colour to the search vehicle is enough for the re-identification; it could be downloaded e.g. from the web. Experimental results show that the first method outperforms all state-of-the-art approaches on the VRIC and VehicleID data sets. Here, the comparison has been done only against the best published results. The second method helps to improve the performance of the first method, and it provides a solution in case a probe image of the search vehicle is not available. Here, there is no defined data set we could use to compare the results to other research. Tests have been evaluated on an internal data set.
Neural networks have been used in computer vision for a long time, but with the progress in hardware capabilities and the growth of available training data over the last few years, deep neural networks have become the most successful methods for many computer vision tasks. In some visual recognition tasks, even human-level accuracy can be surpassed. We used CNN networks based on the ResNet architecture. Their coding time is 20 ms (CPU, 1 core, i7-4790, 3.6 GHz). In figures 1 and 2, we show our way to extract the feature vector, which is used in the matching step of the vehicle re-identification. Figure 1 shows the trained CNN for the vehicle re-identification based on shape and colour classification (method 2), and figure 2 shows the trained CNN for the standard vehicle re-identification (trained on gray images, method 1). Here, we started from the trained CNN of method 2, which has been trained on 11906 classes with about eight million images. This CNN is an expert at separating vehicles with different makes, models, or released years. In this way, the training focuses on separating different vehicles with similar makes, models, and released years, without forgetting to separate vehicles with different makes, models, or released years. Here, two CNN nets have been trained. In the first training, all parameters are trainable. In the second CNN net, the convolution block is not trainable; the training tunes just the IP layer for the separation between the classes. The fusion shows the best results on VRIC and VehicleID.
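The two fine-tuning variants (all parameters trainable vs. convolution block frozen, IP layer tuned) can be sketched abstractly. This is a minimal, framework-free sketch of the idea, not the paper's training code; the parameter shapes and the SGD step are placeholders.

```python
import numpy as np

# Minimal sketch (hypothetical, not the paper's implementation) of the two
# fine-tuning variants: variant 1 updates all parameters, variant 2 freezes
# the convolution block and tunes only the IP (fully connected) layer.
rng = np.random.default_rng(0)
params = {
    "conv_block": rng.normal(size=(4, 4)),  # stands in for all conv weights
    "ip_layer":   rng.normal(size=(4, 2)),  # final classification/feature layer
}

def sgd_step(params, grads, lr=0.1, frozen=()):
    """Apply one SGD update, skipping any frozen parameter groups."""
    return {
        name: w if name in frozen else w - lr * grads[name]
        for name, w in params.items()
    }

grads = {name: np.ones_like(w) for name, w in params.items()}

# Variant 1: everything trainable.
p1 = sgd_step(params, grads)
# Variant 2: convolution block frozen, only the IP layer is tuned.
p2 = sgd_step(params, grads, frozen=("conv_block",))

assert not np.allclose(p1["conv_block"], params["conv_block"])  # updated
assert np.allclose(p2["conv_block"], params["conv_block"])      # unchanged
assert not np.allclose(p2["ip_layer"], params["ip_layer"])      # tuned
```

In a deep-learning framework the same effect is achieved by marking the convolution block's parameters as non-trainable before fine-tuning; fusing the two resulting nets gives the reported best results.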
Fig. 1.
Feature vector extraction. The network CNN1 for the vehicle re-identification based on shape and colour classification (method 2), trained on 11906 classes for shape and on 10 classes for colour.
Fig. 2.
Feature vector extraction. The network CNN2 for the standard vehicle re-identification (method 1), starting from a trained CNN (the trained CNN1 from method 2). Blue indicates trainable parameters; green shows non-trainable parameters. Both CNNs are trained on a merged data set of VRIC and VehicleID. CNN2 is the fusion of the two CNN nets and shows the best results.
Our feature vectors (templates) are normalized to unit length. The matching as such is performed by calculating the dot product between two feature vectors, which is the cosine of the angle between both vectors. In method 2, the matching scores of the colour and the shape feature vectors have different distributions. The fusion uses a weighted sum of the match scores of both modalities; optimal weights have been determined on a predefined set of data. In method 1, the fusion score is the sum of the match scores, which have similar distributions.
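The matching and fusion steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the fusion weights are placeholders (the paper determines optimal weights on a predefined data set), and the two-dimensional vectors stand in for real templates.

```python
import numpy as np

# Sketch: templates are L2-normalised, so the dot product equals the cosine
# of the angle between them. For method 2, shape and colour scores come from
# different distributions and are fused with a weighted sum.
def normalise(v):
    return v / np.linalg.norm(v)

def match(t1, t2):
    # Cosine similarity of two templates.
    return float(np.dot(normalise(t1), normalise(t2)))

def fuse(shape_score, colour_score, w_shape=0.7, w_colour=0.3):
    # Placeholder weights; the paper tunes them on a predefined data set.
    # For method 1, the fusion is a plain (equal-weight) sum instead.
    return w_shape * shape_score + w_colour * colour_score

a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(round(match(a, b), 3))  # 0.707 (cosine of 45 degrees)
```

Because the templates are unit length, the dot product is bounded in [-1, 1] and identical templates score exactly 1.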
For our method 2 of vehicle re-identification based on shape and colour features, a vehicle search needs a respective search image of a certain make/model, released year, and colour. The make and model of the search image do not need to be part of the make/model categories used during training. In practice, we may have an image with the same shape but not the same colour as the search vehicle, e.g. downloaded from a manufacturer's internet homepage. In this case, we can apply our developed Mixed-Mode, which is the appropriate solution for this problem. In this mode, we combine shape matching with colour classification. We use the shape feature vector for matching. As a result, we get all vehicles that have the same shape as the searched vehicle, but potentially with different colours. After that, we apply the colour classification to filter the results by the selected colour. This mode is intended specifically for investigational scenarios.
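The Mixed-Mode pipeline (shape matching first, colour filtering second) can be sketched as below. All names, the toy matcher, and the threshold are hypothetical; the real system matches CNN shape templates and classifies colour with a CNN.

```python
# Hypothetical sketch of the Mixed-Mode: rank the gallery by shape score,
# then keep only hits whose classified colour matches the selected colour.
def mixed_mode_search(probe_shape, gallery, shape_match, colour_of,
                      wanted_colour, threshold=0.5):
    hits = []
    for item in gallery:
        score = shape_match(probe_shape, item["shape"])
        # Shape must match well AND the classified colour must be the wanted one.
        if score >= threshold and colour_of(item) == wanted_colour:
            hits.append((score, item["id"]))
    return sorted(hits, reverse=True)

gallery = [
    {"id": "v1", "shape": "hummer2", "colour": "white"},
    {"id": "v2", "shape": "hummer2", "colour": "orange"},
    {"id": "v3", "shape": "ka",      "colour": "white"},
]
shape_match = lambda p, g: 1.0 if p == g else 0.0  # toy stand-in matcher
colour_of = lambda item: item["colour"]            # stand-in colour classifier

print(mixed_mode_search("hummer2", gallery, shape_match, colour_of, "white"))
# → [(1.0, 'v1')]
```

This mirrors the investigational scenario: a probe of the right shape but wrong colour still retrieves all same-shape vehicles, and the colour filter then isolates the searched colour.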
In total, 406 best-shots and 85,130 detections were computed from Cam2, and 621 best-shots with 199,963 detections from Cam4. Additionally, 33 controlled images were acquired from the web (Google) for subsequent experiments. Based on these VICTORIA data sets, we performed a number of tests using the shape feature, the colour feature, and the fusion of both. Multiple probe images for shape matching have also been tested. Here, we have a set of images of the search vehicle with different views. By matching against a gallery image, we get a set of scores; their maximum is the final match score. This reduces the dependency on the perspective during matching. Tests have been evaluated on video data against still images. Figure 3 shows sample images from the video data sets Cam2 and Cam4. Results are shown in figures 5 and 6. As shown, we got some high impostor scores when matching colour templates, leading to a fall of the ROC curves. The reason for this is that the colour silver is currently not included in the classes used for the training, thus we labelled it as grey. Due to the sunlight conditions, however, the silver colour was mapped onto white. Figure 4 shows two sample images illustrating this effect.
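The multiple-probe rule (maximum over per-view scores) is simple to write down. The similarity function below is a toy stand-in for the template matcher, used only to make the max-rule concrete.

```python
# Sketch: with several probe views of the same vehicle, the final score of a
# gallery item is the maximum over the per-view match scores, which reduces
# the dependency on the perspective.
def multi_probe_score(probe_templates, gallery_template, match):
    return max(match(p, gallery_template) for p in probe_templates)

match = lambda p, g: 1.0 - abs(p - g)  # toy similarity on scalar "templates"
probes = [0.1, 0.5, 0.9]               # e.g. front, side, rear views
print(round(multi_probe_score(probes, 0.85, match), 2))  # 0.95
```

The gallery item only needs to match one of the views well, so a rear-view gallery image is no longer penalised for a front-view probe.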
Fig. 3.
The image on the left side shows a sample of a best-shot computed from the VICTORIA data set (Cam2). The image on the right side depicts a best-shot from Cam4.
Fig. 4.
The color silver is not included in our training of color classification. The right vehicle is labeled as gray, but with sunlight it looks close to white. It produces higher impostor scores with white vehicles like the vehicle on the left; this leads to a reduction of the verification rate, as depicted by the black ROC curves in figures 5 and 6.
Fig. 5.
This figure shows ROC curves of shape, color, the fusion of color and shape, and multiple probe images for shape. Computation was done by matching controlled single images from the internet against the video data set Cam2 from the project VICTORIA.
Color : matching using color template (black curve).
Shape : matching using shape template (blue solid curve).
Fusion Shape&Color : Fusion of shape and color matching scores (red solid curve).
Shape Multiple : matching using shape template and using multiple probe images (blue dashed curve).
Fusion Shape&Color Multiple : Fusion of shape using multiple probe images and color matching scores (red dashed curve).
FAR : False Acceptance Rate. VR : Verification Rate.
Fig. 6.
This figure shows ROC curves of shape, color, the fusion of color and shape, and multiple probe images for shape. Computation was done by matching controlled single images from the internet against the video data set Cam4 from the project VICTORIA.
Color : matching using color template (black curve).
Shape : matching using shape template (blue solid curve).
Fusion Shape&Color : Fusion of shape and color matching scores (red solid curve).
Shape Multiple : matching using shape template and using multiple probe images (blue dashed curve).
Fusion Shape&Color Multiple : Fusion of shape using multiple probe images and color matching scores (red dashed curve).
FAR : False Acceptance Rate. VR : Verification Rate.
For evaluation, we utilised the two most popular vehicle re-identification benchmarks. The VehicleID data set [5] provides a training set with 113,346 images from 13,164 IDs and a test set with 17,377 probe images and 2,400 gallery images from 2,400 identities. It adopts the single-shot re-id setting, with only one true match for each probe. The VRIC data set [2] has 54,808 images from 2,811 IDs in the training set. The probe and the gallery of the testing data each contain 2,811 images with 2,811 vehicle IDs. The data split statistics are summarised in table 1.
Evaluation. Table 2 compares our method 1 (CNN2), explained in the sections before, with state-of-the-art methods on the two benchmarks. Our method outperforms all other competitors by large margins. It surpasses the best competitor in Rank-1 rate by 8.53% (i.e. 16.0% error reduction) and in Rank-5 by 9.55% on VRIC, and in Rank-1 rate by 2.8% (i.e. 7.6% error reduction) and in Rank-5 by 4.2% on VehicleID.
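The Rank-1/Rank-5 rates used in this comparison can be computed as sketched below. This is the standard CMC-style rank-k computation, written with illustrative names; it is not taken from the paper's evaluation code.

```python
import numpy as np

# Sketch of how Rank-k rates are computed in single-shot re-id: for each
# probe, gallery items are sorted by match score, and the probe counts as a
# Rank-k hit if its true identity appears among the top k gallery entries.
def rank_k_rate(scores, probe_ids, gallery_ids, k):
    hits = 0
    for row, pid in zip(scores, probe_ids):
        order = np.argsort(row)[::-1]             # best score first
        top_k = [gallery_ids[j] for j in order[:k]]
        hits += pid in top_k
    return hits / len(probe_ids)

# Toy score matrix: rows are probes, columns are gallery items.
scores = np.array([[0.9, 0.2, 0.1],
                   [0.3, 0.8, 0.4],
                   [0.2, 0.7, 0.6]])
probe_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]
print(round(rank_k_rate(scores, probe_ids, gallery_ids, 1), 3))  # 0.667
```

In the toy example the third probe's true match is only the second-best score, so it misses at Rank-1 but hits at Rank-2 and above.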
Table 1.
Data split of the standard vehicle re-identification data sets evaluated in our experiments.

Dataset        Training IDs / Images   Probe IDs / Images   Gallery IDs / Images
VehicleID [5]  13,164 / 113,346        2,400 / 17,377       2,400 / 2,400
VRIC [2]       2,811 / 54,808          2,811 / 2,811        2,811 / 2,811
Table 2.
Comparison of standard vehicle re-identification results on the two benchmarking data sets.

Method                     VehicleID [5]        VRIC [2]
                           Rank-1   Rank-5      Rank-1   Rank-5
OIFE (Single Branch) [4]   32.86    52.75       24.62    50.98
Siamese-Visual [3]         36.83    57.97       30.55    57.30
MSVF [2]                   63.02    73.05       46.61    65.58
Our method 1 (CNN2)        65.82    77.25       55.14    75.13
Besides the statistical experiments from the section before, we performed manual tests on the second method, trained on shape and colour features, with the vehicle re-identification tool. We also tested the mix-mode defined in this work. Figure 7 shows, as an example, the search for a green Ford Ka. The left side of the figure depicts the selected search image, the middle part shows the best-shots of the matches against the VICTORIA data (Cam3 video sequence), and the right side presents all detections belonging to the selected best-shot. The subsequent figure 8 shows an example of the Mixed Mode. In this scenario, the user searches for a white Hummer 2. If a sample image of that Hummer 2 is available, but with a different color (here orange), the user can nevertheless run the search, which provides all occurrences of that Hummer 2 in any color. In a follow-up step, color classification is applied to filter the result images with the searched color, here white.
Fig. 7.
This figure shows the vehicle re-identification based on shape and color features.
Fig. 8.
This figure shows the vehicle re-identification using the Mix-Mode based on the shape feature and color classification.
– Both vehicle re-identification methods work on classes even if they are not included in the training. They do not have to be immediately updated with newly released models.
– The perspectives of the probe and the gallery samples of mates should be similar to trigger an alarm. Using multiple probe images with different views makes the re-identification independent of the perspective.
– Vehicle re-identification based on shape and colour classification works even if an image of the search vehicle is not available; a representative image is sufficient. It re-identifies all vehicles with similar makes, models, released years, and colours.
– An image of the search vehicle is required for the standard re-identification, which can re-identify exactly the same vehicle.
– The training of the vehicle re-identification based on shape classification helps the training of the standard re-identification because the training data of the first training is much larger than that of the second training. Its results beat the best published methods, as shown in table 2.
– We are working on the classification of the perspective of the vehicle based on an image or a template.
– We plan to augment the training data for the standard vehicle re-identification.
– We are working on different methods to improve the vehicle shape classification.
– Victoria: funded by the European Commission (H2020), Grant Agreement number 740754, for Video analysis for Investigation of Criminal and Terrorist Activities.
– Florida: funded by the German Ministry of Education and Research (BMBF).