2D Image Features Detector And Descriptor Selection Expert System
Ibon Merino, Jon Azpiazu, Anthony Remazeilles, Basilio Sierra
Ibon Merino
Industry and Transport, Tecnalia Research and Innovation, Donostia-San Sebastian, Spain
[email protected]

Jon Azpiazu
Industry and Transport, Tecnalia Research and Innovation, Donostia-San Sebastian, Spain
[email protected]

Anthony Remazeilles
Industry and Transport, Tecnalia Research and Innovation, Donostia-San Sebastian, Spain
[email protected]

Basilio Sierra
Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia-San Sebastian, Spain
[email protected]
June 5, 2020

Abstract
Detection and description of keypoints from an image is a well-studied problem in computer vision. Some methods, like SIFT, SURF or ORB, are computationally very efficient. This paper proposes a solution for a particular case study: object recognition of industrial parts based on hierarchical classification. Reducing the number of instances leads to better performance; indeed, that is what the hierarchical classification aims at. We demonstrate that this method performs better than using just one method like ORB, SIFT or FREAK, despite being fairly slower.

Keywords: Computer vision, Descriptors, Feature-based object recognition, Expert system
1 Introduction

Object recognition is an important branch of computer vision. Its main idea is to extract important data, or features, from images in order to recognize which object is present in them. Many different techniques are used to achieve this. In the recent computer vision literature there has been a widespread tendency to use deep learning due to its benefits, throwing out many techniques from the previous literature that actually perform well in many cases. Our aim is to recover those techniques in order to boost them and increase their performance, or to exploit benefits that neural networks may not have.

The classical methods in computer vision are based on pure mathematical operations where images are treated as matrices. These methods look for gradient changes, patterns, and so on, and try to find similarities between different images, or build a machine learning model to predict the objects that are present in the image.

Our use case is the industrial area, where many similar parts are to be recognized. Those parts vary a lot from one to another (textures, size, color, reflections, ...), so an expert is needed to choose which method is better for recognizing the objects. We propose a method that simulates the expert's role. This is achieved by learning a model that classifies the objects into groups that behave similarly under different recognition methods. This leads to a hierarchical classification that first classifies the object to be recognized into one of the previously obtained groups; then, inside the group, the method that works best for that group is used to recognize the object.

The paper is organized as follows. In Section 2 we present a state of the art of the most used 2D feature-based methods, including detectors, descriptors and matchers. The purpose of Section 3 is to present the method that we propose and how we evaluate it. The experiments done and their results are shown in Section 4. Section 5 summarizes the conclusions that can be drawn from our work.
2 Background

There are several methods for object recognition. In our case, we have focused on feature-based methods. These methods look for points of interest in the images (detectors), try to describe them (descriptors) and match them (matchers). The combination of different detectors, descriptors and matchers varies the performance of the whole system. This is a fast-growing area in the image processing field. The following short and chronologically ordered review presents the gradual improvements in feature detection (Subsection 2.1), description (Subsection 2.2) and matching (Subsection 2.3).
2.1 Detectors

One of the most used methods was proposed in 1999 by [Lowe(1999)]. This method is called SIFT, which stands for Scale Invariant Feature Transform. The main idea is to use the Difference-of-Gaussians function (a close approximation to the Laplacian-of-Gaussian, proposed by Lowe) to search for extrema in the scale space. Even if SIFT was relatively fast, a new method, SURF (Speeded Up Robust Features) [Bay et al.(2006)Bay, Tuytelaars, and Van Gool], outperforms it in terms of repeatability, distinctiveness and robustness, while it can be computed and compared much faster.

In addition, FAST (Features from Accelerated Segment Test), proposed by [Rosten and Drummond(2005)], introduced a fast detector. FAST outperforms previous algorithms (like SURF and SIFT) in both computational performance and repeatability. AGAST [Mair et al.(2010)Mair, Hager, Burschka, Suppa, and Hirzinger] is based on FAST, but it is more efficient as well as more generic. BRISK [Leutenegger et al.(2011)Leutenegger, Chli, and Siegwart] is a novel method for keypoint detection, description and matching which has a low computational cost (as stated in the corresponding article, an order of magnitude faster than SURF in some cases). Following the same line of FAST-based methods, we find ORB [Rublee et al.(2011)Rublee, Rabaud, Konolige, and Bradski], an efficient alternative to SIFT or SURF. This method's detector is based on FAST but adds orientation in order to obtain better results. In fact, this method performs two orders of magnitude faster than SIFT in many situations.

2.2 Descriptors

[Lowe(1999)] also proposed a descriptor called SIFT. As mentioned above, it is one of the most popular feature detectors and descriptors. The descriptor is a position-dependent histogram of local image gradient directions around the interest point and is also scale invariant.
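The scale-space extrema search behind SIFT's detector can be illustrated in a few lines of NumPy. The following single-scale sketch is our own illustration (not Lowe's full multi-octave implementation): it blurs the image at two nearby scales and flags pixels whose absolute Difference-of-Gaussians response is a strict maximum over the 8-neighbourhood.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a truncated kernel (pure NumPy)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k, mode="same")

def dog_keypoints(img, sigma=1.0, k=1.6, thresh=0.005):
    """Keypoints as strict 8-neighbour maxima of |DoG| at a single scale."""
    dog = np.abs(gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma))
    h, w = dog.shape
    centre = dog[1:-1, 1:-1]
    is_max = centre > thresh
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # compare against the neighbourhood shifted by (dy, dx)
            is_max &= centre > dog[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
    ys, xs = np.nonzero(is_max)
    return list(zip(ys + 1, xs + 1))
```

On a synthetic image containing a single bright blob, the detector fires at the blob centre; a real implementation repeats this search over a full pyramid of octaves and scales.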
The SIFT descriptor has numerous extensions, such as PCA-SIFT [Ke and Sukthankar(2004)], which mixes PCA with SIFT; CSIFT [Abdel-Hakim and Farag(2006)], a color-invariant SIFT; GLOH [Mikolajczyk and Schmid(2005)]; DAISY [Tola et al.(2010)Tola, Lepetit, and Fua], a dense descriptor inspired by SIFT and GLOH; and so on. The SURF descriptor [Bay et al.(2006)Bay, Tuytelaars, and Van Gool] relies on integral images for image convolutions in order to obtain its speed.

BRIEF [Calonder et al.(2010)Calonder, Lepetit, Strecha, and Fua] is a highly discriminative feature descriptor that is fast both to build and to match. The BRISK [Leutenegger et al.(2011)Leutenegger, Chli, and Siegwart] descriptor is composed as a binary string by concatenating the results of simple brightness comparison tests. The ORB descriptor is BRIEF-based and adds rotation invariance and resistance to noise.

LBP (Local Binary Patterns) [Ojala et al.(1996)Ojala, Pietikäinen, and Harwood] is a two-level version of the texture spectrum method [Wang and He(1990)]. This method has been really popular and many derivatives have been proposed. Based on it, CS-LBP (Center-Symmetric Local Binary Pattern) [Heikkilä et al.(2009)Heikkilä, Pietikäinen, and Schmid] combines the strengths of SIFT and LBP. Later, in 2010, LTP (Local Ternary Pattern) [Liao(2010)] appeared, a generalization of LBP that is more discriminant and less sensitive to noise in uniform regions. The same year, ELTP (Extended Local Ternary Pattern) [Nanni et al.(2010)Nanni, Brahnam, and Lumini] improved on this by attempting to strike a balance, using a clustering method to group the patterns in a meaningful way. In 2012, LTrP (Local Tetra Patterns) [Murala et al.(2012)Murala, Maheshwari, and Balasubramanian] encoded the relationship between the referenced pixel and its neighbors, based on directions calculated using the first-order derivatives in the vertical and horizontal directions.
[Pietikäinen et al.(2011)Pietikäinen, Hadid, Zhao, and Ahonen] gathers other methods that are based on LBP. Another descriptor, FREAK [Alahi et al.(2012)Alahi, Ortiz, and Vandergheynst], is a keypoint descriptor inspired by the human visual system, and more precisely by the retina. It is faster, uses less memory and is more robust than SIFT, SURF and BRISK, and is thus a competitive alternative to existing descriptors, in particular for embedded applications.
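To illustrate how simple the basic LBP code is to compute, the following NumPy sketch is our own minimal version of the 8-neighbour operator (without the rotation-invariant or uniform-pattern refinements of the later variants): each interior pixel's neighbours are thresholded against the centre pixel to form an 8-bit code.

```python
import numpy as np

def lbp_3x3(img):
    """8-neighbour LBP codes for the interior pixels of a grayscale image."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    # clockwise neighbour offsets, one bit per neighbour
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (n >= c).astype(int) << bit
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: a global texture descriptor."""
    hist = np.bincount(lbp_3x3(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

On a perfectly flat patch every neighbour ties with the centre, so all 8 bits are set and every code is 255; textured patches spread the histogram mass across other bins.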
2.3 Matchers

The most widely used method for matching is Nearest Neighbor (NN). Many algorithms follow this method. One of the most used is the kd-tree [Robinson(1981)], which works well with low dimensionality. For dealing with higher dimensionalities, many researchers have proposed diverse methods, such as the Approximate Nearest Neighbor (ANN) of [Indyk and Motwani(1998)] or the Fast Approximate Nearest Neighbors of [Muja and Lowe(2009)], which is implemented in the well-known open source library FLANN (Fast Library for Approximate Nearest Neighbors).
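For binary descriptors, brute-force nearest-neighbour matching reduces to a popcount of XORed bit strings. The sketch below is our own illustration (libraries such as FLANN or OpenCV's brute-force matcher provide optimised versions); it assumes descriptors packed as bytes, the layout BRIEF/ORB-style descriptors use.

```python
import numpy as np

def hamming_nn(train, test):
    """For each test descriptor, return the index of its nearest train
    descriptor and the Hamming distance to it.

    train, test: uint8 arrays of shape (n, n_bytes)."""
    # XOR exposes the differing bits; counting them gives the Hamming distance
    xor = np.bitwise_xor(test[:, None, :], train[None, :, :])
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    nearest = dist.argmin(axis=1)
    return nearest, dist[np.arange(len(test)), nearest]
```

Real-valued descriptors such as SIFT or SURF would use an L2 distance instead, typically through an approximate index rather than this exhaustive search.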
3 Proposed Method

As we have stated before, the issue we are dealing with is the recognition of industrial parts for pick-and-placing. The main problem is that the accurate recognition of some kinds of parts is highly dependent on the recognition pipeline used. This is because the parts' characteristics, like texture (presence or absence), forms, colors and brightness, make some detectors or descriptors work differently. We are thus proposing a systematic approach for selecting the best recognition pipeline for a given object (Subsection 3.2). We also propose in Subsection 3.3 an expert system that identifies groups of parts that are recognized similarly, to improve the overall accuracy. The recognition pipeline is explained in Subsection 3.1.

We start by defining some notation. An industrial part, or object, is called an instance. The images captured of each part are called views. Given the set of views X, the set of instance labels Y and the set of recognition pipelines Ψ, the function ω^Ψ_{X,Y}(y) returns for each y ∈ Y the best pipeline ψ* ∈ Ψ according to a metric F that is discussed later. We call ψ** the pipeline that on average performs best according to the evaluation metric, that is, the one that maximizes the average of the scores per instance (2).

    ω^Ψ_{X,Y}(y) = argmax_{ψ ∈ Ψ} F^ψ_y(X, Y) = ψ*        (1)

    ψ** = argmax_{ψ ∈ Ψ} (1/|Y|) Σ_{y ∈ Y} F^ψ_y(X, Y)        (2)

A recognition pipeline ψ is composed of 3 steps: detection, description and matching. Detectors, Γ, localize interesting keypoints in the view (gradient changes, changes in illumination, ...). Descriptors, Φ, are used to represent those keypoints in order to locate them in other views. Matchers, Ω, find the closest features between views. So, a pipeline ψ is composed of a keypoint detector γ, a feature descriptor φ and a matcher ω. Figure 1 shows the structure of the recognition pipeline.

Keypoint detection and description were described previously in the background section.
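With the scores F^ψ_y arranged as a matrix of instances by pipelines, equations (1) and (2) reduce to two argmax operations. A minimal sketch with made-up scores (illustrative values, not results from our experiments):

```python
import numpy as np

# F[y, p]: F score of pipeline p on instance y (illustrative values only)
F = np.array([[0.9, 0.4, 0.7],
              [0.3, 0.8, 0.6],
              [0.5, 0.6, 0.9]])

psi_star = F.argmax(axis=1)              # eq. (1): best pipeline per instance
psi_star_star = F.mean(axis=0).argmax()  # eq. (2): best pipeline on average
```

Note that no single pipeline needs to be best for every instance, which is precisely what motivates the hierarchical scheme below.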
In the matching, there are two groups of features: the ones that form the model (train) and the ones that need to be recognized (test). Different kinds of methods can be used to match features but, mainly, distance-based techniques are used. These techniques make use of different distances (L2, Hamming, ...) to find the closest feature to the one that needs to be labeled. Those two features (the test feature and the closest one to it) are considered a match. In order to discard ambiguous features, we use Lowe's ratio test [Lowe(2004)] to decide whether two features are a "good match". Assuming f_t is the feature to be recognized, and f_{l1} and f_{l2} are its two closest features from the model, then (f_t, f_{l1}) is a good match if:

    d(f_t, f_{l1}) / d(f_t, f_{l2}) < r        (3)

where d(f_A, f_B) is the distance (L2, Hamming, ...) between features A and B, and r is a threshold used to check whether the two model features are similarly close to the test feature, in which case the match is discarded as ambiguous. Now a simple voting system is used for labeling the view. For each view from the model (train), the number of good matches is counted. The good matches of each instance are summed, and the test view is labeled as the instance with the most good matches.
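The ratio test of equation (3) followed by the voting scheme can be sketched as follows. This is a toy version using L2 distances; r = 0.8 is the ratio suggested in Lowe's paper, used here only as an illustrative default.

```python
import numpy as np

def vote_label(test_feats, model_feats, model_labels, r=0.8):
    """Label a view: ratio-test each feature, then count votes per instance.

    r = 0.8 is Lowe's suggested ratio, an illustrative default here."""
    votes = {}
    for f in test_feats:
        d = np.linalg.norm(model_feats - f, axis=1)
        i1, i2 = np.argsort(d)[:2]
        if d[i1] < r * d[i2]:           # eq. (3): keep only unambiguous matches
            lab = model_labels[i1]
            votes[lab] = votes.get(lab, 0) + 1
    # the instance with the most good matches wins
    return max(votes, key=votes.get) if votes else None
```

Writing the test as d1 < r * d2 rather than d1 / d2 < r avoids a division by zero when a model feature coincides with the test feature.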
Figure 1: Recognition pipeline.

Table 1: Example of a confusion matrix for 3 instances.

                              Actual instance
                      object 1   object 2   object 3   Total
Predicted   object 1     40         10          0        50
instance    object 2      0         30         25        55
            object 3     10         10         25        45
Total                    50         50         50       150
As we have said, we have the input views X, the instance labels Y and the pipelines Ψ. To evaluate the pipelines we have to separate the views into train and test. The evaluation method used for this is Leave-One-Out Cross-Validation (LOOCV) [Kohavi(1995)]. It consists of |X| iterations, where for each iteration i the train dataset is (X − x_i) and the test sample is x_i. With this train-test separation we can generate the confusion matrix. Table 1 is an example of a confusion matrix for 3 instances.

As mentioned in the introduction of Section 3, we use the F value [Goutte and Gaussier(2005)] as the metric for scoring the performance of the system. The score is calculated over the test views from the LOOCV. The F score, or value, is calculated per instance (4). This metric is a harmonic mean of the precision and the recall. The mean of all the F's, F̄ (5), is used for calculating ψ**.

    F(y) = 2 · precision_y · recall_y / (precision_y + recall_y)        (4)

    F̄ = (1/|Y|) Σ_{y ∈ Y} F(y)        (5)

The precision (Equation 6) is the ratio between the correctly predicted views with label y (tp_y) and all views predicted with that label (|ψ(X) = y|). The recall (Equation 7), instead, is the ratio between the correctly predicted views with label y (tp_y) and all views that should have that label (|label(X) = y|).

    precision_y = tp_y / |ψ(X) = y|        (6)

    recall_y = tp_y / |label(X) = y|        (7)

The function ω gives a lot of information about the objects, but it needs the instance in order to return the best pipeline for it, and the instance is not available a priori; indeed, it is exactly what we want to identify. We use the information that ω would provide to build a hierarchical classification based on a clustering of similar objects. Since some parts work better with some particular pipelines because of their shape, color or texture, we try to take advantage of this and make clusters of objects that are classified similarly well by each pipeline.
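Applying equations (4)–(7) to the confusion matrix of Table 1 (rows as predicted instances, columns as actual ones) gives the per-instance scores directly:

```python
import numpy as np

# Confusion matrix from Table 1: C[i, j] = views of object j+1 predicted as i+1
C = np.array([[40, 10,  0],
              [ 0, 30, 25],
              [10, 10, 25]], dtype=float)

tp = np.diag(C)                 # correctly predicted views per instance
precision = tp / C.sum(axis=1)  # eq. (6): divide by views predicted as y
recall = tp / C.sum(axis=0)     # eq. (7): divide by views labelled y
F = 2 * precision * recall / (precision + recall)  # eq. (4)
F_bar = F.mean()                # eq. (5)
```

Object 1 obtains F = 0.8 (precision and recall are both 40/50), while the confusion between objects 2 and 3 drags their scores down to 4/7 and 10/19 respectively.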
For example, two parts that have textures may be better recognized by pipelines that use descriptors like SIFT or SURF than non-textured parts. We call these clusters typologies. This clustering is made using the K-means algorithm [MacQueen(1967)], which aims to partition the objects into K clusters (where K < |Y|) in which each object belongs to the cluster with the nearest centroid. The input for this algorithm is, for each instance, the array of F values obtained with every pipeline. The choice of a good K may change the result considerably, since if almost all the clusters are composed of a single instance the result would be close to just using ψ**. After obtaining the K typologies, the ψ*_T's (8) are calculated, i.e., the best pipeline for each typology.

    ψ*_T = argmax_{ψ ∈ Ψ} (1/|T|) Σ_{y ∈ T} F^ψ_y(X, Y)        (8)

The first step of the hierarchical recognition is to recognize the typology with ψ**. Given the predicted typology t, ψ*_t is then used to recognize the instance y of the object. We call the hierarchical recognition Υ. Figure 2 shows a scheme of the hierarchical recognition for clarification.

4 Experiments

Our initial hypothesis is that Υ has a better performance than ψ**. In order to test this hypothesis we conducted some experiments. Moreover, we want to know in which way the number of parts and the number of views per part affect the result.

The pipelines used (detector, descriptor and matcher) are defined in Subsection 4.1. In Subsection 4.2, we explain the dataset we have created to evaluate the proposed method under our use case, the industrial area, and the results obtained. In order to compare these results on a well-known dataset, in Subsection 4.3 we present the Caltech dataset [Fei-Fei et al.(2007)Fei-Fei, Fergus, and Perona] and the results obtained.

4.1 Pipelines

The pipelines we have selected are shown in Table 2.
Many combinations could be made, but it is not consistent to match binary descriptors with an L2 distance. The combinations chosen are compatible, though they may not be the best possible ones. LBP does not need a detector because it is a global descriptor.
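This compatibility constraint can be made explicit in code. The grouping below reflects the composition of Table 2: binary descriptors get a brute-force Hamming matcher, real-valued ones go through FLANN (the string names are our shorthand for this sketch, not a library API):

```python
# Binary descriptors are bit strings (Hamming distance); the others produce
# real-valued vectors (L2 distance, here served by FLANN), as in Table 2.
BINARY_DESCRIPTORS = {"ORB", "BRIEF", "BRISK", "FREAK"}
REAL_DESCRIPTORS = {"SIFT", "SURF", "DAISY", "LBP"}

def matcher_for(descriptor):
    """Pick a matcher consistent with the descriptor's representation."""
    if descriptor in BINARY_DESCRIPTORS:
        return "Brute force Hamming"
    if descriptor in REAL_DESCRIPTORS:
        return "FLANN"
    raise ValueError(f"unknown descriptor: {descriptor}")
```

Encoding the constraint this way guards against silently building a pipeline whose matcher misinterprets the descriptor's distance space.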
4.2 Our dataset

We selected 7 random industrial parts and took, on a white background, 50 pictures per part from random angles. That way, we have a dataset with 350 pictures. Figure 3 shows zoomed-in examples of the pictures taken of the parts.

We use subsets of the dataset to evaluate whether changing the number of views per instance and the number of instances varies the performance. These subsets have from 3 to 7 parts and from 10 to 50 views per part (in steps of 10 views).
Figure 2: Hierarchical classification.

Table 2: Pipelines composition.

Pipeline   Detector   Descriptor   Matcher
ψ1         SIFT       SIFT         FLANN
ψ2         SURF       SURF         FLANN
ψ3         ORB        ORB          Brute force Hamming
ψ4         —          LBP          FLANN
ψ5         SURF       BRIEF        Brute force Hamming
ψ6         BRISK      BRISK        Brute force Hamming
ψ7         AGAST      DAISY        FLANN
ψ8         AGAST      FREAK        Brute force Hamming
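The hierarchical scheme of Figure 2 (equation (8) inside each typology, then a two-stage prediction) also reduces to a couple of argmax operations once the clustering is available. A toy sketch with illustrative values, not results from our experiments:

```python
import numpy as np

# F[y, p]: per-instance F scores; typology[y]: cluster assigned by K-means
# (illustrative values: two "textured" and two "textureless" instances)
F = np.array([[0.9, 0.2],
              [0.8, 0.3],
              [0.1, 0.9],
              [0.2, 0.8]])
typology = np.array([0, 0, 1, 1])

# eq. (8): the best pipeline per typology maximises the mean F inside it
psi_star_T = {t: int(F[typology == t].mean(axis=0).argmax())
              for t in np.unique(typology)}
```

At recognition time, ψ** first predicts the typology t of the test view, and the pipeline psi_star_T[t] is then run to predict the instance, as in Figure 2.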
Figure 3: Parts used in our dataset.

Table 3: F̄'s of the ψ**'s and Υ for each subset of our dataset. p stands for the number of parts and t for the number of pictures per part.

Table 3 gathers the results for all the subsets using ψ** and Υ. The highest score for each subset is in bold. On average, the hierarchical recognition performs better. The more parts or views per part there are, the better the hierarchical recognition performs compared with the best single pipeline.

Now we focus on the whole dataset. Figure 4 shows the F's of each instance using each pipeline for this particular case. The horizontal lines mark the F̄ for each pipeline. The score we obtain with our method (last column) is higher (0.94) than that of the best pipeline, the one using ORB (0.845).

A truthful evaluation of the time performance of the hierarchical classifier is a bit cumbersome, since it directly depends on the clustering phase and on which are the best pipelines for each cluster. At the least, it needs more time than just using a single pipeline. Given t(ψ), the time needed by pipeline ψ, the time needed by Υ is approximately t(ψ**) + t(ψ*_{T'}), where T' is the typology predicted by ψ**. Table 4 shows the time in seconds that each pipeline and Υ need to recognize a view.

Figure 4: F score for each instance and algorithm.

Figure 5: 6 random examples of images from the Caltech-101 dataset. The classes are: Face, Leopard, Motorbike, Airplane, Accordion and Anchor.

4.3 Caltech-101

The Caltech-101 dataset [Fei-Fei et al.(2007)Fei-Fei, Fergus, and Perona] is a well-known object recognition dataset that can be compared with ours. It has been tested like our dataset, making subsets with the same characteristics. Some randomly picked images from the dataset are shown in Figure 5. The results obtained for the subsets of this dataset are shown in Table 5.
The same conclusions are obtained for this dataset.

Table 5: F̄'s of the ψ**'s and Υ for each test (Caltech-101). p stands for the number of parts and t for the number of pictures per part.

5 Conclusion
We proposed a hierarchical recognition method based on clustering instances by the similar behaviour of the recognition pipelines on them. It has been demonstrated that on average it works better than just recognizing with a single classical feature-based method, achieving high F scores (in the biggest case, 0.94 on our dataset and 0.843 on Caltech-101).

As we stated, once we recognize a part we need its pose to tell the robot where to pick it. This has been left for future work. The use of local features enables the possibility of estimating the object's pose using methods such as the Hough voting scheme, RANSAC or PnP. Additionally, including new feature-based methods may lead to better performance, or at least to more repeatability and scalability of the hierarchical recognition.
Acknowledgment
This paper has been supported by the project SHERLOCK under the European Union's Horizon 2020 Research & Innovation programme, grant agreement No. 820689.
References

[Abdel-Hakim and Farag(2006)] A. E. Abdel-Hakim and A. A. Farag. CSIFT: A SIFT descriptor with color invariant characteristics. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1978–1983. IEEE, 2006.

[Alahi et al.(2012)Alahi, Ortiz, and Vandergheynst] A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast Retina Keypoint. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 510–517. IEEE, June 2012.

[Bay et al.(2006)Bay, Tuytelaars, and Van Gool] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In Computer Vision – ECCV 2006, pages 404–417. 2006.

[Calonder et al.(2010)Calonder, Lepetit, Strecha, and Fua] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision, pages 778–792, 2010.

[Fei-Fei et al.(2007)Fei-Fei, Fergus, and Perona] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.

[Goutte and Gaussier(2005)] C. Goutte and E. Gaussier. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European Conference on Information Retrieval, pages 345–359, 2005.

[Heikkilä et al.(2009)Heikkilä, Pietikäinen, and Schmid] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, 42(3):425–436, March 2009.

[Indyk and Motwani(1998)] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613, 1998.

[Ke and Sukthankar(2004)] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 2, pages 506–513. IEEE, 2004.

[Kohavi(1995)] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145. Montreal, Canada, 1995.

[Leutenegger et al.(2011)Leutenegger, Chli, and Siegwart] S. Leutenegger, M. Chli, and R. Y. Siegwart. BRISK: Binary Robust Invariant Scalable Keypoints. In 2011 International Conference on Computer Vision, pages 2548–2555, November 2011.

[Liao(2010)] W. Liao. Region description using extended local ternary patterns. In 2010 20th International Conference on Pattern Recognition, pages 1003–1006, August 2010.

[Lowe(1999)] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, pages 1150–1157, September 1999.

[Lowe(2004)] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.

[MacQueen(1967)] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 281–297. University of California Press, 1967.

[Mair et al.(2010)Mair, Hager, Burschka, Suppa, and Hirzinger] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger. Adaptive and generic corner detection based on the accelerated segment test. In European Conference on Computer Vision, pages 183–196. Springer, 2010.

[Mikolajczyk and Schmid(2005)] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, October 2005.

[Muja and Lowe(2009)] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP, 2(331-340):2, 2009.

[Murala et al.(2012)Murala, Maheshwari, and Balasubramanian] S. Murala, R. P. Maheshwari, and R. Balasubramanian. Local tetra patterns: A new feature descriptor for content-based image retrieval. IEEE Transactions on Image Processing, 21(5):2874–2886, May 2012.

[Nanni et al.(2010)Nanni, Brahnam, and Lumini] L. Nanni, S. Brahnam, and A. Lumini. A local approach based on a local binary patterns variant texture descriptor for classifying pain states. Expert Systems with Applications, 37(12):7888–7894, 2010.

[Ojala et al.(1996)Ojala, Pietikäinen, and Harwood] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, January 1996.

[Pietikäinen et al.(2011)Pietikäinen, Hadid, Zhao, and Ahonen] M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen. Local binary patterns for still images. In Computer Vision Using Local Binary Patterns, Computational Imaging and Vision, pages 13–47. Springer London, 2011.

[Robinson(1981)] J. T. Robinson. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, SIGMOD '81, pages 10–18, 1981.

[Rosten and Drummond(2005)] E. Rosten and T. Drummond. Fusing points and lines for high performance tracking. In Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 2, pages 1508–1515. IEEE, 2005.

[Rublee et al.(2011)Rublee, Rabaud, Konolige, and Bradski] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571, November 2011.

[Tola et al.(2010)Tola, Lepetit, and Fua] E. Tola, V. Lepetit, and P. Fua. DAISY: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):815–830, May 2010.

[Wang and He(1990)] L. Wang and D. C. He. Texture classification using texture spectrum. Pattern Recognition, 23(8):905–910, 1990.