### electronic imaging | 2019

# Face Recognition by the Construction of Matching Cliques of Points

### Abstract

This paper addresses the problem of face recognition using a graphical representation to identify structure that is common to pairs of images. Matching graphs are constructed where nodes correspond to local brightness gradient directions and edges are dependent on the relative orientation of the nodes. Similarity is determined from the size of maximal matching cliques in pattern pairs. The method uses a single reference face image to obtain recognition without a training stage. Experiments on samples from MegaFace achieve 100% correct recognition.

### Introduction

The use of intuitively plausible features to recognise faces is a powerful approach that yields good results on certain datasets. Where it is possible to obtain a truly representative set of data for training and adjusting recognition parameters, optimal performance can be attained. However, when facial images are distorted by illumination, pose, occlusion, expression and other factors, some features become inappropriate and contribute noise to the discrimination on unseen data. Indeed, it can never be known in advance what distortions will be present in unseen and unrestricted data, and so features that are applied universally are likely to reduce performance at some point.

Many approaches to face recognition are reported in the literature [1,2]. Graph matching approaches provide attractive alternatives to the feature space solutions in computer vision. Identifying correspondences between patterns can potentially cope with non-rigid distortions such as expression changes, pose angle and occlusions. However, graph matching is an NP-complete problem and much current research is aimed at solving the associated computational difficulties. SIFT feature descriptors are used by Leordeanu et al [3] to construct spectral representations of adjacency matrices whose nodes are feature pair correspondences and entries are dependent on feature separations.
Objects in low resolution images are recognised by matching correspondences against a set of pretrained models. Felzenszwalb et al [4] also match a graphical model of specific objects to images in which parts are matched according to an energy function dependent on colour difference and relative orientation, size and separation. Fergus et al [5] avoid the computational complexity of a fully connected shape model by adopting a “star” model that uses “landmark” parts. The model is trained using specific feature types and recognition is obtained by matching appearance densities of model parts. Kim et al [6] reduce the computational demands by first segmenting one of the images. Each region is mapped using SIFT descriptors and a function dependent on distortion, ordering, appearance and displacement is minimised to obtain appropriate candidate points and region correspondence. A more general approach by Duchenne et al [7] uses graph matching to encode the spatial information of sparse codes for pairs of images. An energy function is maximised using a graph cuts strategy that is dependent on node feature correlation, reduced node displacement and discouraging node crossing. Duchenne et al [8] also use a tensor-based algorithm to match hypergraphs in which correspondences are identified between groups of nodes and hyperedges linking them. The method is illustrated by matching two similar faces using triples of SIFT descriptors. Celiktutan et al [9] also match hypergraphs connecting node triples in the spatio-temporal domain by minimising an energy function. Computation is reduced by considering a single salient point in each video frame and limiting connections along the time dimension. Kolmogorov et al [10] present a graph-cut algorithm for determining disparities that ensures that single pixels in one image are assigned single pixels in the second image and occlusions are handled correctly.
An energy function is employed that is minimised by reducing the intensity difference between pixels, by penalising pixel occlusions, and by requiring neighbouring pixels to have similar disparities. Berg et al [11] set up correspondences by identifying edge feature locations and measuring their similarity by using the correlation between feature descriptions and the distortion arising from local changes in length and relative orientation. An approximate version of Integer Quadratic Programming is used to detect faces. Cho et al [12] propose a method for growing matching graphs where nodes represent features and edges the geometric relationships. The Symmetric Transfer Error is used to measure the similarity of node pairs and the reweighted random walk algorithm to match nodes. Shape-driven graphical approaches [13-15], including active appearance models, assign fiducial points to nodes and maximise a similarity function to obtain recognition of candidate images. Deep neural networks [16] obtained a 97% result on 17 facial images from the Yale Database A, using facial features and the remaining 148 images for training the network.

This paper makes use of a fully connected graph matching representation in order to measure the similarity of pairs of patterns. Nodes are pixels and edges take the value of the relative orientation of the two pixels. Each node is assigned the brightness gradient direction at that locality. Two patterns represented by graphs are taken to match if the pair obtains the highest number of matching node values and edge values. This simple framework has the advantage that the graph matching process is much faster because it can be reasonably assumed that if local node edges match, the relative orientation of many distant nodes will not vary significantly within that locality and will therefore also match. No training data is required and no other features are employed.
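
The two quantities used by this representation, the brightness gradient direction at a pixel and the orientation of a pixel pair, can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy's finite-difference `np.gradient`, measures angles in degrees in image coordinates (rows increasing downwards), and the function names `gradient_direction` and `pair_angle` are hypothetical rather than taken from the paper.

```python
import numpy as np

def gradient_direction(image):
    """Per-pixel brightness gradient direction in degrees (image coordinates)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(image.astype(float))
    return np.degrees(np.arctan2(gy, gx))

def pair_angle(p, q):
    """Orientation in degrees of the line from pixel p to pixel q, each (row, col)."""
    dy, dx = q[0] - p[0], q[1] - p[1]
    return float(np.degrees(np.arctan2(dy, dx)))

# Toy input: a horizontal ramp whose brightness increases from left to right,
# so the gradient points along the +x axis at every pixel.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
directions = gradient_direction(ramp)
diagonal = pair_angle((0, 0), (3, 3))
```

Neither quantity depends on the distance between the two pixels or on their absolute position, only on direction.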
The method requires a single reference face image from each of the individuals to be recognised. Ideally the reference face images should be of high quality with the background removed to improve performance. There are no such constraints on the candidate face images. Results using an earlier related approach [17] obtained a 100% correct result on the Yale Face Database A [18].

Figure 1. Ten of the 100 MegaFace candidates

Figure 2. Corresponding reference faces

Figure 3. Reference face matched against candidate

Figure 4. Reference and candidate clique magnified

### Proposed Approach

The approach taken in this paper detects structure that is common between pairs of images and uses the extent of such structure to measure similarity. In this case the size of the largest structure found to match both patterns is the number of nodes in the corresponding fully connected maximal graph or clique.

A pictorial structure is represented as a collection of parts and by a graph $G = (V, E)$ where the vertices $V = \{v_1, \ldots, v_n\}$ correspond to the parts and there is an edge $(v_i, v_j) \in E$ for each pair of connected parts $v_i$ and $v_j$. An image part $v_i$ is specified by a location $\mathbf{x}_i$. In this paper parts $v_i$ correspond to individual pixels. Given a set of vertices $V^1 = \{v_1^1, \ldots, v_n^1\}$ in image 1 that correspond to a set of vertices $V^2 = \{v_1^2, \ldots, v_n^2\}$ in image 2, the following conditions are met by all parts to form a clique:

$$|d_g(\mathbf{x}_i^1) - d_g(\mathbf{x}_i^2)| \le \varepsilon_1 \tag{1}$$

$$|d_a(\mathbf{x}_i^1, \mathbf{x}_j^1) - d_a(\mathbf{x}_i^2, \mathbf{x}_j^2)| \le \varepsilon_2, \quad \forall\, i, j,\; i \ne j \tag{2}$$

where $d_g(\mathbf{x}_i)$ is the grey level gradient direction at $\mathbf{x}_i$ and $d_a(\mathbf{x}_i, \mathbf{x}_j)$ is the angle subtended by the point pair $(\mathbf{x}_i, \mathbf{x}_j)$.

Clique generation begins with the selection of a random pair of pixels $(\mathbf{x}_i^1, \mathbf{x}_j^1)$ from reference image 1 and a pair $(\mathbf{x}_i^2, \mathbf{x}_j^2)$ from candidate image 2 that satisfy (1) and (2).
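
Conditions (1) and (2) can be read as a predicate over a set of point correspondences. The sketch below is one possible reading, not the author's implementation: it assumes points are given as `(x, y)` tuples with precomputed gradient directions in degrees, that angular differences wrap around at 360 degrees (the text does not say how wrap-around is handled), and the names `angle_diff`, `pair_angle` and `is_matching_clique` are hypothetical.

```python
import math
from itertools import combinations

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (wrap-around assumed)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pair_angle(p, q):
    """Orientation in degrees of the line from point p to point q, each (x, y)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def is_matching_clique(pts1, grad1, pts2, grad2, eps1, eps2):
    """pts1[i] in image 1 corresponds to pts2[i] in image 2."""
    # Condition (1): gradient directions of corresponding points agree within eps1.
    for g1, g2 in zip(grad1, grad2):
        if angle_diff(g1, g2) > eps1:
            return False
    # Condition (2): every pair of points has a similar orientation in both images.
    for i, j in combinations(range(len(pts1)), 2):
        a1 = pair_angle(pts1[i], pts1[j])
        a2 = pair_angle(pts2[i], pts2[j])
        if angle_diff(a1, a2) > eps2:
            return False
    return True

# Toy data: a triangle, the same triangle scaled and shifted, and a rotated copy.
tri = [(0, 0), (10, 0), (0, 10)]
scaled = [(5, 5), (25, 5), (5, 25)]
rotated = [(0, 0), (0, 10), (-10, 0)]
grads = [0.0, 30.0, 60.0]

same_shape = is_matching_clique(tri, grads, scaled, grads, eps1=55.0, eps2=20.0)
turned = is_matching_clique(tri, grads, rotated, grads, eps1=55.0, eps2=20.0)
```

In this toy case the scaled and shifted triangle passes both conditions, while the copy rotated by 90 degrees fails condition (2) because the pair orientations differ by more than the 20 degree tolerance.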
A new pair of points $(\mathbf{x}_k^1, \mathbf{x}_k^2)$ is added where

$$|d_g(\mathbf{x}_k^1) - d_g(\mathbf{x}_k^2)| \le \varepsilon_1 \tag{3}$$

$$|d_a(\mathbf{x}_k^1, \mathbf{x}_m^1) - d_a(\mathbf{x}_k^2, \mathbf{x}_m^2)| \le \varepsilon_2 \tag{4}$$

where $\mathbf{x}_k^1$ has not already been selected and $\mathbf{x}_m^1$ is the closest point to $\mathbf{x}_k^1$ from those already selected from reference image 1:

$$m = \arg\min_p \lVert \mathbf{x}_p^1 - \mathbf{x}_k^1 \rVert$$

New candidate points $(\mathbf{x}_k^1, \mathbf{x}_k^2)$ are selected randomly and added to the clique if conditions (3) and (4) are satisfied. Up to $N$ attempts are made to find a new point, after which the current clique is completed and the construction of a new clique is started. The search proceeds on a trial and error basis and the selection is not guided by additional heuristics, as these have always been found to damage performance. After the generation of $P$ cliques the largest is retained.

Let the number of nodes in the maximal clique extracted between the reference image for class $c$ and candidate image $i$ be $n_c^i$. The classification of image $i$ is given by $C_i$ where

$$C_i = \arg\max_c n_c^i$$

The relationship between points is not dependent upon their separation or absolute position, and therefore the similarity measure is translation and scale invariant. It also means that there is no special constraint placed on the disparity of points that is dependent on their separation. The measure is partially invariant to the rotation of the images to within the angle $\varepsilon_2$. It should also be noted that although the cliques are maximal in terms of the algorithm, there is no guarantee that the cliques extracted are the largest theoretically possible; the solution of an NP-complete problem would be necessary to confirm this.

### MegaFace Database

In order to make a further assessment of the performance of the latest algorithm, 100 reference faces and 100 candidate faces were taken from the MegaFace MF2 Training Dataset [19]. Ten of the 100 candidate faces are shown in Fig. 1 and the ten corresponding reference faces are shown in Fig. 2.
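
Before turning to the parameter settings, the random clique search and the arg-max classification from the Proposed Approach section can be sketched as below. This is a simplified illustration rather than the author's code: points are `(x, y)` tuples with precomputed gradient directions, a clique is seeded by a single accepted correspondence instead of a pair, candidate points are not reused, angular differences wrap at 360 degrees, and all function names and the toy values of `max_attempts` and `n_cliques` are assumptions.

```python
import math
import random

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pair_angle(p, q):
    """Orientation in degrees of the line from point p to point q, each (x, y)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def grow_clique(pts1, grad1, pts2, grad2, eps1, eps2, max_attempts, rng):
    """Grow one clique by trial and error; returns (reference, candidate) index pairs."""
    clique, failures = [], 0
    while failures < max_attempts:
        k1, k2 = rng.randrange(len(pts1)), rng.randrange(len(pts2))
        ok = all(k1 != a and k2 != b for a, b in clique)       # points not yet selected
        ok = ok and angle_diff(grad1[k1], grad2[k2]) <= eps1   # condition (3)
        if ok and clique:
            # m: already selected reference point closest to the new reference point
            m1, m2 = min(clique, key=lambda c: math.dist(pts1[c[0]], pts1[k1]))
            a1 = pair_angle(pts1[k1], pts1[m1])
            a2 = pair_angle(pts2[k2], pts2[m2])
            ok = angle_diff(a1, a2) <= eps2                    # condition (4)
        if ok:
            clique.append((k1, k2))
            failures = 0          # a fresh budget of attempts for the next point
        else:
            failures += 1
    return clique

def classify(references, cand, eps1, eps2, max_attempts, n_cliques, rng):
    """Return the class whose reference yields the largest clique, plus all sizes."""
    sizes = {}
    for label, (pts, grads) in references.items():
        sizes[label] = max(
            len(grow_clique(pts, grads, cand[0], cand[1], eps1, eps2, max_attempts, rng))
            for _ in range(n_cliques))
    return max(sizes, key=sizes.get), sizes

# Toy data: the candidate is reference "A" scaled and shifted; reference "B" has
# the same layout but gradient directions turned by 180 degrees.
pts = [(3 * i % 11, 7 * i % 13) for i in range(12)]
grads = [5.0 * (i % 9) for i in range(12)]
cand = ([(2 * x + 4, 2 * y + 1) for x, y in pts], grads)
references = {"A": (pts, grads), "B": (pts, [g + 180.0 for g in grads])}
label, sizes = classify(references, cand, eps1=55.0, eps2=20.0,
                        max_attempts=50, n_cliques=20, rng=random.Random(0))
```

Because reference "B" never satisfies condition (3) against this candidate, its clique size is zero and the candidate is assigned to "A". The exact clique size found for "A" depends on the random search.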
The reference faces have had the background set to white because commonality with the background adds noise to the result. In addition, all the images have been linearly scaled down to be 100 pixels wide before analysis. The threshold on the brightness gradient direction is $\varepsilon_1 = 55^\circ$. The threshold on the angular difference between matching pairs of points in each image is $\varepsilon_2 = 20^\circ$. Up to N