Re-weighting and 1-Point RANSAC-Based PnP Solution to Handle Outliers
Haoyin Zhou, Tao Zhang, Senior Member, IEEE, and Jayender Jagadeesan, Member, IEEE
Abstract—The ability to handle outliers is essential for performing the perspective-n-point (PnP) approach in practical applications, but conventional RANSAC+P3P or P4P methods have high time complexities. We propose a fast PnP solution named R1PPnP to handle outliers by utilizing a soft re-weighting mechanism and the 1-point RANSAC scheme. We first present a PnP algorithm, which serves as the core of R1PPnP, for solving the PnP problem in outlier-free situations. The core algorithm is an optimal process minimizing an objective function conducted with a random control point. Then, to reduce the impact of outliers, we propose a reprojection error-based re-weighting method and integrate it into the core algorithm. Finally, we employ the 1-point RANSAC scheme to try different control points. Experiments with synthetic and real-world data demonstrate that R1PPnP is faster than RANSAC+P3P or P4P methods, especially when the percentage of outliers is large, and is accurate. Besides, comparisons with outlier-free synthetic data show that R1PPnP is among the most accurate and fastest PnP solutions, which usually serve as the final refinement step of RANSAC+P3P or P4P. Compared with REPPnP, which is the state-of-the-art PnP algorithm with an explicit outlier-handling mechanism, R1PPnP is slower but does not suffer from the percentage-of-outliers limitation of REPPnP.

Index Terms—Perspective-n-Point; 1-Point RANSAC; soft re-weighting; robustness to outliers.

1 INTRODUCTION

The perspective-n-point (PnP) problem aims to determine the position and orientation of a calibrated camera from n known correspondences between three-dimensional (3D) object points and their two-dimensional (2D) image projections. PnP is a core problem in the computer vision field and has found many applications, such as robot vision navigation [1], augmented reality [2], and computer animation.
In the past decades, many effective PnP approaches have been proposed with very fast computational speed [3] [4] and high accuracy [5] [6]. To date, most PnP algorithms are designed under the assumption that no outlier exists among the given 3D-2D correspondences. However, in practical applications, this outlier-free assumption is often difficult to satisfy. This is because image feature detection and matching approaches, such as SURF [7], BRISK [8] and ORB [9], do not always give perfect results due to scaling, illumination, shadow and occlusion. Outliers are often unavoidable, and they have a significant impact on PnP methods: even a small percentage of outliers will lead to a significant decrease in accuracy. Hence, the ability to handle outliers is essential for performing PnP algorithms in practical applications. The most common outlier-handling mechanism is to combine a PnP (n = 3 or 4) algorithm with the RANSAC scheme, and then use a PnP algorithm with the remaining inliers to refine the result. A number of very fast closed-form P3P [14] or P4P [15] algorithms have been proposed. However, their RANSAC combination scheme still needs many trials until the selected three or four 3D-2D correspondences are all inliers, which results in a high time complexity. Hence, the computational speed decreases significantly as the percentage of outliers increases.

• Haoyin Zhou and Jayender Jagadeesan are with the Surgical Planning Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA. E-mail: [email protected]; [email protected].
• Tao Zhang is with the Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing, China, 100086. E-mail: [email protected].
To reduce the time complexity, one natural idea is to utilize the PnP algorithm with a smaller n. However, when n = 1 or 2, the PnP problem has infinitely many solutions, which makes the conventional RANSAC-based scheme infeasible. To the best of our knowledge, except for RANSAC-based methods, the only PnP method that addresses outliers is REPPnP [3] proposed by Ferraz et al., which is the state-of-the-art PnP method that is robust to outliers. REPPnP integrates an outlier rejection mechanism with camera pose estimation. It formulates the pose estimation problem as a low-rank homogeneous system in which the solution lies on its one-dimensional (1D) null space. Outlier correspondences are those rows of the linear system that perturb the null space, and they are progressively detected by projecting them on an iteratively estimated solution of the null space. Although REPPnP is very fast and accurate, it suffers from a severe limitation: it cannot handle more than approximately 50% of outliers.

In this paper, we propose a robust 1-point RANSAC-based PnP method named R1PPnP. We first present an optimal iterative process as the core PnP algorithm of R1PPnP. The core algorithm takes a random 3D-2D correspondence as the control point. To address outliers, we propose a soft weight assignment method according to reprojection errors to distinguish inliers and outliers, and integrate it into the core algorithm. The weight factors associated with outliers decrease significantly during the iteration to reduce the impact of outliers. Finally, we employ the 1-point RANSAC scheme to try different control points for the core PnP algorithm.
By using this combination of the RANSAC scheme and the soft weight assignment, the algorithm is capable of eliminating outliers when the selected control point is an inlier. The main advantage of R1PPnP is that it has much lower time complexity and is much faster than conventional RANSAC+P3P or P4P methods, especially when the percentage of outliers is large. Compared with REPPnP, the proposed R1PPnP does not suffer from the percentage-of-outliers limitation.

This paper is organized as follows. In Section 2, we describe the fundamental model used in R1PPnP. The details of the core algorithm are given in Section 3, in which we also provide its proof of convergence, local minima analysis and the strategy to select control points. The outlier-handling mechanism, including soft weight assignment and the 1-point RANSAC scheme, is introduced in Section 4, where we also provide details of the termination conditions. Evaluation results are presented in Section 5. A discussion and description of planned future work is given in Section 6.
1.1 Related Works

The PnP problem, coined by Fischler and Bolles [13], is articulated as follows:
Given the relative spatial locations of n control points P_i, and given the angle to every pair of control points from an additional point called the center of perspective C, find the lengths of the line segments joining C to each of the control points.

The PnP problem has been studied for many years. In early studies, direct linear transformation (DLT) [16] was used as a solution in a straightforward way by solving a linear system. However, DLT ignores the intrinsic camera parameters, which are assumed to be known, and therefore generally leads to a less stable pose estimate.

In the past decade, researchers have proposed many PnP methods to improve speed, accuracy, and robustness to outliers. PnP methods can be roughly classified into non-iterative and iterative methods. Generally speaking, non-iterative methods are more efficient but are unstable under image noise and outliers. Many non-iterative PnP methods are based on a small number of points (n = 3, 4). They are referred to as P3P [17] [11] [14] or P4P [18] [15] [19] [20] methods. P3P is the smallest subset of control points that yields a finite number of solutions [14] [21]. When the intrinsic camera parameters are known and we have n >= 4 points, the solution is generally unique. Triggs proposed a PnP method with four or five correspondences [22]. These PnP methods based on only a few correspondences do not make use of redundant points and are very sensitive to noise and outliers. However, due to their efficiency and capability to calculate from a small point set, P3P or P4P methods are very useful in combination with a RANSAC-like scheme to reject outliers. There are also many non-iterative PnP methods that are able to make use of redundant points but are quite time consuming; the methods of Ansar [23] and Fiore [24], for example, have complexities well above linear in n. Schweighofer proposed an O(n) PnP method named SDP, but it is slow [25].
In recent years, three excellent O(n) non-iterative PnP methods, EPnP [4], RPnP [26] and UPnP [27], have been proposed, and these methods are very efficient and accurate even compared to iterative methods.

Iterative PnP methods [6] [28] [29] [30] are mostly optimization methods that decrease their energy function in the iterative process. They are generally more accurate and robust, but slower. For example, Dementhon proposed POSIT, which is easy to implement [29], and further proposed SoftPOSIT to handle situations when the correspondence relationships are unknown [31]. Although SoftPOSIT has a certain ability to handle outliers, the strong assumption that all correspondences are unknown makes it slow. Lu's method [6] is the most accurate iterative PnP method but may get stuck in local minima. Schweighofer discussed the local minima situation of Lu's method and proposed a method to avoid this limitation [32].

PnP algorithms are widely used in applications such as structure from motion [33] and monocular SLAM [34], which require dealing with hundreds or even thousands of noisy feature points and outliers in real time. The fact that outliers have a much greater impact on PnP accuracy than image Gaussian white noise makes it necessary for the PnP algorithm to handle outliers efficiently. The conventional method to handle outliers is to combine a RANSAC-like scheme with the P3P or P4P algorithms. Besides, the L1-norm is also widely used to handle a certain amount of outliers [35] [36] because the L1-norm penalty is less sensitive to outliers than the L2-norm penalty. Although an L1-norm-based energy function is more robust to outliers, it cannot completely get rid of outliers and its computation is more complex.

Ferraz et al. proposed a very fast PnP method that can handle up to 50% of outliers [3]. The outlier rejection mechanism is integrated within the pose estimation pipeline with negligible computational overhead.
Compared to Ferraz's method, the R1PPnP algorithm proposed in this paper demonstrates much stronger robustness, but is slower.
2 FUNDAMENTAL MODEL
In this paper we denote the camera frame as c and the world frame as w. For point i, without taking distortion into account, the perspective projection equations are employed to describe the pinhole camera model,

u_i = f x_i^c / z_i^c,  v_i = f y_i^c / z_i^c,   (1)

where f is the camera focal length, x_i = [u_i, v_i, f]^T is the image homogeneous coordinate in pixels, and X_i^c = [x_i^c, y_i^c, z_i^c]^T is the real-world coordinate with respect to the camera frame. According to (1),

X_i^c = lambda*_i x_i,   (2)

where lambda*_i = z_i^c / f is the normalized depth of point i. Eq. (2) indicates that an object point lies on the straight line of sight of the related image point.

The relationship between the camera and world frame coordinates of point i is

X_i^c = R X_i^w + t,   (3)

where R in SO(3) is the rotation matrix and t in R^3 is the translation vector. R and t are the variables that need to be estimated in the PnP problem.

Similarly to the translation elimination method used in works [37] [38], with two points i and o,

X_i^c - X_o^c = R (X_i^w - X_o^w),  i != o.   (4)

In the proposed R1PPnP algorithm, o in [1, N] is the index of the control point, and N is the number of 3D-2D correspondences. R1PPnP represents the shape of the point cloud by the relative positions between the control point o and the other N - 1 points. Denoting S_i = X_i^w - X_o^w, where S means "shape", then, according to (2) and (4),

lambda*_i x_i - lambda*_o x_o = R S_i.   (5)

We divide both sides of (5) by the depth of the control point lambda*_o, and rewrite (5) as

lambda_i x_i - x_o = mu R S_i,   (6)

where lambda_i = mu lambda*_i and mu = 1 / lambda*_o is the scale factor. We have

t = (1 / mu) x_o - R X_o^w.   (7)

According to (6) and (7), the PnP problem can be solved by minimizing the objective function

f(R, mu, lambda_i) = sum_{i=1, i != o}^{N} || lambda_i x_i - x_o - mu R S_i ||^2,   (8)

where ||.|| is the L2-norm. The objective function (8) is based on Euclidean distances in the 3D space.
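As a concrete illustration, the projection model (1)-(2) and the objective (8) can be written in a few lines of NumPy (a sketch; the function names are ours, not from the paper). At the ground-truth pose, every residual in (8) vanishes because lambda_i x_i - x_o = mu (X_i^c - X_o^c) = mu R S_i:

```python
import numpy as np

def project(X_c, f):
    """Pinhole projection (Eq. 1): homogeneous image points
    x_i = [u_i, v_i, f]^T for camera-frame points X_c (3 x N)."""
    return f * X_c / X_c[2]

def objective_eq8(R, mu, lam, x, S, o):
    """Objective (8): sum over i != o of ||lam_i x_i - x_o - mu R S_i||^2."""
    x_o = x[:, o]
    total = 0.0
    for i in range(x.shape[1]):
        if i == o:
            continue
        r = lam[i] * x[:, i] - x_o - mu * (R @ S[:, i])
        total += r @ r
    return total
```

Given a synthetic scene with known R and t, setting mu = 1/lambda*_o and lambda_i = mu lambda*_i makes `objective_eq8` evaluate to zero up to floating-point error.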
Compared with the reprojection error cost, Eq. (8) gives more weight to points with larger depths. For example, the same level of reprojection error has a relatively larger effect when related to an object point with greater depth. To solve this problem, we normalize the cost function (8) with the depths of the points and propose the objective function of our R1PPnP algorithm, that is,

f(R, mu, lambda_i) = sum_{i=1, i != o}^{N} ( (1/lambda_i) || lambda_i x_i - x_o - mu R S_i || )^2,   (9)

where 1/lambda_i is introduced to adjust the weight of point i to eliminate the inequity among points in Eq. (8).

We estimate R, mu and lambda_i (i = 1, ..., N, i != o) by minimizing the objective function (9), whose variables consist of two parts: the camera pose {R, mu} and the relative depths with respect to the control point {lambda_i}. To describe the following algorithm intuitively, we introduce two sets of points, p_i and q_i. With a randomly selected control point o, points p_i are determined by the camera pose {R, mu}, and points q_i are determined by the depths lambda_i. We have

p_i = x_o + mu R S_i,   (10)
q_i = lambda_i x_i.   (11)

As shown in Fig. 1, points p_i are attached to the virtual object obtained by rotating and scaling the real object around the control point p_o = q_o = x_o, and q_i is the projection of p_i on the corresponding line of sight.

Fig. 1. Demonstration of geometrical relationships with a bunny model. The mouth point is the control point o. In the algorithm, all virtual points rotate and scale around the control point p_o = q_o = x_o. We use the tail point to exemplify p_i and its projection q_i. Plane A is parallel to the imaging plane and passes through the camera optical center. Without loss of generality and for clearer demonstration, in this figure we use focal length f = 1, and all depths are distances between points and plane A.

The objective function (9) is equivalent to

f(p_i, q_i) = sum_{i=1, i != o}^{N} ( (1/lambda_i) || p_i - q_i || )^2.   (12)

As this objective function approaches the globally optimal solution, as shown in Fig. 1, point p_i gets close to q_i and the z-component of p_i gets close to f lambda_i. Hence, it is expected that the objective function (12) has similar optimal solutions as the conventional reprojection error cost, because

f(p_i, q_i) = sum_{i=1, i != o}^{N} || [p_{i,x}/lambda_i, p_{i,y}/lambda_i, p_{i,z}/lambda_i]^T - x_i ||^2
           approx sum_{i=1, i != o}^{N} || [f p_{i,x}/p_{i,z}, f p_{i,y}/p_{i,z}, f]^T - [u_i, v_i, f]^T ||^2.   (13)

3 CORE ALGORITHM DESIGN
We first introduce the core algorithm of R1PPnP, which solves the PnP problem in outlier-free situations. This section introduces the core algorithm process, the proof of convergence, the local minima avoidance mechanism and the strategy to select the control point.

The core algorithm of R1PPnP is an optimal iterative process with the objective function (9) or (12). In each iteration, it estimates the point sets q_i and p_i (i = 1, ..., N, i != o) alternately, by fixing one point set and updating the other according to the objective function minimization.

(1) q_i estimation stage.

Because the q_i are independent of each other, our algorithm seeks the closest q_i for each p_i. According to (11), points q_i are constrained to the related lines of sight. Hence, we vertically project p_i onto the related lines of sight to obtain the points' relative depths with respect to the control point o by

lambda_i = x_i^T p_i / (x_i^T x_i),  i = 1, ..., N and i != o.   (14)

Then, points q_i are updated according to Eq. (11).

(2) p_i estimation stage.

Points p_i are determined by R and mu. According to (10), the updated R and mu should make the points {mu R S_i} have the smallest weighted sum of squared distances to the points {q_i - x_o}, subject to R^T R = I_{3x3}. According to the objective function (12), the weights used in this stage are the 1/lambda_i from the previous iteration. Denoting the matrices A = [(q_1 - x_o)/lambda_1, ..., (q_N - x_o)/lambda_N]_{3xN} and S = [S_1/lambda_1, ..., S_N/lambda_N]_{3xN}, then according to Ref. [39],

[U, Sigma, V^T] = svd(A S^T),  R = U V^T.   (15)

Fig. 2. Demonstration of the updating method of the scale factor mu. One possible method is to update mu according to the Euclidean distances between p_i, q_i and o, which works for points whose depths are close to that of o.
However, this method may result in a slow mu updating rate for points whose depths differ greatly from that of o, because ||q_i - o|| approx ||p_i - o|| for such points. Hence, it is more efficient to compare v_i and x_i to move points p_i to the related lines of sight.

Because points p_i are directly generated from S_i according to Eq. (10), Eq. (15) suggests that R is updated according to the differences between points p_i and q_i in the 3D space. However, Fig. 2 demonstrates that with this method, the updating rate of mu may be slow in situations when the range of depths is large. To achieve a faster convergence rate, we update the scale factor mu by comparing the projected image coordinates of p_i, denoted v_i, with the real image points x_i. Denoting the matrices B = [v_1 - x_o, ..., v_N - x_o]_{3xN} and C = [x_1 - x_o, ..., x_N - x_o]_{3xN}, mu is updated by

Delta mu = ||vector(C)|| / ||vector(B)||,   (16)
mu_new = mu_old Delta mu.   (17)

Finally, points p_i are updated according to Eq. (10).

We first provide the mathematical proof of the convergence of R1PPnP when not using 1/lambda_i as weights in the objective function (12). Let k denote the number of iterations; q_i^(k+1) is obtained by vertically projecting p_i^(k) onto the line of sight i, and q_i^(k+1) and q_i^(k) are both on the line of sight i. Hence, the three points p_i^(k), q_i^(k) and q_i^(k+1) form a right-angled triangle. Therefore, for each index i, i != o,

|| p_i^(k) - q_i^(k+1) ||^2 = || p_i^(k) - q_i^(k) ||^2 - || q_i^(k+1) - q_i^(k) ||^2.   (18)

In the p_i^(k+1) updating stage, the updated R and mu make the objective function (12) smaller. Hence,

sum_{i=1}^{N} || p_i^(k+1) - q_i^(k+1) ||^2 <= sum_{i=1}^{N} || p_i^(k) - q_i^(k+1) ||^2.   (19)

According to (18), (19), and the objective function (12),

f(p_i^(k+1), q_i^(k+1)) <= f(p_i^(k), q_i^(k)) - sum_{i=1}^{N} || q_i^(k+1) - q_i^(k) ||^2.   (20)

Hence, the objective function strictly decreases until q_i^(k+1) = q_i^(k) when not using 1/lambda_i as weights. However, when 1/lambda_i is applied in the objective function, the above convergence proof is not mathematically rigorous because lambda_i^(k+1) != lambda_i^(k). As the iteration proceeds, the changes of lambda_i become small, which makes formula (20) hold. In addition, our experimental results in this paper also support the assumption that our algorithm is convergent.

We have concluded that the iterative process of R1PPnP is convergent. However, we still need to address situations in which R1PPnP may get stuck in local minima. To demonstrate the iterative process more intuitively, we introduce a 1D camera working in 2D space, as shown in Fig. 3. In this demonstration, an object with four points P_i, i = 1, ..., 4, is projected onto the camera image plane, and the image points x_i are obtained. P_1 is selected as the control point, which means o = 1. Different initial values may result in different convergence results.

Fig. 3(a) demonstrates a process that approaches the correct globally optimal result. Beginning with points p^(k), the algorithm projects p^(k) onto their related lines of sight and obtains points q^(k+1). Then, according to q^(k+1), the algorithm updates the rotation R and scale factor mu to generate points p^(k+1). In this process, the rotation and scale factor related to p^(k+1) are closer to the truth than those related to p^(k), and finally the algorithm reaches the correct solution. However, in the process shown in Fig. 3(b), p^(k+1) has a larger pose error than p^(k), and the algorithm will finally get stuck in a local minimum.
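One alternation of the q_i and p_i stages (Eqs. 10-17) can be sketched in NumPy as follows. This is only a sketch under our naming, with the re-weighting of Section 4 omitted; a useful sanity check is that at the ground-truth pose the iteration is a fixed point:

```python
import numpy as np

def core_iteration(R, mu, x, S, o, f):
    """One outlier-free iteration of the R1PPnP core algorithm
    (a sketch of Eqs. 10-17; variable names are ours).
    x: 3 x N homogeneous image points [u, v, f]^T
    S: 3 x N shape vectors S_i = X_i^w - X_o^w
    o: index of the control point."""
    N = x.shape[1]
    x_o = x[:, [o]]
    # Eq. (10): points p_i generated from the current pose {R, mu}.
    p = x_o + mu * (R @ S)
    # q-stage, Eq. (14): vertically project each p_i onto its line of sight.
    lam = np.einsum('ij,ij->j', x, p) / np.einsum('ij,ij->j', x, x)
    q = lam * x  # Eq. (11)
    mask = np.arange(N) != o
    # p-stage, Eq. (15): rotation from the SVD of the weighted covariance.
    A = (q[:, mask] - x_o) / lam[mask]
    Sw = S[:, mask] / lam[mask]
    U, _, Vt = np.linalg.svd(A @ Sw.T)
    R_new = U @ Vt  # det(R_new) may be -1; the paper fixes this with Eq. (21)
    # p-stage, Eqs. (16)-(17): rescale mu by comparing the projections v_i
    # of the updated p_i with the measured image points x_i.
    p2 = x_o + mu * (R_new @ S)
    v = f * p2 / p2[2]
    B = (v - x_o)[:, mask]
    C = (x - x_o)[:, mask]
    mu_new = mu * np.linalg.norm(C) / np.linalg.norm(B)
    return R_new, mu_new, lam
```

Starting from a rough initialization, repeated calls drive the pose either toward the true solution or toward its mirror image with det(R) = -1, which the flip of Eq. (21) then corrects.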
The reconstructed points p^(k) or p^(k+1) come in mirror-image forms of the real object points P_i. Without loss of generality, in either 2D or 3D space, a mirror-image form means that the left-right-handedness of the point cloud has been changed, which should not happen in reality. The reason the core algorithm of R1PPnP may generate points with a different left-right-handedness is that its rotation estimation equation (15) does not enforce det(R) = 1.

Fig. 3. Iterative process with 2D space and a 1D camera imaging plane. P_1, P_2, P_3 and P_4 are the object points; P_1 is selected as the control point (o = 1). (a) The process p^(k) -> q^(k+1) -> p^(k+1) -> ... makes the estimated pose approach the correct solution; (b) the rotation related to p^(k+1) is worse than that related to p^(k), which means the process is approaching a local minimum, which is a mirror-image form of the true object shape.

In practice we found it inappropriate to constrain det(R) = 1 from the beginning of the algorithm. Instead, we allow the iteration process to approach the mirror-image form. This is because we found that, with the constraint det(R) = 1 from the beginning, the algorithm has many types of local minima and they are unpredictable. Without this constraint, however, the convergence direction of the core algorithm becomes predictable, with only two types of convergence: the algorithm may reach the globally optimal result directly, or the approximate mirror-image form. In the latter case, the estimated det(R) = -1.

Hence, according to the above analysis, we propose the local minima avoidance mechanism. The algorithm begins with a random initial value and control point. When the algorithm converges to a result with det(R) = -1, we perform a mirror flip by

lambda_i,new = 1 / lambda_i,old,  i = 1, ..., N and i != o.   (21)

3.3 Control Point o Selection
Selecting different points as the control point o may result in different convergence rates. Without taking noise into account, the correct value of the rotation R should be the same for any control point o in a PnP task. Hence, larger rotation updating steps in the iteration process suggest that fewer iterations are required to converge to the correct value when starting from the same initial value. In R1PPnP, R is updated according to the differences between points p_i and q_i, i = 1, ..., N, i != o. When point o is close to p_i, the rotation updating steps are more likely to be large, as shown in Fig. 4(a). The updating rate of mu also follows this analysis. Therefore, we prefer to select the control point o from the center of the point cloud, which has better odds of having smaller distances to the rest of the point cloud and thus achieves a faster convergence rate, as shown in Fig. 4(b).

Fig. 4. In R1PPnP, the selection of the control point o is related to the convergence rate. (a) An example illustrating this behavior with 2D space and a 1D camera imaging plane, according to the R and mu updating methods in Eqs. (15) and (16): the angle between p_i - o and q_i - o, and the ratio ||x_i - x_o|| / ||v_i - x_o||, are larger when o is closer to p_i, suggesting larger R and mu updating rates. (b) A real-world example illustrating this behavior; the radius of each circle represents the required number of iterations when using that feature point as the control point o.

4 OUTLIERS HANDLING MECHANISM
The robust and fast capability of handling outliers is the main contribution of the proposed R1PPnP algorithm. Our outlier-handling mechanism combines a soft weight assignment method and the 1-point RANSAC scheme.
R1PPnP mainly consists of the q_i and p_i estimation stages. As described in Section 3, in the q_i stage, calculations related to each point are independent of the others. Hence, outliers do not affect inliers in the q_i stage. However, in the p_i stage, outliers perturb the camera pose estimation results. To reduce the impact of outliers, the basic idea of our soft re-weighting method is to assign each 3D-2D correspondence a weight factor, and to make the weight factors related to outliers small when estimating the camera pose in the p_i stage.

One possible method to assign weights is based on the least median of squares [40]; however, this method cannot handle more than 50% of outliers. We designed a soft weight assignment method embedded in the iteration process. To distinguish inliers and outliers, the weights of 3D-2D correspondences are determined by

w_i = 1.0,    if e_i <= H,
w_i = H/e_i,  if e_i > H,   (22)

where e_i is the reprojection error of point i with the current R and mu during the iteration, and H is the inlier threshold: points with final reprojection errors smaller than H are considered inliers. The re-weighting rule (22) means that a point with a large reprojection error has a small weight during the estimation of the camera pose, which is designed under the reasonable assumption that outliers have much larger reprojection errors than inliers. Although inliers may also have reprojection errors larger than H during the iteration process, it is acceptable to assign weights smaller than 1 to inliers as long as outliers have much smaller weights. Hence, we simply use H as the benchmark to assign weights.

According to the R estimation given by equation (15), we multiply the weight factors with each column of the matrices A and S,

A_i = w_i A_i,  S_i = w_i S_i.   (23)

Similarly, to update mu using (16),

B_i = w_i B_i,  C_i = w_i C_i.   (24)

Since inliers have much larger weights, R and mu are mainly estimated from the inliers.
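The rule (22) and the column scaling (23)-(24) are straightforward to implement; a small sketch (function names are ours):

```python
import numpy as np

def soft_weights(e, H):
    """Soft re-weighting rule (Eq. 22): w_i = 1 if e_i <= H, else H / e_i."""
    e = np.asarray(e, dtype=float)
    return np.where(e <= H, 1.0, H / e)

def apply_weights(M, w):
    """Eqs. (23)-(24): scale each column of a 3 x N matrix (A, S, B or C)
    by the corresponding weight."""
    return M * w
```

A correspondence with reprojection error ten times the threshold H thus contributes with weight 0.1, so gross outliers are progressively suppressed without a hard inlier/outlier decision during the iterations.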
The core algorithm of R1PPnP is based on a randomly selected control point o. In outlier-free situations, our algorithm works with any control point. However, in situations with outliers, the control point o should be an inlier for the algorithm to work. Hence, we employ the 1-point RANSAC scheme to try different 3D-2D correspondences as the control point until the algorithm finds the correct solution. The 1-point RANSAC scheme combines naturally with the core algorithm because the core algorithm can perform the computation with any control point o in [1, N]. We assume that all 2D-3D correspondences have the same probability of being an inlier; without loss of generality, we select the control point o from the center of all image points outward. This is because we found that R1PPnP needs fewer iterations to converge when o is closer to the center, as discussed in Section 3.3.

In general, the overall flow of R1PPnP is shown in Fig. 5: we first detect as many inliers as possible inside the RANSAC framework; then, based on the detected inliers, we perform the R1PPnP algorithm without the re-weighting mechanism to obtain more accurate results.
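The outer loop can be sketched as follows, with the core algorithm abstracted behind a callback. This is a sketch under our naming; the actual implementation orders candidate control points from the image center outward rather than randomly, and uses the termination conditions detailed below:

```python
import math
import random

def one_point_ransac(num_points, run_core, p=0.99, max_trials=1000):
    """1-point RANSAC outer loop of R1PPnP (a sketch; names are ours).
    run_core(o) runs the re-weighted core algorithm with control point o
    and returns (pose, num_inliers). The loop keeps the best hypothesis
    and stops once the standard termination bound (Eq. 25, s = 1) is met."""
    best_pose, best_inliers, trials = None, 0, 0
    order = list(range(num_points))  # the paper orders candidates from the
    random.shuffle(order)            # image center outward; shuffled here
    for o in order:
        pose, n_in = run_core(o)
        trials += 1
        if n_in > best_inliers:
            best_pose, best_inliers = pose, n_in
        p_in = best_inliers / num_points
        if p_in >= 1.0 or trials >= max_trials:
            break
        if p_in > 0 and trials >= math.log(1 - p) / math.log(1 - p_in):
            break
    return best_pose, best_inliers
```

Because only one correspondence is hypothesized per trial, a single inlier pick suffices, which is what keeps the trial count low compared to 3- or 4-point sampling.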
[Fig. 5 flow chart, summarized: within the RANSAC loop (trials = trials + 1), different control points o are tried. For each o, R1PPnP iterations with the re-weighting mechanism (k = k + 1) run until iteration termination condition A (Eq. 26) is met, in order to detect as many inliers as possible; R1PPnP refinement iterations without the re-weighting mechanism then run until iteration termination condition B (Eq. 27) is met, to make the pose estimation more accurate based on the inliers already detected. The RANSAC loop ends when the RANSAC termination condition (Eq. 25) is satisfied.]

Fig. 5. The overall flow chart of R1PPnP.

Appropriate termination conditions seek a balance between speed and precision for RANSAC-based or iterative algorithms. As shown in Fig. 5, two types of termination conditions need to be specified for R1PPnP.
RANSAC Termination Condition
The standard RANSAC termination condition [13] was employed for R1PPnP, that is,

trials >= log(1 - p) / log(1 - p_inliers^s),   (25)

where p is the certainty (we use p = 0.99 for all RANSAC-based methods in this paper), trials is the number of RANSAC trials, p_inliers = (maximum number of detected inliers) / (number of all points), and s is the number of control points needed in each RANSAC trial: s = 1 for R1PPnP, and s = 3, 4 for RANSAC+P3P and P4P respectively.

During the RANSAC process, the camera pose estimated by conventional RANSAC+P3P or P4P methods is based on a very small number of points. Because of image noise, the estimated pose varies with different inliers as the control points. This is especially serious when the image noise is large. To improve accuracy, the termination condition (25) suggests that the standard RANSAC scheme may continue looking for better results after finding a large percentage of inliers. In contrast, R1PPnP takes all points into account when estimating the pose, which makes it insensitive to the selected control point o. Therefore, it is a reasonable assumption that when p_inliers is large enough (above a fixed threshold), no improvement can be found and the RANSAC process of R1PPnP can be terminated. The accuracy evaluation results in this paper testify to the rationality of this assumption.
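Plugging numbers into (25) shows why s = 1 matters; a small helper (name is ours):

```python
import math

def ransac_trials(p, p_inliers, s):
    """Expected number of RANSAC trials needed to draw an all-inlier sample
    with certainty p (Eq. 25); s control points are drawn per trial."""
    return math.log(1.0 - p) / math.log(1.0 - p_inliers ** s)
```

With p = 0.99 and 50% inliers, s = 1 needs roughly 7 trials, s = 3 roughly 35, and s = 4 roughly 72; the gap widens quickly as the inlier ratio drops, which is the source of R1PPnP's speed advantage at high outlier percentages.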
Termination Conditions for R1PPnP Iterations
As shown in Fig. 5, we first detect as many inliers as possible, and the related termination condition A is satisfied when the detected number of inliers becomes stable, that is,

N_inlier^(k) - N_inlier^(k-20) <= 0 and k > 20,   (26)

where k is the index of iterations and N_inlier is the number of detected inliers. According to our experience, in most cases no more inliers will be detected if N_inlier has not increased in 20 iterations.

Fig. 6. Experiments with synthetic data (ordinary 3D case with outliers) to demonstrate the iteration process of R1PPnP when the control point o is an outlier. Randomly colored lines are results with different control points. (a) The changes of the estimated camera pose between iterations k and k - 1 are complex during the iteration process, based on which it is difficult to decide when to stop the process. (b) It is more robust and efficient to stop the process when no more inliers can be detected.

A good termination condition A should be able to stop the iteration process as early as possible when point o is an outlier, and not interrupt it when point o is an inlier. We propose the termination condition A with a window size of 20 iterations to balance speed and robustness. This termination condition is not based on a comparison of the parameters of adjacent iterations because in R1PPnP, the dynamically updated weights w_i may make the convergence process complex, especially when point o is an outlier. As shown in Fig. 6(a), with an outlier as the control point, ||R^(k) - R^(k-1)|| may take many iterations to converge to zero, which is slow. With the change of the detected number of inliers over a larger window, the termination decision can be made more robustly and efficiently, as shown in Fig. 6(b).

The refinement stage makes the pose estimation results more accurate based on the detected inliers.
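Condition A amounts to a window test on the inlier-count history; a sketch (name is ours):

```python
def stop_condition_A(n_inlier_history, window=20):
    """Termination condition A (Eq. 26): stop once the detected inlier
    count has not grown over the last `window` iterations.
    n_inlier_history[k] is the inlier count at iteration k."""
    k = len(n_inlier_history) - 1
    if k <= window:
        return False
    return n_inlier_history[k] - n_inlier_history[k - window] <= 0
```

The window makes the check insensitive to the non-monotone convergence caused by the dynamically updated weights w_i, while a still-growing inlier count keeps the iterations running.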
Without the re-weighting mechanism, the convergence process is much simpler. Hence, termination condition B is satisfied when the estimated rotation becomes stable, that is,

||R^(k) − R^(k−1)|| < ε,   (27)

where ε is a small fixed threshold.

EXPERIMENTS
Fig. 7. Except for REPPnP and R1PPnP, most PnP methods cannot handle outliers. (Left) Mean rotation error (degrees) vs. percentage of outliers. (Right) Mean translation error (%) vs. percentage of outliers.
The performance of the proposed R1PPnP algorithm was evaluated by comparing it against state-of-the-art PnP methods. The source code was implemented in MATLAB scripts and executed on a computer with an Intel Core i7 2.60 GHz CPU. We used both synthetic and real-world data to conduct the evaluation experiments. The initial values for R1PPnP are the identity rotation R = diag{1, 1, 1} and a small initial µ. RANSAC+P3P or P4P methods also used the standard termination condition (25).

Synthetic experiments in this paper shared the following parameters. The camera focal length is 1,000 pixels. Two types of synthetic data were generated. (1) Ordinary three-dimensional (3D) case: object points were randomly and uniformly distributed in a cube region in front of the camera. (2) Quasi-singular case: the distribution cube is offset from the camera's optical axis. For each experiment, we report the mean values of 100 trials.

For accuracy evaluation, the rotation error between the true rotation R_true and the estimated R is measured in degrees as

e_rot (deg) = || [acos(r_{k,true}^T · r_k)]_{k=1,2,3} || × 180/π,

where r_{k,true} and r_k are the k-th columns of R_true and R, respectively. The translation error is e_trans (%) = ||t_true − t|| / ||t|| × 100.

Most PnP algorithms do not have the ability to handle outliers, and even a small percentage of outliers will significantly reduce their accuracy, as shown in Fig. 7. Thus, although outlier-free situations are not the main concern of R1PPnP, we first conducted comparison experiments between the proposed R1PPnP and other PnP algorithms in outlier-free situations. The reason for this comparison is that RANSAC+P3P or P4P methods usually need other PnP methods as the final refinement step.
Hence, the accuracy and speed in outlier-free situations are also related to the performance in situations with outliers.

Here we only ran the core algorithm of R1PPnP without the outlier-handling mechanism. The termination condition for R1PPnP iterations was Eq. (27). In this experiment, we compared our proposed R1PPnP with the following PnP methods: LHM [6], EPnP [4], RPnP [26], DLS [41], OPnP [5], ASPnP [42], SDP [25], PPnP [43], EPPnP [3] and REPPnP [3].

In our accuracy evaluation experiments, the number of points was 100 and we added different levels of Gaussian image noise from 0 to 10 pixels. As shown in Figs. 8 and 9, for both ordinary 3D and quasi-singular cases, R1PPnP gave the most accurate rotation estimation results together with OPnP and SDP. For ordinary 3D cases, R1PPnP was among the most accurate methods for estimating translation and was only slightly less accurate than OPnP. However, for quasi-singular cases, the accuracy of translation estimation of R1PPnP was not state-of-the-art. ASPnP became unstable with large image noise; hence its mean accuracy decreased significantly compared with that under small image noise. Although PPnP can sometimes provide accurate rotation estimation results in ordinary 3D cases, it also suffered from instability in some random cases, as shown by the jitter in Fig. 8. PPnP and LHM cannot handle the quasi-singular cases.

To evaluate runtime, Gaussian image noise with a standard deviation of σ = 5 pixels was added and the number of points was increased from 100 to 1000. As shown in Fig. 10, the proposed R1PPnP, together with EPPnP, REPPnP and ASPnP, showed superior computational speed. The runtime of R1PPnP did not grow significantly with respect to the number of points.
We suspect that this results from the intrinsic parallelism of MATLAB 2014a's matrix computations.

Generally speaking, in outlier-free situations, R1PPnP was among the state-of-the-art methods in terms of both accuracy and computational speed. One drawback of R1PPnP is that the accuracy of its translation estimation in quasi-singular cases was not among the best.
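The rotation and translation error metrics used throughout these comparisons, as defined earlier, can be sketched in plain Python (function names are ours; rotations are 3×3 nested lists):

```python
import math

def rotation_error_deg(R_true, R_est):
    """Angle (degrees) between corresponding columns of the true and
    estimated rotations, stacked into a vector and reported as its norm."""
    angles = []
    for k in range(3):
        dot = sum(R_true[i][k] * R_est[i][k] for i in range(3))
        # Clamp to guard against tiny numerical overshoot of +/-1.
        angles.append(math.acos(max(-1.0, min(1.0, dot))))
    return math.sqrt(sum(a * a for a in angles)) * 180.0 / math.pi

def translation_error_pct(t_true, t_est):
    """||t_true - t|| / ||t|| * 100, with t the estimated translation."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_true, t_est)))
    return diff / math.sqrt(sum(b * b for b in t_est)) * 100.0
```

For identical rotations the error is exactly zero; a 90° rotation about one axis perturbs two of the three columns, so the reported error exceeds 90°.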
The main advantage of R1PPnP is that it is capable of handling a large percentage of outliers at a much faster speed than conventional methods. To demonstrate this, we introduced the following RANSAC-based PnP methods for comparison: RANSAC+P3P [14]; RANSAC + RP4P + RPnP [26]; RANSAC + P3P [14] + ASPnP [42]; and RANSAC + P3P [14] + OPnP [5]. According to the evaluations in outlier-free situations, OPnP is the most accurate PnP
Fig. 8. Accuracy with outlier-free synthetic data (ordinary 3D cases). The number of points was 100 and different levels of image noise were added. (Left) Mean rotation error (degrees) vs. Gaussian image noise (pixels). (Right) Mean translation error (%) vs. Gaussian image noise (pixels).
Fig. 9. Accuracy with outlier-free synthetic data (quasi-singular cases). The number of points was 100 and different levels of image noise were added; PPnP is out of range. The accuracy of all PnP methods decreased significantly compared with the ordinary 3D cases shown in Fig. 8. (Left) Mean rotation error (degrees) vs. Gaussian image noise (pixels). (Right) Mean translation error (%) vs. Gaussian image noise (pixels).
Fig. 10. Runtime results with outlier-free synthetic data. The standard deviation of image noise was σ = 5 pixels and the number of points increased from 100 to 1000. (Left) Ordinary 3D cases. (Right) Quasi-singular cases.

Fig. 11. Average accuracy on synthetic data with outliers. (a)-(b) Accuracy with ordinary 3D cases; (c)-(d) Accuracy with quasi-singular cases.
Fig. 12. Average runtime and number of required RANSAC samples with ordinary 3D synthetic data. We do not give results for quasi-singular cases because they are very close to those for ordinary 3D cases. RANSAC+P3P or P4P needs more than 10 RANSAC trials even when p_outliers = 0 because the large image noise (σ = 5 pixels) usually makes P3P or P4P methods unable to find the correct pose with only 3 or 4 inliers so as to satisfy the termination condition Eq. (25). In contrast, the required number of RANSAC trials of R1PPnP is not sensitive to image noise because all points are taken into account.

method, and ASPnP and RPnP are fast. We selected these methods as the final refinement step to fully demonstrate the performance of RANSAC+refinement-like methods. Another important method is REPPnP [3], which is the state-of-the-art PnP algorithm that addresses outliers.

Fig. 13. Statistical results with real-world data. The x-axis shows ranges of the percentage of outliers. (a) Rotation error. (b) Translation error. (c) Runtime. (d) The number of detected inliers compared with the maximum; zero suggests the method found the most inliers.

The experiments were conducted as follows. N_inlier = 100 correct matches (inliers) between 3D object points and 2D image points were generated. N_outlier mismatches (outliers) were generated by randomly corresponding 3D and 2D points. The true percentage of outliers is p_outlier = N_outlier / (N_inlier + N_outlier). Gaussian image noise with a standard deviation of σ = 5 pixels was added.
For R1PPnP and the other RANSAC-based methods, the reprojection error threshold used to distinguish inliers from outliers was H = 10 pixels.

As shown in Fig. 11, REPPnP began to fail once the percentage of outliers exceeded a moderate level with ordinary 3D cases, and an even lower level with quasi-singular cases. R1PPnP and the RANSAC-based methods were capable of handling situations with a large percentage of outliers. R1PPnP was more accurate than the RANSAC-based methods for both rotation and translation estimation. Compared to the other RANSAC-based methods, R1PPnP was also much faster, especially when the percentage of outliers was large.
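A minimal sketch of this reprojection-error test (a pinhole model with the principal point assumed at the origin; function names are ours):

```python
import math

def reproject(P, R, t, f):
    """Project a 3D point P through rotation R (3x3), translation t,
    and focal length f (pixels), assuming the principal point is (0, 0)."""
    X = [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]
    return f * X[0] / X[2], f * X[1] / X[2]

def detect_inliers(pts3d, pts2d, R, t, f, H=10.0):
    """Indices of correspondences whose reprojection error is below
    the threshold H (pixels), as used to separate inliers from outliers."""
    inliers = []
    for idx, (P, uv) in enumerate(zip(pts3d, pts2d)):
        u, v = reproject(P, R, t, f)
        if math.hypot(u - uv[0], v - uv[1]) < H:
            inliers.append(idx)
    return inliers
```

A point whose observed image location agrees with its projection under the candidate pose is kept; a random mismatch almost always lands far outside the H-pixel band and is rejected.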
Our real-world experiments were conducted on the DTU robot image data [44], which provides images and the related 3D point cloud obtained by a structured light scan. The true values of the rotations and translations are known. Datasets numbered 1 to 30 were used. In each dataset, images were captured under 19 different illumination situations and from 119 camera positions. We selected 10 of the 19 illumination situations; hence, a total of 30 × 10 × 119 = 35,700 images were included in this evaluation. Following the dataset's instructions, for each dataset and illumination situation, we used the image numbered 25 as the reference image and performed SURF matching [7] between the reference image and the other images. The inlier threshold was H = 5 pixels for all methods. With each image, we ran all algorithms 5 times and used the average value for the subsequent statistics.

1. http://roboimagedata.imm.dtu.dk/

As shown in Fig. 14, the total number of correspondences and the percentage of outliers varied with the objects, illumination situations and camera poses. Although clear comparisons require that only one factor differ, this kind of variable control is difficult for PnP evaluation on real-world data because SURF matching results are unpredictable. In our experiments we found that the performance of the PnP algorithms was mainly affected by the percentage of outliers rather than by the total number of correspondences. Therefore, in this section we report the evaluation results by comparing the statistical results of the PnP methods within each range of outlier percentage. Because the true number of inliers was unknown, for each image the algorithms detected inliers and we considered the maximum number of detected inliers as the ground truth.

Fig. 14. Examples of images and R1PPnP reprojection results. Green circles are all SURF correspondences and blue stars are the reprojected inliers detected by R1PPnP. First row: images with different illumination situations and the 3D point cloud. Second and third rows: different datasets.

As shown in Fig. 13(c), as the percentage of outliers increased, the runtime of R1PPnP did not grow significantly compared with conventional RANSAC+P3P or P4P methods. When the percentage of outliers was small, R1PPnP was slower than pure RANSAC+P3P, but much more accurate, as shown in Fig. 13(a)(b). To improve accuracy, RANSAC+P3P needs other PnP methods, such as OPnP or ASPnP, as the final refinement step.
Compared with the other refinement PnP methods, R1PPnP was slightly less accurate than OPnP, which was the most accurate PnP method according to both the synthetic and real-world experiments, but much faster even when the percentage of outliers was small.
Fig. 15. Histogram of the number of iterations of R1PPnP in the real-world experiments.
Fig. 15 shows the histogram of the number of R1PPnP iterations over all 35,700 images. As shown in Fig. 5, the iteration count includes the re-weighted iterations that obtained the best result among the RANSAC trials, plus the subsequent refinement iterations without re-weighting. The average number of required iterations was 51.3.
CONCLUSIONS
We present a fast and robust PnP solution named R1PPnP for tackling the outlier issue. We integrate a soft re-weighting method into an iterative PnP process to distinguish inliers from outliers, and employ the 1-point RANSAC scheme for selecting the control point. The number of trials is greatly reduced compared to conventional RANSAC+P3P or P4P methods; hence, R1PPnP is much faster. Synthetic and real-world experiments demonstrated its feasibility. Beyond its good performance, another advantage of R1PPnP is that its implementation is relatively easy because all of its steps involve only simple calculations. For example, its local-minima avoidance mechanism only requires computing the determinant of the rotation matrix and setting λ_new = 1/λ_old. The most appropriate situations for replacing conventional RANSAC+P3P methods with R1PPnP are when the percentage of outliers and/or the image noise is large. R1PPnP is also more appropriate for large point clouds because of its low time complexity and the small number of control points it needs to try. Future work includes developing an extension for planar cases and applying R1PPnP in SLAM systems to handle outliers when a new frame is encountered.

ACKNOWLEDGMENT
This work is supported by NIH P41EB015898.

REFERENCES

[1] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[2] M. Müller, M.-C. Rassweiler, J. Klein et al., "Mobile augmented reality for computer-assisted percutaneous nephrolithotomy," vol. 8, no. 4, pp. 663–675, 2013.
[3] L. Ferraz, X. Binefa, and F. Moreno-Noguer, "Very fast solution to the PnP problem with algebraic outlier rejection," in CVPR, 2014, pp. 501–508.
[4] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," IJCV, vol. 81, no. 2, pp. 155–166, 2009.
[5] Y. Zheng, Y. Kuang et al., "Revisiting the PnP problem: A fast, general and optimal solution," in ICCV, 2013, pp. 2344–2351.
[6] C.-P. Lu, G. D. Hager, and E. Mjolsness, "Fast and globally convergent pose estimation from video images," IEEE TPAMI, vol. 22, no. 6, pp. 610–622, 2000.
[7] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," ECCV, pp. 404–417, 2006.
[8] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary robust invariant scalable keypoints," in ICCV. IEEE, 2011, pp. 2548–2555.
[9] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in ICCV. IEEE, 2011, pp. 2564–2571.
[10] D. Nistér, "A minimal solution to the generalised 3-point pose problem," in CVPR, vol. 1. IEEE, 2004, pp. 67–79.
[11] X.-S. Gao, X.-R. Hou, J. Tang, and H.-F. Cheng, "Complete solution classification for the perspective-three-point problem," IEEE TPAMI, vol. 25, no. 8, pp. 930–943, 2003.
[12] K. Josephson and M. Byrod, "Pose estimation with radial distortion and unknown focal length," in CVPR. IEEE, 2009, pp. 2419–2426.
[13] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[14] L. Kneip, D. Scaramuzza, and R. Siegwart, "A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation," in CVPR. IEEE, 2011, pp. 2969–2976.
[15] M. Bujnak, Z. Kukelova, and T. Pajdla, "A general solution to the P4P problem for camera with unknown focal length," in CVPR. IEEE, 2008, pp. 1–8.
[16] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[17] D. DeMenthon and L. S. Davis, "Exact and approximate solutions of the perspective-three-point problem," IEEE TPAMI, vol. 14, no. 11, pp. 1100–1105, 1992.
[18] M. A. Abidi and T. Chandra, "A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation," IEEE TPAMI, vol. 17, no. 5, pp. 534–538, 1995.
[19] R. Horaud, B. Conio, O. Leboulleux, and B. Lacolle, "An analytic solution for the perspective-4-point problem," Computer Vision, Graphics, and Image Processing, vol. 47, no. 1, pp. 33–44, 1989.
[20] L. Quan and Z. Lan, "Linear n-point camera pose determination," IEEE TPAMI, vol. 21, no. 8, pp. 774–780, 1999.
[21] W. J. Wolfe, D. Mathis, C. W. Sklair, and M. Magee, "The perspective view of three points," IEEE TPAMI, vol. 13, no. 1, pp. 66–73, 1991.
[22] B. Triggs, "Camera pose and calibration from 4 or 5 known 3D points," in ICCV, vol. 1. IEEE, 1999, pp. 278–284.
[23] A. Ansar and K. Daniilidis, "Linear pose estimation from points or lines," IEEE TPAMI, vol. 25, no. 5, pp. 578–589, 2003.
[24] P. D. Fiore, "Efficient linear solution of exterior orientation," IEEE TPAMI, vol. 23, no. 2, pp. 140–148, 2001.
[25] G. Schweighofer and A. Pinz, "Globally optimal O(n) solution to the PnP problem for general camera models," in BMVC, 2008, pp. 1–10.
[26] S. Li, C. Xu, and M. Xie, "A robust O(n) solution to the perspective-n-point problem," IEEE TPAMI, vol. 34, no. 7, pp. 1444–1450, 2012.
[27] L. Kneip, H. Li, and Y. Seo, "UPnP: An optimal O(n) solution to the absolute pose problem with universal applicability," in ECCV. Springer, 2014, pp. 127–142.
[28] D. G. Lowe, "Fitting parameterized three-dimensional models to images," IEEE TPAMI, vol. 13, no. 5, pp. 441–450, 1991.
[29] D. F. Dementhon and L. S. Davis, "Model-based object pose in 25 lines of code," IJCV, vol. 15, no. 1, pp. 123–141, 1995.
[30] R. Horaud, F. Dornaika, and B. Lamiroy, "Object pose: The link between weak perspective, paraperspective, and full perspective," IJCV, vol. 22, no. 2, pp. 173–189, 1997.
[31] P. David, D. Dementhon, R. Duraiswami, and H. Samet, "SoftPOSIT: Simultaneous pose and correspondence determination," International Journal of Computer Vision, vol. 59, no. 3, pp. 259–284, 2004.
[32] G. Schweighofer and A. Pinz, "Robust pose estimation from a planar target," IEEE TPAMI, vol. 28, no. 12, pp. 2024–2030, 2006.
[33] M. Havlena, A. Torii, and T. Pajdla, "Efficient structure from motion by graph optimization," ECCV, pp. 100–113, 2010.
[34] R. Mur-Artal and J. D. Tardós, "Fast relocalisation and loop closing in keyframe-based SLAM," in ICRA. IEEE, 2014, pp. 846–853.
[35] F. Kahl, S. Agarwal, M. K. Chandraker, D. Kriegman, and S. Belongie, "Practical global optimization for multiview geometry," IJCV, vol. 79, no. 3, pp. 271–284, 2008.
[36] Q. Ke and T. Kanade, "Quasiconvex optimization for robust geometric reconstruction," IEEE TPAMI, vol. 29, no. 10, 2007.
[37] L. Kneip, P. Furgale, and R. Siegwart, "Using multi-camera systems in robotics: Efficient solutions to the nPnP problem," in ICRA. IEEE, 2013, pp. 3770–3776.
[38] G. H. Lee, B. Li, M. Pollefeys, and F. Fraundorfer, "Minimal solutions for the multi-camera pose estimation problem," IJRR, vol. 34, no. 7, pp. 837–848, 2015.
[39] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3D point sets," IEEE TPAMI, no. 5, pp. 698–700, 1987.
[40] D. Simpson, "Introduction to Rousseeuw (1984) least median of squares regression," pp. 433–461, 1997.
[41] J. A. Hesch and S. I. Roumeliotis, "A direct least-squares (DLS) method for PnP," in ICCV. IEEE, 2011, pp. 383–390.
[42] Y. Zheng, S. Sugimoto, and M. Okutomi, "ASPnP: An accurate and scalable solution to the perspective-n-point problem," IEICE Transactions on Information and Systems, vol. 96, no. 7, pp. 1525–1535, 2013.
[43] V. Garro, F. Crosilla, and A. Fusiello, "Solving the PnP problem with anisotropic orthogonal procrustes analysis," IEEE, 2012, pp. 262–269.
[44] H. Aanæs, A. L. Dahl, and K. S. Pedersen, "Interesting interest points,"