A Factor Graph Approach to Multi-Camera Extrinsic Calibration on Legged Robots
Andrzej Reinke∗,† ([email protected])    Marco Camurri∗,‡ ([email protected])    Claudio Semini∗ ([email protected])

∗ Dynamic Legged Systems Lab, Istituto Italiano di Tecnologia, Italy.
† Gdańsk University of Technology, Poland.
‡ Oxford Robotics Institute, University of Oxford, UK.
This work was supported by Istituto Italiano di Tecnologia (IIT).

Abstract—Legged robots are becoming popular not only in research, but also in industry, where they can demonstrate their superiority over wheeled machines in a variety of applications. Whether acting as mobile manipulators or as all-terrain ground vehicles, these machines need to precisely track the desired base and end-effector trajectories, perform Simultaneous Localization and Mapping (SLAM), and move in challenging environments, all while keeping balance. A crucial aspect for these tasks is that all onboard sensors must be properly calibrated and synchronized to provide consistent signals for all the software modules they feed. In this paper, we focus on the problem of calibrating the relative pose between a set of cameras and the base link of a quadruped robot. This pose is fundamental to successfully perform sensor fusion, state estimation, mapping, and any other task requiring visual feedback. To solve this problem, we propose an approach based on factor graphs that jointly optimizes the mutual position of the cameras and the robot base using kinematics and fiducial markers. We also quantitatively compare its performance with other state-of-the-art methods on the hydraulic quadruped robot HyQ. The proposed approach is simple, modular, and independent from external devices other than the fiducial marker.
I. INTRODUCTION
Robot calibration is a widely explored subject that involves the estimation of kinematics and camera intrinsic/extrinsic parameters, either by optical or mechanical methods.
Kinematics calibration adjusts robot parameters such as link lengths, angle offsets, and other body dimensions that are crucial for precise end-effector placement, especially in industry [1]. These adjustments are required due to imperfections in the manufacturing of mechanical parts and their assemblies, and due to aging processes.
Intrinsic calibration retrieves the parameters of the so-called camera pinhole model, which projects 3D points in space onto the image plane of the camera. These parameters model the skewness of the CMOS elements, the lens placement, and the optical center offsets. Since lenses have finite dimensions and non-null thickness, the model is typically extended with additional parameters to compensate for tangential and radial distortions as well.
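For reference, the textbook formulation of this model (recalled here in our notation; the exact parameterization depends on the camera and calibration toolchain) projects a 3D point (X, Y, Z) expressed in the camera frame onto pixel coordinates (u, v):

    s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
      = K \begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
    \qquad
    K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

where f_x, f_y are the focal lengths, (c_x, c_y) the optical center, and \gamma the skew. A common distortion extension (e.g., the Brown-Conrady model) replaces the normalized coordinates x = X/Z, y = Y/Z with x_d = x (1 + k_1 r^2 + k_2 r^4) + 2 p_1 x y + p_2 (r^2 + 2x^2), where r^2 = x^2 + y^2, before applying K.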
Finally, extrinsic calibration consists in finding the pose (i.e., translation and rotation) of the camera with respect to a specific object captured within its field of view. In robotics, it is also important to estimate the camera pose expressed in the robot base frame. We will refer to this process as robot-camera extrinsic calibration. This is similar to the so-called hand-eye calibration, where the relative pose between a camera mounted on an arm and a robotic hand has to be determined [2]; in our case, however, there is no actuated linkage between the camera and the base, even though leg kinematics is used to perform the calibration. Knowing this pose tells the robot where its "eyes" are, enabling it to track external objects and achieve true spatial cognition. An estimate of the robot-camera extrinsics directly influences all tasks involving sensor fusion of proprioceptive and exteroceptive signals, including state estimation, mapping, planning, and control.

Unfortunately, estimating the robot-camera extrinsics is not an easy task. In prototyping applications, sensors are often replaced with new models due to obsolescence, hence their mounting locations are not always known from the CAD design. On the other hand, mature products will still suffer from wear and tear or manufacturing defects. Therefore, it is desirable to have an easy, fast, and accurate method to automatically perform this type of calibration whenever possible.

In this paper, we propose a factor graph based approach to robot-camera calibration for multiple cameras mounted on a legged robot. Factor graphs have been successfully applied to several inference problems [3], such as SLAM, 3D reconstruction, and spatiotemporal crop monitoring. This work shows that they are suitable to perform robot-camera extrinsic calibration as well.

The contributions of the paper can be summarized as follows:
• We present a novel approach to multiple robot-camera calibration based on factor graph optimization; the factors are designed as constraints originated by kinematics and visual detections of fiducial markers mounted at the robot's end-effector. The approach is modular and adaptable to any articulated robot with an arbitrary number of cameras and limbs;
• We provide a quantitative comparison between our method and other state-of-the-art methods available to date. To the best of our knowledge, this is the first time such a comparison is presented to the scientific community;
• We deliver the implementation of this method as an open-source ROS package (see https://github.com/iit-DLSLab/dls_factor_graph_calibration).

The paper outline is the following: Section II presents the related works on extrinsic camera calibration for mobile robots; Section III formally describes the problem of robot-camera extrinsic calibration; Section IV describes our factor graph approach for extrinsic and multi-camera calibration; Section V presents the results of this work and compares our method with the state of the art; Section VI summarizes the work and points to future directions.

II. RELATED WORK
Extrinsic calibration in robotics has been investigated since the introduction of visual sensors to industrial manipulators. In this section, we limit the discussion to the literature dedicated to mobile robots, which can be broadly divided into visual-kinematics methods and visual-inertial methods.
A. Visual-Kinematics Methods
The most common extrinsic camera calibration techniques for mobile robots use an external marker (e.g., a checkerboard) placed at the end-effector. The procedure requires constantly moving the robot arm, recognizing the marker keypoints, and estimating the relative transformations to the camera. The collected transformations allow comparing the camera marker poses with the poses estimated through the kinematic chain, according to the encoder values. The cost function to be minimized is then formulated as the reprojection error between the detected marker and its projection through the kinematic chain (a generic formulation is given at the end of this subsection).

Pradeep et al. [4] proposed an approach based on bundle adjustment to calibrate the kinematics, laser, and camera placements on the PR2 robot. The optimization includes all the transformations needed by the robot (sensor-to-sensor and sensor-to-end-effector), and the measurement uncertainties are modeled as Gaussians. The major drawback of the procedure is the calibration time: 20 minutes onboard the PR2, plus 25 minutes for the offline nonlinear optimization. This largely limits its application during field operations.

More recently, Blöchlinger et al. [5] applied the same idea to the quadruped robot StarlETH. They used a pair of circular fiducial marker arrays attached to the front feet and minimized the reprojection error between the detected markers (with a monocular camera) and their projection using forward kinematics. The cost function covers a set of 33 parameters, including link lengths, encoder offsets, time offsets, and the robot-camera relative pose. Since the focus was on finding the kinematics parameters, no results are shown for the robot-camera calibration. The performance on the link lengths is assessed by comparison with the CAD specifications.

To reduce the uncertainty due to kinematics, in our previous work [6] we developed a method based on fiducial markers and an external capture system. The fiducial marker is not attached to the robot body; it is placed externally, surrounded by a circular array of Motion Capture (MoCap) markers (small reflective spheres). MoCap markers are also placed on the robot at a known location. When the camera detects the fiducial marker, the robot-camera affine transform is recovered from the chain formed by the camera-to-fiducial, the vicon-to-fiducial, and the robot-to-vicon transforms. An obvious drawback of this method is the need for expensive MoCap equipment.
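Written generically (our notation, not tied to any one of the works above), the reprojection objective shared by these methods takes the form:

    \hat{T}_{BC} = \arg\min_{T_{BC}} \sum_{k} \sum_{p}
      \big\| z_{k,p} - \pi\big( T_{BC}^{-1} \, T_{BM}(q_k) \, m_p \big) \big\|^2

where q_k are the encoder readings of the k-th sample, T_{BM}(q_k) is the marker pose obtained from forward kinematics, m_p is the p-th marker keypoint expressed in the marker frame, \pi(\cdot) is the camera projection function (including the intrinsics), and z_{k,p} is the corresponding detected pixel location. Methods such as [4], [5] augment the decision variables with kinematic parameters as well.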
B. Visual-Inertial Methods
In the visual-inertial calibration domain, the objective is the relative transformation between an IMU and a camera. If the relative pose between the IMU and the base link is known, this method is equivalent to robot-camera calibration. Since no kinematics is involved, the primary application is Micro Aerial Vehicles (MAVs): in this context, the different frequencies and time delays between inertial and camera data need to be taken into account [7].

Lobo and Dias [8] described a two-step IMU-camera calibration procedure. First, they calculate the relative orientation by detecting the gravity vector with the IMU and the lines of a vertically placed marker with the camera. Then, the translation is calculated as the lever arm separating the two sensors, which are vertically aligned from the previous step and moved on a turntable. The main disadvantage of this method is its dependency on controlled setups, which are unavailable for sensors rigidly mounted on a mobile robot. Moreover, ignoring the mutual correlation between rotation and translation can lead to further errors.

Mirzaei and Roumeliotis [9] proposed a filter-based algorithm to overcome some limitations of [8]. The method is also two-step. First, they create a prior for the IMU-camera transform by using observations of an external marker for the camera pose and manual measurements for the IMU pose, respectively. Second, they refine the initial guess by performing visual-inertial state estimation with an Extended Kalman Filter (EKF). The state includes the position and linear velocity of the IMU, the IMU biases, and the unknown transformation between the IMU frame and the camera frame. The authors also demonstrate that these states are observable.

Similarly to [9], Kelly and Sukhatme [10] proposed an approach based on the Unscented Kalman Filter (UKF), which has been tested both with and without prior knowledge of a calibration marker.

Furgale et al. [11] proposed an estimator for IMU-camera calibration that simultaneously determines the transformation and the temporal offset between the two sensors. In contrast to previous methods, the time-varying states are represented as the weighted sum of B-spline basis functions, so that the state can be considered continuous. Their proposed implementation,
Kalibr, is publicly available and has been included in our comparative study.

Visual-inertial extrinsic calibration is independent from robot kinematics. However, it requires homogeneous and smooth excitation of all the IMU axes to perform well. Even though this is simple for small quadrotors (which can be manually moved in the air), it is not trivial for large legged robots. Furthermore, the precise location of the IMU on the robot is assumed to be known.

III. PROBLEM STATEMENT
Let us consider a floating-base articulated robot with a main body, one or more limbs, and one or more cameras rigidly attached to the main body. Our goal is to retrieve the relative pose between the optical center C_i of each camera and the base frame B on the main body, by means of: a) kinematic constraints; and b) the detection of a fiducial marker located at the end-effector of a limb, with frame M (see Fig. 2a).

Fig. 1: HyQ naming conventions and sensors. Each leg has three DoFs: Hip Abduction-Adduction (HAA), Hip Flexion-Extension (HFE), and Knee Flexion-Extension (KFE). The legs are named as follows: Left Front (LF), Right Front (RF), Left Hind (LH), Right Hind (RH). The robot is equipped with a Multisense SL stereo sensor (its LiDAR is not used here) and an ASUS Xtion RGB-D sensor.

In this paper, we consider the HyQ robot [12] as the experimental platform of choice. HyQ is a 90 kg hydraulic quadruped robot with 12 actuated Degrees of Freedom (DoFs). Each leg has 3 DoFs (Fig. 1): the Hip Abduction-Adduction (HAA), the Hip Flexion-Extension (HFE), and the Knee Flexion-Extension (KFE). The reference frames are shown in Fig. 2a: the base frame B is conventionally placed at the torso center, with the xy-plane passing through the HAA motor axes; the frames C_1 and C_2 are the optical frames of the Multisense SL's left camera and the ASUS Xtion's RGB camera, respectively; the marker frame M is at the bottom right corner of the ChAruco fiducial marker [13] mounted on the tip of HyQ's LF foot (Fig. 2b).

Fig. 2: (a) Reference frames for HyQ extrinsic calibration. B is the robot base link; M is the fiducial marker reference frame; C_1, C_2 represent the optical centers of the Multisense SL left camera and the ASUS Xtion RGB-D camera, respectively. (b) The special foot with the ChAruco marker [13].
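To make the constraint explicit (our notation, with T_{AB} denoting the pose of frame B expressed in frame A): whenever camera i detects the marker, the unknown extrinsics must close the kinematic loop

    T_{B C_i} = T_{B M}(q) \, T_{C_i M}^{-1}

where T_{BM}(q) is the marker pose obtained from forward kinematics at the encoder reading q, and T_{C_i M} is the marker pose estimated from the camera image. With noiseless measurements, a single configuration would suffice; in practice, many such noisy constraints are fused, as described in the next section.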
IV. MULTI-CAMERA FACTOR GRAPH CALIBRATION

The factor graph for multi-camera calibration is shown in Fig. 3. In its general formulation, an array of M cameras is mounted on the robot. Their poses with respect to the base frame B are expressed by the nodes X_i, i = 1, ..., M. The gray nodes L_j, j = 1, ..., N denote the N observed fiducial marker poses visible from the cameras over time (the cameras are not required to detect all the observations).

Fig. 3: The factor graph representing the calibration of M cameras over N image poses. The gray "landmark" nodes L_j represent the poses of the fiducial marker acquired by the cameras. The unary factors attached to these nodes correspond to the forward kinematics constraint to the base, computed from the encoder values. The nodes X_i represent the unknown poses of the cameras in the base frame. They are joined to the landmark nodes by factors computed from the visual detection of the fiducial markers.

We introduce two types of factors: the kinematics factors and the fiducial factors. The former are unary factors associated to the landmarks and express the kinematic constraint between the marker frame M and the base frame B by means of forward kinematics. The latter connect the nodes X_i with the nodes L_j and correspond to the relative pose between the fiducial marker and the camera. For all measurements, a zero-mean Gaussian noise model N(0, Σ) is adopted, where the covariance Σ is a diagonal matrix whose values are computed taking into account: a) the encoder uncertainty and its propagation to the end-effector [14]; and b) the uncertainty on the vertex jitter of the marker detection [13]. After accumulating N marker poses, a Gauss-Newton factor graph optimization is performed.
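For concreteness, the following is a minimal sketch (not our released implementation, which is a ROS package, see Section V) of how such a graph can be assembled and optimized with the GTSAM library [3], here through its Python bindings. The inputs samples, cad_guess, and fk_base_T_marker are hypothetical placeholders for the data collection, the CAD-based initial guess, and the forward kinematics described above; the noise sigmas are illustrative values standing in for the propagated encoder and detection uncertainties Σ.

    # Sketch of the calibration factor graph with GTSAM's Python bindings.
    # `samples` is a list of (joint_state, detections) pairs, where
    # `detections` maps a camera index i to the measured marker pose in
    # that camera's optical frame (a gtsam.Pose3).
    import numpy as np
    import gtsam
    from gtsam.symbol_shorthand import X, L  # X(i): camera poses, L(j): marker poses

    def calibrate(samples, num_cameras, cad_guess, fk_base_T_marker):
        graph = gtsam.NonlinearFactorGraph()
        # Diagonal noise models: rotation sigmas [rad] first, then translation [m].
        kin_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.005] * 3))
        cam_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.002] * 3))
        initial = gtsam.Values()
        for j, (joint_state, detections) in enumerate(samples):
            # Unary kinematics factor: marker pose in the base frame from
            # forward kinematics at this encoder reading.
            base_T_marker = fk_base_T_marker(joint_state)  # gtsam.Pose3
            graph.add(gtsam.PriorFactorPose3(L(j), base_T_marker, kin_noise))
            initial.insert(L(j), base_T_marker)
            # Fiducial factors: camera-to-marker relative pose for each
            # camera that detected the marker in this sample.
            for i, cam_T_marker in detections.items():
                graph.add(gtsam.BetweenFactorPose3(X(i), L(j), cam_T_marker, cam_noise))
        # Initialize the unknown camera poses, e.g., from CAD.
        for i in range(num_cameras):
            initial.insert(X(i), cad_guess[i])
        result = gtsam.GaussNewtonOptimizer(graph, initial).optimize()
        return [result.atPose3(X(i)) for i in range(num_cameras)]

Note how the same structure extends to additional cameras or markers simply by adding the corresponding nodes and factors, which is the flexibility exploited in the next section.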
V. EXPERIMENTAL VALIDATION

We have implemented the graph from Section IV on the HyQ robot, using the GTSAM library [3]. Given the flexibility of the approach, new cameras or markers can be added just by including the corresponding nodes in the graph. The kinematics factors are given by the forward kinematics through the RobCoGen library [15]. The fiducial factors are implemented with the AruCo library [13], which is capable of sub-pixel accuracy and provides pose-estimation markers (ChAruco).

In this section, we compare the performance of this approach with the MoCap-based method of [6] and Kalibr [11]. The ground truth was collected by physically constraining the robot to a known configuration and detecting a 600 cm marker with a known pose (accurately measured with digital calipers and inclinometers) from the robot. Table I shows the linear and angular Mean Absolute Error (MAE) obtained for the two cameras mounted on HyQ.

TABLE I: Experimental comparison between the different calibration methods: Kalibr [11] (K), MoCap [6] (M), Factor Graph with a single camera (F1), and Factor Graph with two cameras (F2). The table reports the MAE and its standard deviation σ on X, Y, Z [cm] and Roll, Pitch, Yaw [deg], for the Multisense SL (top) and the ASUS Xtion (bottom). For the ASUS, the MoCap method was not available due to the limited Field of View (FoV) of the sensor.

For Kalibr, we collected a dataset of 2778 frames (526 s), during which the robot was excited (either manually or with some actuation) on all axes. In general, the mean linear and angular errors for the Multisense SL are accurate (see Table I), even though the effort required to excite the robot properly makes the procedure difficult to repeat (hence the high standard deviation). We believe the low resolution of the ASUS is the major cause of its poorer results. With the kinematics calibration, the marker can get much closer to the camera, reducing this effect.

For the MoCap method, we collected 562 frames (142 s) of the static robot facing the marker. For the Multisense, the method shows low errors in both translation and rotation (see Table I). Since the MoCap has sub-millimeter accuracy, the marker placement (both on the robot and on the fiducial marker) is the major source of error.

To test the effect of multiple camera constraints, we evaluated the factor graph method both for individual cameras and with both cameras at the same time. We collected a dataset of 1312 images, for a total duration of 348 s, while the leg was passively moved in front of both cameras. In general, both methods show slightly worse but more stable and balanced results than Kalibr. In particular, the mutual constraint between the cameras reduced both the translation and rotation errors for the Multisense setup (cf. F1 and F2 in Table I).

VI. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented a factor graph approach to the multi-camera calibration of a quadruped robot. We have demonstrated that a factor graph framework represents a valid and flexible alternative to visual-inertial methods, which require smooth motion and balanced excitation of all axes to provide reliable results. On the other hand, visual-kinematics methods require hardware modifications at the end-effector (i.e., a fiducial marker support) to be precise. Future developments of this work are oriented towards the integration of both methods in a factor graph fashion, to make the calibration even more robust and automated.
REFERENCES

[1] S. Y. Nof, Handbook of Industrial Robotics, 2nd ed. New York, NY, USA: John Wiley & Sons, Inc., 1999.
[2] R. Horaud and F. Dornaika, "Hand-Eye Calibration," The International Journal of Robotics Research (IJRR), vol. 14, no. 3, pp. 195–210, 1995.
[3] F. Dellaert and M. Kaess, "Factor Graphs for Robot Perception," Foundations and Trends in Robotics, vol. 6, no. 1-2, pp. 1–139, Aug 2017.
[4] V. Pradeep, K. Konolige, and E. Berger, Calibrating a Multi-arm Multi-sensor Robot: A Bundle Adjustment Approach. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 211–225.
[5] F. Blöchlinger, M. Blösch, P. Fankhauser, M. Hutter, and R. Siegwart, "Foot-Eye Calibration of Legged Robot Kinematics," in Advances in Cooperative Robotics (CLAWAR), 2016, pp. 420–427.
[6] M. Camurri, S. Bazeille, D. Caldwell, and C. Semini, "Real-Time Depth and Inertial Fusion for Local SLAM on Dynamic Legged Robots," in IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Sep 2015, pp. 259–264.
[7] J. Kelly, N. Roy, and G. S. Sukhatme, "Determining the Time Delay Between Inertial and Visual Sensor Measurements," IEEE Transactions on Robotics, vol. 30, no. 6, pp. 1514–1523, Dec 2014.
[8] J. Lobo and J. Dias, "Relative Pose Calibration Between Visual and Inertial Sensors," The International Journal of Robotics Research, vol. 26, no. 6, pp. 561–575, 2007.
[9] F. M. Mirzaei and S. I. Roumeliotis, "A Kalman Filter-Based Algorithm for IMU-Camera Calibration: Observability Analysis and Performance Evaluation," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1143–1156, Oct 2008.
[10] J. Kelly and G. Sukhatme, "Visual-Inertial Simultaneous Localization, Mapping and Sensor-to-Sensor Self-Calibration," in IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), Dec 2009, pp. 360–368.
[11] P. Furgale, J. Rehder, and R. Siegwart, "Unified Temporal and Spatial Calibration for Multi-Sensor Systems," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nov 2013, pp. 1280–1286.
[12] C. Semini, N. G. Tsagarakis, E. Guglielmino, M. Focchi, F. Cannella, and D. G. Caldwell, "Design of HyQ – a Hydraulically and Electrically Actuated Quadruped Robot," Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, vol. 225, no. 6, pp. 831–849, 2011.
[13] F. J. Romero-Ramirez, R. Muñoz-Salinas, and R. Medina-Carnicer, "Speeded Up Detection of Squared Fiducial Markers," Image and Vision Computing, vol. 76, pp. 38–47, Aug 2018.
[14] R. Hartley, J. Mangelson, L. Gan, M. G. Jadidi, J. M. Walls, R. M. Eustice, and J. W. Grizzle, "Legged Robot State-Estimation Through Combined Forward Kinematic and Preintegrated Contact Factors," in IEEE International Conference on Robotics and Automation (ICRA), 2018.