CoMo: A novel co-moving 3D camera system
Andrea Cavagna, Xiao Feng, Stefania Melillo, Leonardo Parisi, Lorena Postiglione, Pablo Villegas
Abstract—Motivated by the theoretical interest in reconstructing long 3D trajectories of individual birds in large flocks, we developed CoMo, a co-moving camera system of two synchronized high-speed cameras coupled with rotational stages, which allows us to dynamically follow the motion of a target flock. With the rotation of the cameras we overcome the limitation of standard static systems, which restrict the duration of the collected data to the short interval of time in which the targets are in the cameras' common field of view; at the same time, however, we change in time the external parameters of the system, which then have to be calibrated frame by frame. We address the calibration of the external parameters by measuring the position of the cameras and their three angles of yaw, pitch and roll in the system home configuration (rotational stages at an angle equal to 0°) and by combining this static information with the time-dependent rotation due to the stages. We evaluate the robustness and accuracy of the system by comparing reconstructed and measured 3D distances in what we call 3D tests, which show a relative error of the order of 1%. The novelty of the work presented in this paper lies not only in the system itself, but also in the approach we use in the tests, which we show to be a very powerful tool for detecting and fixing calibration inaccuracies and which, for this reason, may be relevant for a broad audience.

1 INTRODUCTION

In recent years technological advances in the fields of imaging and computer vision, together with the growing demand for 3D content, have contributed to making digital camera stereo systems accurate in 3D reconstruction and at the same time accessible to a wide audience. This led to the proliferation of stereo vision applications in fields as diverse as entertainment [1]–[3], surveillance [4]–[7], navigation [8]–[11], robotics [12]–[15], medicine [16]–[19] and biology [20]–[23].

The experimental design of a 3D system is delicate because of the several factors that contribute to its reliability and feasibility, which strictly depend on the specific data to be gathered and on the environmental and logistic constraints of the data-acquisition location. Standard stereo systems are designed in a static fashion, with the position and the orientation of the cameras fixed in time and thus with a fixed field of view. This set-up is suitable for most laboratory experiments, where the phenomena to be reconstructed happen in a confined volume, but it represents a severe limitation for non-confined field experiments.

Ideally, when dealing with non-confined phenomena, one would like to have both a wide field of view and a high resolution of the system. In fact this is not possible, since both factors depend on the camera focal length, which needs to be short to have a large field of view and long to have a high resolution. Therefore one has to lower the data-taking expectations, finding a compromise between the two factors, which most of the time ends up reducing both the field of view and the resolution of the system. A smarter, though more complicated, strategy is to replace the static set-up with a dynamic one, effectively widening the field of view with a controlled rotation of the cameras aimed at following the targets [24]–[30].

• CNR–ISC (National Research Council – Institute for Complex Systems), UOS Sapienza, Rome, Italy
• College of Engineering, South China Agricultural University
The dynamic set-up overcomes the limitation of the static one by actually breaking the link between the size of the field of view and the resolution of the system, which is now the only factor depending on the focal length. Hence the resolution of the system can be set as high as needed without reducing the 3D volume covered by the system.

The rotation of the cameras makes the external parameters (orientation and position of the cameras in the world reference frame) time-dependent quantities that then have to be carefully calibrated frame by frame to guarantee high accuracy in the reconstruction of the scene. The literature suggests two different calibration approaches [31]: i) 3D methods, which reconstruct key points of calibrated 3D targets and estimate the external parameters as the ones that minimize the 3D reconstruction error [32]–[36]; ii) 2D methods, which match features across the cameras, reconstruct the corresponding 3D points that are then projected back on the cameras, and estimate the external parameters as the ones that minimize the reprojection error [37]–[45]. In [46] we show that the two approaches play different roles (both essential and not interchangeable) in the reconstruction process. 2D methods give the best performance in matching the images across the cameras, i.e. in the identification of the set of 2D points corresponding to the same 3D target in different cameras, while 3D methods give the best performance in the actual 3D reconstruction of the identified point-to-point correspondences.

In their standard implementations both methods start from a set of correspondences, namely 2D point-to-point correspondences in the case of 2D methods and 3D-point-to-2D-point correspondences in the case of 3D methods. The reliability of the two methods is then only guaranteed when the starting sets of known 2D or 3D points cover the entire field of view. This is not problematic for 2D methods, where point-to-point correspondences may be found all over the acquired images, but it represents a severe limitation for 3D methods in wide-field set-ups, where it is not always possible to cover the entire field of view with calibrated targets.
Fig. 1.
Scheme of the system.
The two IDT OS10-4K cameras (resolution … px × … px, sensor size … mm × … mm, frame rate … fps), equipped with Schneider Xenoplan … mm f/2.0 optics, are coupled with the two high-speed one-axis rotational stages (Newport RVS80CC, nominal accuracy … rad, nominal home repeatability … rad). Each camera is connected with a 19-pin LEMO cable to the IDT TC-19 hub, which is also connected to a control laptop. The camera parameters, such as the exposure time, the sensitivity and the frame rate, are manually set in the IDT proprietary software Motion Studio running on the laptop. They are sent via an Ethernet connection from the laptop to the hub, which redirects them to the cameras through the 19-pin LEMO cable using an IDT proprietary protocol. The hub also sends to the cameras the synch signal, which is generated by the hub itself. The direction and the speed of rotation of the stages are manually controlled via a Logitech F310 joypad connected, via a USB cable, to a motion device, namely a Raspberry Pi 3 Model B+ connected to a 7" touchscreen. The motion device communicates, via an Ethernet connection, with the unit controller, which redirects the signals to the rotational stages on a DB25 cable, in the form of a high-frequency nano PWM. The data acquisition starts with both the cameras and the stages in the waiting for trigger mode until they simultaneously receive the trigger signal, a 5V TTL signal. The trigger signal is generated with a standard trigger button, and it is sent at the same time to the hub, which redirects the signal to the cameras, and to the unit controller, which redirects the signal to the stages.

A third approach, which is robust with respect to the 3D reconstruction accuracy regardless of the size of the field of view, consists in calibrating the external parameters of the system by directly measuring the orientation and position of all the cameras in a common reference frame. This latter approach represents a valid alternative to the 3D methods described above, but it is generally not used because it requires particular care in the system set-up, which has to be specifically designed to guarantee a precise measurement of the external parameters.

In this paper we present a novel co-moving 3D system, CoMo, inspired by the human ability to follow the trajectory of a target with a coordinated movement of the eyes: cameras are coupled with rotational stages that drive a controlled rotation of all the cameras in the same direction and at the same rotational speed, in this way dynamically adapting the field of view to the motion of the targets.

We developed and tested CoMo in the context of 3D data-taking of flocks of birds with cameras pointing at a wide region of the sky. This makes standard 3D methods for the calibration of the external parameters not appropriate. Therefore, for the calibration of the set of external parameters to be used in the 3D reconstruction process, we adopt the direct measurement approach, measuring the position and the three angles of yaw, pitch and roll of all the cameras in a common reference frame with the technique described in Section 3.3.2 and Appendix A, while we use the standard 2D method described in [20] for the calibration of the parameters used for the identification of point-to-point correspondences across the cameras. We also propose a new procedure to improve the standard calibration of the camera focal length [47], which we found to be not sufficiently accurate for our purposes.
We discuss this new procedure in Section 7.2, where we show how we could detect and fix the inaccuracy on the focal length by performing 3D reconstruction tests on calibrated targets.

We extensively tested CoMo to evaluate its performance in terms of the 3D reconstruction, see Section 7.3, where we show that the comparison between reconstructed and measured 3D quantities on calibrated targets gives excellent results, with a 3D reconstruction error of the order of 1%. With a full-fledged experimental data-taking campaign in the field we could also check the feasibility of the experiment with the CoMo set-up, which proved to be easy to mount and easy to calibrate in the field. The data collected in the field confirmed that with the co-moving strategy we can actually track the flocks significantly longer than with a standard static system, as we show in Video 1 of the SI.

2 3D RECONSTRUCTION ACCURACY REQUIREMENTS
The requirements on the 3D reconstruction accuracy are strictly dependent on the application the data are collected for. We collect field data of bird flocks with the aim of understanding the mechanisms behind the emergence of collective behaviour, and in particular we investigate the correlation properties of these systems [48], [49]: namely, we mainly use the data to measure how far (in space) and for how long (in time) the change in the direction of flight of a bird influences the change in the direction of flight of the other birds in the flock¹.
1. The 3D velocity vector, v, of a bird is given by ∆X/∆t, where ∆X is the 3D displacement vector of the bird in the interval of time ∆t. The bird's direction of flight, v̂, is defined as the velocity versor v̂ = v/|v|. The change in the direction of the bird is instead given by v̂ − V̂, where V̂ is the direction of flight of the group, computed by averaging the directions of flight of all the birds in the flock. The change in the direction of flight is therefore computed from the distance between the 3D position of the bird at time t and t + ∆t.

In this framework the absolute positions of the birds are not very useful, while the relevant quantities are the birds' directions of flight and the bird-to-bird distances. Therefore we need CoMo to be particularly accurate in the 3D reconstruction of the distances between targets. More precisely, we require the relative error on the reconstructed 3D target-to-target distances to be: i. not dependent on the position of the targets, to avoid a spatial bias on the quantities we compute; ii. not dependent on the instants of time where the targets live, to avoid a temporal bias on the quantities we compute; iii. smaller than 1%, which we define to be the threshold of accuracy acceptability.

We evaluated the CoMo 3D reconstruction accuracy with the tests described in detail in Section 7, showing that the system fulfills all the requirements above.
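The definitions in footnote 1 translate directly into a few lines of array code; the sketch below is a minimal illustration, in which the array of reconstructed positions and the frame rate are assumed values, not taken from the paper.

```python
# Minimal sketch of footnote 1's definitions: velocity, direction of
# flight and its change relative to the group direction. The array X of
# reconstructed 3D positions is an illustrative stand-in.
import numpy as np

X = np.random.rand(100, 20, 3)      # (time, birds, xyz), stand-in data
dt = 1.0 / 155.0                    # frame interval (assumed frame rate)

V = (X[1:] - X[:-1]) / dt           # v = dX/dt per bird and time step
v_hat = V / np.linalg.norm(V, axis=2, keepdims=True)   # direction of flight
V_hat = v_hat.mean(axis=1, keepdims=True)              # group direction
V_hat /= np.linalg.norm(V_hat, axis=2, keepdims=True)
delta = v_hat - V_hat               # change in direction, per bird
```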
3 COMO SYSTEM

In this Section we describe the hardware design of CoMo, its field set-up and the calibration procedure we developed to fulfill the 3D reconstruction accuracy requirements listed in Section 2.

The design of CoMo is shown in Fig. 1: each of the two IDT OS10-4K cameras (resolution … px × … px, sensor size … mm × … mm, frame rate … fps), equipped with Schneider Xenoplan … mm f/2.0 optics, is mounted on a high-speed one-axis rotational stage (Newport RVS80CC, nominal accuracy … rad, nominal home repeatability … rad). The cameras are connected to the IDT TC-19 hub, which has the double task of redirecting the signals from a laptop controller to the cameras and of synchronizing the cameras via a trigger and a synch signal.

The rotation of the stages is manually controlled by an operator via a motion device connected to a unit controller (XPS-RL4), which is also connected to the stages. The data acquisition procedure starts with cameras and stages in waiting for trigger mode, until they simultaneously receive a signal from a hardware trigger connected to the camera hub and to the stage unit controller. We developed two different motion modes for CoMo:
Offline motion mode: the speed and the direction of rotation are set before the acquisition starts, independently for each stage.
Online motion mode: the speed and the direction of rotation may be chosen online by an operator via a joypad, but they are set to be equal for all three stages (see Appendix B).

The two motion modes have different applications: we use the offline mode when performing tests on the system, where we need versatility in the camera rotation, while we use the online mode when we collect data in the field, where it is of great importance to change the camera orientations in real time in order to track the moving targets.
We perform experiments on bird flocks in the urban environment of Rome, Italy, setting up CoMo on the roof of Palazzo Massimo alle Terme, in front of one of the biggest and most stable bird roosting sites in Rome. In this location our working distance is of about … m, with a system baseline, i.e. the distance between the cameras, of about … m. The coupling between the cameras (with a sensor size of … mm × … mm) and the optics (with a focal length of … mm) produces a wide field of view of …° in width and …° in height.
Fig. 2.
Experimental set-up.
Each camera is mounted on a rotational stage that is locked on an L-shaped bar and then on a tripod. The L-bars have a gauge on their short edge (on the left side for the right camera and on the right side for the left camera). We set the yaw angles of the cameras by tightening a fishing line, i.e. a thin nylon line, between the two external edges of the bars, so that the fishing line crosses the gauge and can be used as a pointer on the gauge. Denoting the long side of the L-bar by L and the distance from the point where the line crosses the gauge to the side of the bar by l, we can measure the yaw angles as atan(l/L), with the negative sign for the right camera and the positive sign for the left camera. The accuracy of the measured angle is of … rad, obtained as δl/L, with δl being the thickness of the wire.

Our field set-up, with a working distance of … m and with a wide field of view of …° × …°, makes the calibration of both the internal and external parameters particularly tough. We calibrated the internal parameters, which describe the intrinsic characteristics of the cameras (focal length, position of the image center, distortion coefficients), and the external parameters, which define the geometry of the system (orientation and position of all the cameras with respect to a common reference frame in the three-dimensional space), with two different procedures.

3.3.1 Internal parameters

For the calibration of the internal parameters we adopt a two-step procedure. In the first step we use a standard calibration approach: we calibrate each camera separately in the lab using a standard calibration method based on [47]. We collect images of a … × … checkerboard in different positions, we randomly pick … of these pictures and we estimate the focal length, the position of the image center and the first-order radial distortion coefficient. We iterate this process … times and we choose each parameter as the median value obtained over the iterations.

For our dynamic set-up this standard calibration approach proved to be not accurate enough, producing a time-dependent 3D reconstruction error due to a slight mis-calibration of the focal length, which we estimated to be of the order of …. Therefore we designed a second step of the calibration, to adjust the focal length using the dynamic approach described in detail in Section 7.2.1.

Note that, because of the large working distance and of the large field of view, we cannot perform the standard calibration of the internal parameters with a calibration target kept at the working distance which, at the same time, fills the entire field of view, as this would require a planar target of … m × … m. Therefore we chose to reduce the distance of the calibration target in favor of filling the field of view. This might be the reason for the mis-calibration of the camera focal length obtained with the standard method in the first step of our calibration procedure. However, our results are also compatible with a different scenario, which may be the scope of interesting future investigation: the standard calibration approach is less sensitive than the dynamic one to small variations of the estimated focal length; hence these variations are boosted, and therefore more detectable, using dynamic information. This latter scenario suggests that the dynamic approach to the calibration may be an efficient and relatively simple strategy to improve the calibration performance both for static and for dynamic system configurations.
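As an illustration of the first, standard step of this procedure, the sketch below runs a checkerboard calibration on random subsets of images and takes the median of each parameter over the iterations, in the spirit of [47]. The board size, subset size and iteration count are placeholder assumptions, and OpenCV is used as a stand-in implementation.

```python
# Sketch of the FIRST step of the internal calibration: estimate focal
# length, image center and first-order radial distortion from random
# subsets of checkerboard images, then take the median over iterations.
import random
import numpy as np
import cv2

def calibrate_internal(images, board=(8, 6), square=0.05,
                       n_pick=15, n_iter=50):
    # Reference 3D coordinates of the checkerboard corners (Z = 0 plane).
    obj = np.zeros((board[0] * board[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

    detections = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ok, corners = cv2.findChessboardCorners(gray, board)
        if ok:
            detections.append(corners)
    h, w = images[0].shape[:2]

    # Only the first-order radial coefficient k1 is estimated.
    flags = (cv2.CALIB_FIX_K2 | cv2.CALIB_FIX_K3 |
             cv2.CALIB_ZERO_TANGENT_DIST)
    params = []
    for _ in range(n_iter):
        sample = random.sample(detections, n_pick)
        _, K, dist, _, _ = cv2.calibrateCamera(
            [obj] * n_pick, sample, (w, h), None, None, flags=flags)
        params.append([K[0, 0], K[1, 1], K[0, 2], K[1, 2],
                       dist.ravel()[0]])

    # Each parameter is taken as the median over the iterations.
    return np.median(np.array(params), axis=0)
```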
3.3.2 External parameters

In [46] we point out the need for two different sets of external parameters: a first set to be used to match points across the cameras, and a second set to be used in the 3D reconstruction process. For the calibration of the first set of parameters we use a standard 2D calibration procedure, and we refer the interested reader to [20]; here we focus on the calibration of the second set of parameters, i.e. the one used for the 3D reconstruction.

Our experimental set-up, with a working distance of … m and with a wide field of view, is not suitable to calibrate the external parameters with standard procedures. Our field of view is essentially a wide area of the sky, where we cannot locate any calibration 3D target, hence we cannot use a 3D calibration method. We prefer not to use feature-based calibration routines because of their low accuracy in the 3D reconstruction, which we show to be higher than … in [46]. Therefore we address the calibration with a different strategy.

The CoMo external parameters are actually given by the combination of a static term, which does not change in time and describes the initial position/orientation of the cameras, and of a dynamic term, which is time-dependent and describes the rotation of the cameras due to the stages. We directly measure these two terms separately.

We initially set the rotational stages in their home position, i.e. angle of rotation equal to 0 rad, and we set the pitch and roll angles of both cameras respectively to … rad and … rad using a clinometer (RS Pro Digital level 667-3916, accuracy … rad). We set the yaw angle of the left camera, α_L, to … rad and the yaw angle of the right camera, α_R, to −… rad with a simple but effective technique, see Fig. 2 and Appendix A, with which we achieve an accuracy of … rad and which we extensively tested on static camera systems, [50] and [51]. With this procedure we measure the orientation of both cameras in a common reference frame but, to define the positions of the two cameras in the real 3D world, we still need to fix a metric scale factor, which we calibrate by measuring the system baseline, i.e. the distance between the cameras, with a high-precision range finder (Hilti Laser PD-E, accuracy … mm).

We start the data acquisition moving the cameras from this home calibrated configuration, recording the time-dependent angles of rotation of the stages. With a post-processing procedure we can then associate to each camera frame the corresponding external parameters, combining the external parameters measured in the home configuration with the time-dependent rotation of the stages recorded during the data acquisition, as described in Section 4.4 and Section 4.5.
4 DYNAMIC 3D RECONSTRUCTION

There is a vast literature about 3D reconstruction for static camera systems, i.e. systems with fixed camera orientations, [33]–[36], [38], [39], [41], [42], [44], [45]. Here we move a step forward to generalize the 3D reconstruction theory to our dynamic system.

The camera reference frame O_C xyz has its origin, O_C, in the camera optical point, the z-axis directed as the optical axis and the xy-plane parallel to the sensor, with the x-axis pointing right and the y-axis pointing down, see Fig. 3. In our dynamic set-up this reference frame is not fixed in time but rotates in the xz-plane around the camera optical center.

The pinhole camera model describes the mapping between the 3D real world and the 2D camera world as a central projection, see Fig. 3: the 2D image, q, of the 3D point Q lies at the intersection between the camera sensor and the line between Q and the camera optical center, O_C. Its natural mathematical framework is then projective geometry, where the correspondence between a 3D point Q ≡ (X, Y, Z) and its 2D image q ≡ (u, v)² is expressed in a very simple formalism:

q = P · Q    (1)

where q = (ū, v̄, w̄) is the 2D projective point corresponding to q, namely u = ū/w̄ and v = v̄/w̄, and Q = (X, Y, Z, 1) represents the homogeneous projective coordinates of Q, [52].
2. For the sake of simplicity, in the manuscript we will refer to the 2D coordinates of an image point as defined in the image reference frame with the origin in the image center, instead of the standard reference with the origin in the top left corner.
Single camera.
The camera reference frame has its origin in the camera optical point, O_C, the z-axis directed as the optical axis and the xy-plane parallel to the sensor, with the x-axis pointing right and the y-axis pointing down. The pinhole model describes the relation between the 3D world and the 2D camera sensor as a central projection: the image q of a 3D point Q lies at the intersection between the sensor and the line passing through Q and the camera optical center. This correspondence is not one-to-one, because q is not only the image of the point Q, but also of all the other 3D points belonging to the optical line O_C Q. This ambiguity makes a single camera not sufficient for 3D reconstruction.

P is the 3 × 4 matrix of the form P = K · [R | T], where K is the 3 × 3 matrix of the camera internal parameters, and R and T are respectively the 3 × 3 rotation matrix and the three-component translation vector that bring the camera reference frame into the world reference frame where Q lives; they both depend on the external parameters of the system. This definition of P can be further simplified by noting that

T = −R · C    (2)

where C is the vector that connects the origin of the world reference frame to the origin of the camera reference frame, hence P = KR · [I | −C], where I denotes the 3 × 3 identity matrix.

In a static camera both R and C are fixed in time, but in our dynamic system the camera reference frame rotates about the camera optical center. Hence the vector C is constant in time, while R ≡ R(t). The time-dependent generalization of the projective matrix is then straightforward:

P(t) = KR(t) · [I | −C]    (3)

We denote the two camera reference frames by O_L x_L y_L z_L (for the left camera) and O_R x_R y_R z_R (for the right camera). We also define a third reference frame, Oxyz, with the origin at the middle point of the camera baseline, O_L O_R, see Fig. 4, the x-axis pointing towards O_R, the y-axis pointing down along the world gravity axis and the z-axis pointing outward following the right-hand rule. This reference frame is fixed in time, and it is the reference frame within which we will reconstruct the scene. It is then in this reference frame that we need to express the projective matrices of the two cameras.
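A minimal sketch of the time-dependent projection of eq. (3) is given below; the calibration matrix, camera center and stage angle are illustrative values only, not the system's calibration.

```python
# Sketch of the time-dependent pinhole projection of eq. (3),
# q = K R(t) [I | -C] Q: project a 3D point into a rotating camera.
import numpy as np

def rot_y(phi):
    # Rotation about the camera y-axis (the stage rotation axis).
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(K, R_t, C, Q):
    # P(t) = K R(t) [I | -C], applied to homogeneous Q = (X, Y, Z, 1).
    P = K @ R_t @ np.hstack([np.eye(3), -C.reshape(3, 1)])
    q = P @ np.append(Q, 1.0)
    return q[:2] / q[2]            # back from projective to pixel coords

K = np.array([[6300.0, 0.0, 0.0],   # focal length in px; image center at
              [0.0, 6300.0, 0.0],   # (0, 0), following footnote 2
              [0.0, 0.0, 1.0]])
C = np.array([-12.5, 0.0, 0.0])     # camera center in the world frame (m)
Q = np.array([3.0, -2.0, 80.0])     # a target in front of the system (m)

print(project(K, rot_y(-0.05), C, Q))  # image of Q at stage angle 0.05 rad
```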
Fig. 4.
Camera system. O_L x_L y_L z_L and O_R x_R y_R z_R represent the left and the right camera reference frames. Oxyz is instead the world reference frame, with the origin at the middle point of the camera baseline, O_L O_R, the x-axis pointing towards O_R, the y-axis pointing down along the world gravity axis and the z-axis pointing outward following the right-hand rule. In this reference frame the coordinates of the two camera centers are C_L = (−d/2, 0, 0) and C_R = (d/2, 0, 0). The circle arrows specify the positive direction and the axis of rotation for the yaw, pitch and roll angles.

As we already stated in Section 3.3.2, with our set-up the external parameters are the combination of a static term, which describes the home configuration, and of a dynamic term, which describes the rotation due to the stages; thus the camera rotation matrices are of the form:

R_C = R_{y_C}(−ϕ_C(t)) · R_S(α_C, β_C, γ_C)    (4)

where the subscript C indicates a generic camera (left or right), R_{y_C}(−ϕ_C(t)) is the time-dependent rotation, about the y-axis of the camera reference frame, which takes into account the rotation of the stage by an angle ϕ_C(t), R_S(α_C, β_C, γ_C) is the static rotation matrix that takes into account the home orientation of the camera, and α_C, β_C and γ_C are the angles of yaw, pitch and roll respectively³. In particular, for our system

R_S = R_{z_C}(−γ_C) · R_{x_C}(−β_C) · R_{y_C}(−α_C)    (5)

Note that the order of the rotations in eq. (5) is crucial and it explicitly depends on the tripod model used in the experimental set-up, see Appendix A. Note also that our choice of the world reference frame, with the origin at the center of the camera baseline and the x-axis pointing towards the right camera, makes the expression for the two camera centers, C_L and C_R, extremely convenient: C_L = (−d/2, 0, 0) and C_R = (d/2, 0, 0), with d being the length of the baseline.
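The composition of eqs. (4) and (5) can be written compactly as follows; the angle values are illustrative, and the rotation conventions mirror the ones defined above.

```python
# Sketch of eqs. (4)-(5): compose the static home orientation with the
# time-dependent stage rotation.
import numpy as np

def rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], float)

def ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], float)

def rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

def camera_rotation(alpha, beta, gamma, phi_t):
    # Eq. (5): static home orientation (the order matters, see Appendix A).
    R_S = rz(-gamma) @ rx(-beta) @ ry(-alpha)
    # Eq. (4): stage rotation about the camera y-axis times the home term.
    return ry(-phi_t) @ R_S

# Left camera: some home yaw and pitch, zero roll, stage at 0.02 rad.
R_L = camera_rotation(alpha=0.15, beta=0.45, gamma=0.0, phi_t=0.02)
```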
3. In the world reference frame the x-axis is parallel to the fishing line that we use to measure the two yaw angles, α_L and α_R (see Fig. 2), which are then automatically measured with respect to the world reference frame. A similar argument holds for the pitch and roll angles, which we measure using a clinometer, because the y-axis of the world reference frame is parallel to the gravity direction.

In the previous section we derived the expression of the camera rotation matrices implicitly considering time as a continuous variable, while in the actual experimental set-up time is in fact measured in discrete steps, what we normally call frames.

In a standard static system the only relevant time rate is the one of the cameras. Time discretization can then be efficiently addressed by expressing all the dynamic quantities in the camera frame unit of time. In our dynamic system we have instead two time rates: the one of the cameras, defined by the camera frame rate, and the one of the rotational stages, defined by their sampling rate. Cameras and stages discretize time with two different rates (the cameras shoot at … fps and the stages gather the data at … Hz), see Fig. 5, where the camera and the stage sampling times are highlighted with purple and light green dashed lines respectively. We reconstruct the position of the targets from the images, hence our primary time rate is the one of the cameras. In order to perform an accurate calibration of the external parameters we need to match this primary time line with the secondary time line of the stages and associate the correct stage position to each camera frame.
Fig. 5.
Time discretization. a.
The two time discretizations, of the camera (purple dashed line) and of the stages (green dashed line), have to be matched to associate the position of the stage to each camera sample, i.e. frame. In the continuous world timeline, cameras and stages receive the trigger signal simultaneously, but due to hardware lag time they do not start to record immediately and in general not at the same time. We do not need to know the recording starting time of cameras and stages in the world timeline, but we need to measure the camera-stage offset (red double arrow on the world timeline axis). Once we know this offset we can match each camera time sample, t_i, with its two closest time samples of the stage, t_j and t_{j+1}: t_i ∈ [t_j, t_{j+1}]. These intervals are highlighted with white and red striped boxes. b. The black sinusoidal line represents the angle of rotation of the stage. The green circles correspond to the time samples of the stage, where the angle is actually measured, while the purple circles correspond to the camera time samples where we need to know the stage position. We associate to each camera time sample the angle obtained with a linear interpolation at time t_i between the two points (t_j, ϕ(t_j)) and (t_{j+1}, ϕ(t_{j+1})).

In addition to these two discretizations of time, we have also the continuous world time line. In the world time line cameras and stages receive the trigger signal simultaneously but, due to hardware time lags, which are different for the cameras and for the stages, they do not start to record immediately and in general not at the same time. We do not need to know the recording starting times with respect to the world reference, but it is crucial to know the time delay between cameras and stages, ∆t, highlighted with a red arrow on the world timeline in Fig. 5. We measured ∆t with the procedure described in Section 5.1 and we estimated a delay of … ms of the cameras with respect to the stages.

Once this time offset is measured we can express the times corresponding to the camera frames and the times corresponding to the stage samples in the same reference, defining the i-th camera time as t_i = ∆t + i∆t_C and the j-th stage time as t_j = j∆t_S, where ∆t_C = 1/… s and ∆t_S = 1/… s denote the time steps of the cameras and of the stages respectively. Finally we associate to the i-th camera frame, t_i, its two closest stage samples, t_j and t_{j+1}, such that t_i ∈ [t_j, t_{j+1}], see Fig. 5, where these intervals are highlighted with white and red striped boxes, and we define the angle ϕ_C(t_i) with a linear interpolation of the two angles ϕ_C(t_j) and ϕ_C(t_{j+1}) measured by the stages.
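The matching of the two timelines described above amounts to a linear interpolation of the stage angles at the camera frame times; a minimal sketch follows, in which the sampling rates, the offset value and the file name are illustrative assumptions.

```python
# Sketch of the frame/stage time matching: express camera and stage
# samples on the same timeline, then linearly interpolate the stage angle
# at each camera frame time.
import numpy as np

dt_cam = 1.0 / 155.0      # camera time step (assumed frame rate)
dt_stage = 1.0 / 100.0    # stage time step (assumed sampling rate)
offset = 0.012            # measured camera-stage delay (illustrative)

phi_stage = np.loadtxt("stage_angles.txt")        # angle at each stage sample
t_stage = np.arange(len(phi_stage)) * dt_stage    # t_j = j * dt_S
n_frames = 500
t_cam = offset + np.arange(n_frames) * dt_cam     # t_i = offset + i * dt_C

# For each camera frame t_i, interpolate between the two closest stage
# samples t_j <= t_i <= t_{j+1}; np.interp performs exactly this step.
phi_cam = np.interp(t_cam, t_stage, phi_stage)
```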
3D reconstruction

The ambiguity of the camera projection, which associates to the same 2D image all the 3D points lying on the same optical line, shown in Fig. 3, can be solved with two cameras, see Fig. 4: if q_L and q_R are the images of the same point, Q, in the left and the right camera, Q must lie on the two optical lines, one for each camera, passing through the two images, and it is then the point at the intercept between the two lines. In mathematical formalism this consists in solving the following system in the unknown Q:

q_L = P_L(t) · Q
q_R = P_R(t) · Q    (6)

where q_L and q_R are the 2D projective points corresponding to q_L and q_R, and Q = (X, Y, Z, 1) is the 3D homogeneous projective point corresponding to Q. P_L(t) and P_R(t) are the projective matrices of the left and the right cameras defined as in eq. (3), each with its own calibration matrix, K_L and K_R, its own rotation matrix defined as in eq. (4), R_L and R_R, and its own center in the world reference frame, C_L and C_R.

In deriving eq. (6) we assumed that at each instant of time we detect the exact position of the targets on the images, without considering any kind of noise. The direct effect of noise is that the two lines defined by system (6) do not intersect anymore. Therefore the 3D reconstructed coordinates cannot be found as the exact solution of the system, but as its approximation, which we obtain using the standard DLT (direct linear triangulation) method in [52].

Note that in eq. (6) we identify the camera positions with the optical centers, even though we do not know their exact position. We assume that the optical centers are located in the same position on the camera body (except for small fluctuations), because the factory design is the same for both cameras. We also assume that they are both located at the center of the camera body, which may be not completely correct. With this choice we may then produce a mis-position of the two cameras, which in principle may affect the 3D reconstruction accuracy of the system. But the error that we are introducing is a systematic error, i.e. equal for both cameras, hence we may induce a systematic error on all the 3D reconstructed points, namely a solid translation of the 3D world. This may be relevant for the accuracy of the absolute positions of the targets, but it does not affect the accuracy of the mutual distance between pairs of targets, which is what we are interested in, as we stated in Section 2.
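For completeness, the sketch below shows the standard SVD-based DLT triangulation of [52] that approximates the solution of system (6); it is a generic textbook implementation, not the paper's code.

```python
# Standard DLT triangulation: least-squares solution of system (6) when,
# due to noise, the two optical lines do not intersect exactly.
import numpy as np

def triangulate(q_left, q_right, P_left, P_right):
    # Each image point (u, v) with projection matrix P contributes the
    # rows u*P[2]-P[0] and v*P[2]-P[1] to a homogeneous system A*Q = 0.
    rows = []
    for (u, v), P in [(q_left, P_left), (q_right, P_right)]:
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector associated with the
    # smallest singular value; de-homogenize to get (X, Y, Z).
    _, _, Vt = np.linalg.svd(A)
    Q = Vt[-1]
    return Q[:3] / Q[3]
```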
5 TIME DISCRETIZATION: TESTS

We extensively tested the equipment to measure the time offset ∆t, defined in Section 4.5 and highlighted with a red arrow in Fig. 5, and to check the consistency of the camera frame rate and of the synchronization between the cameras.

Fig. 6. Time offset.
In order to measure the camera-stage time offset, we acquired images of 5 different targets while rotating the cameras with a periodic movement between …° and −…°. The targets are still, hence the rotation of the cameras due to the stages produces an apparent rotation of the 2D coordinates of the targets at the same speed but in the opposite direction. We estimate the offset from the cross-correlation between the signal recorded by the stage and the position of the targets. a. The evolution in time of the position of the five targets used in the test, each highlighted in a different color. b. The signal recorded by the rotational stage. c. The correlation function C_i(τ) for each of the five targets. The maxima of all the cross-correlation functions occur at the same time, which is the offset ∆t. In the inset, the angle recorded by the stage, green line, and the position of one of the targets, purple line, normalized to be represented on the same scale in the plot. The comparison between the two signals shows the apparent movement of the target at the same speed of the stage but in the opposite direction.

We measured the time offset between the cameras and the stages recording images of five targets (… × … cardboard checkerboards) while rotating the stages with a periodic movement between …° and −…°, starting with the stages in their home position. The targets are still, hence the rotation of the cameras (due to the stages) produces an apparent rotation of the 2D coordinates of the targets: if a camera rotates in the clockwise direction at a certain speed, we will detect a rotation of the u-coordinate of the targets with the same speed but in the counterclockwise direction, and vice versa a counterclockwise rotation of the camera corresponds to a clockwise rotation of the targets. Therefore we can estimate the time offset comparing the signal gathered by the stages with the position of the targets, see Fig. 6a, where we plot the u-coordinates of the five targets, and Fig. 6b, where we plot the angle recorded by the stage as a function of time.

To this aim we compute the cross-correlation of the two signals, taking care of the following three factors: i. the two signals are recorded with different time discretizations; ii. the duration of the signals is finite in time; iii. the target positions are not centered in 0.

We over-sampled the signal from the cameras, i.e. the u-coordinate of the targets, with a linear interpolation. In this way we resampled the camera signal at … Hz (the gathering frequency of the stages), so that the time resolution of the cross-correlation is defined by the time discretization of the rotational stage. We took care of the finite duration of the two signals restricting the signal of the stage to one period (from the first to the second maximum), highlighted with a red arrow in Fig. 6b. Finally we normalized the target coordinates by subtracting their home position, see the inset of Fig. 6c, where we plot the signal from the stage in light green and, in purple, the position of one of the targets, normalized to be on the same y-scale of the stage.

We define the cross-correlation between the signal of the stage and the coordinate of the i-th target as:

C_i(τ) = 1/(T − τ) Σ_{t=0}^{T} ϕ(t) ū_i(t + τ)    (7)

where ū_i is the normalized position of the i-th target. For each target we can define τ_i as the point where C_i(τ) reaches its maximum.
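A minimal sketch of this procedure, from the resampling of the camera signal to the maximum of the cross-correlation of eq. (7), is given below; the sampling rates and file names are illustrative assumptions.

```python
# Sketch of the offset estimate of eq. (7): resample the camera signal at
# the stage rate, normalize it, correlate it with the stage angle over one
# period and take the peak-correlation lag.
import numpy as np

dt_cam, dt_stage = 1.0 / 155.0, 1.0 / 100.0   # assumed sampling steps
u = np.loadtxt("target_u.txt")                 # u-coordinate of one target
phi = np.loadtxt("stage_angle.txt")            # stage angle over one period

t_cam = np.arange(len(u)) * dt_cam
t_res = np.arange(0.0, t_cam[-1], dt_stage)
u_res = np.interp(t_res, t_cam, u)             # resample at the stage rate
u_bar = u_res - u_res[0]                       # subtract the home position

T = len(phi)
lags = np.arange(min(T, len(u_bar) - T))
C = np.array([phi @ u_bar[tau:tau + T] / (T - tau) for tau in lags])
# Peak of |C| (the apparent motion has opposite sign to the stage angle).
offset = lags[np.argmax(np.abs(C))] * dt_stage  # camera-stage offset in s
```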
We found that all the targets have the maximum of C_i(τ) at the same point, see Fig. 6c, which is the time offset ∆t between the cameras and the stages, and which we estimated to be equal to … ms.

We checked the frame rate consistency and the synchronization between the cameras using a chronometer that we built specifically for these tests: a needle spins at a constant rotational velocity (… rps) over a protractor. Knowing the rotational speed of the needle, frame rate and synchronization accuracy are then directly measured from the angle between the positions of the needle in two different images. We tested the frame rate consistency for each camera separately, by measuring the angle spanned by the needle between two subsequent images. We found a negligible error, i.e. the error is below our resolution of … s, corresponding to …° at a rotational speed of … rps. We also tested the synchronization between the cameras, comparing the position of the needle on the images acquired at the same time frame from different cameras, and again we found a negligible error.
6 YAW ANGLES ACCURACY IN TIME

The accuracy of the time-dependent yaw angles, ϕ_L(t) and ϕ_R(t), depends on two factors, namely the rotational stage home repeatability and the accuracy of the interpolation we use to compute ϕ_L(t) and ϕ_R(t), described in Section 4.5. We evaluated the accuracy both of the home repeatability and of the interpolation on each camera/stage pair separately, by performing the tests shown in this Section.

In the rotational stage home procedure we include the initialization of the stage, namely we first initialize the stage and then we move it to the home position. The unit controller also offers a direct procedure to home the stage from a generic position, but we chose the indirect procedure because of its higher consistency, see Appendix C for more details.
Fig. 7.
Home repeatability.
The probability distribution function (PDF) obtained while acquiring images with the stage still represents the reference distribution for our test. The reference distribution, highlighted in black, gives the measure of the fluctuations due to the target detection routine. The PDF of the fluctuations of the home position of the stage, highlighted in purple, is obtained by homing the stage after its initialization procedure. This PDF is compatible with the reference distribution, with fluctuations smaller than … rad and with a zero median value.

We denote by ϕ₀ the stage home position. We cannot have an absolute measure of ϕ₀, hence we measure its fluctuation, ∆ϕ. With the camera mounted on the stage, we collect a set of … images of seven targets (… × … checkerboards) acquired after the initialization and homing procedure of the stage, namely between two consecutive acquisitions we initialize the stage and send it to the home position. We detect the targets on the images with the subpixel routine in [53], which associates to each target the position of its central corner, and we measure the angular fluctuation of the home position within each pair of consecutive images as the displacement of the targets, normalized by the camera focal length Ω.

To evaluate the natural fluctuations in the target positions due to the detection routine, we perform a first test, that we will use as a reference, acquiring a set of … images with the stage still in the home position. We compute the probability distribution function (PDF) of the angular fluctuations, highlighted in black in Fig. 7. Then we perform the actual home repeatability test, acquiring a set of … images with the stage in the home position after performing the initialization and homing procedures, and we compute the PDF of the fluctuations of this homing procedure, highlighted in purple in Fig. 7, which shows a zero median and values smaller than … rad. The plot shows the high compatibility of the two PDFs, hence we conclude that the error on the home position is negligible, and smaller than the one guaranteed by the factory, equal to … rad.

In order to measure the error on the interpolation of the angles recorded by the stages, we perform the following test: we separate cameras and stages and we stick on each stage a … × … checkerboard printed on a foam board. We put the stage in rotation, and we acquire images of the rotating checkerboard keeping the camera still, starting with the rotational stage in the home position.

Fig. 8.
Angle accuracy. First column.
The angle gathered by the stage in the three different tests (slow, moderate and fast).
Second column.
PDFs (probability distribution functions) of the error on the angle, ∆ϕ, defined as the difference between the interpolation of the angle measured by the stage and the angle measured with the Kabsch algorithm. The first row refers to the slow configuration, the second row to the moderate configuration and the third row to the fast configuration. As expected the error grows with the speed, due to the decrease of the interpolation accuracy with the speed, and in all three cases the error is below … rad, being smaller than … rad for the slow configuration.

We use the evolution in time of the position of the checkerboard corners to estimate the angle of rotation of the stage, in this way computing the rotation angle with a method that does not depend on the angular position gathered from the stage. For each image we detect the corners of the checkerboard with the subpixel routine in [53]. We use the first part of the acquisition, when the stage is still, to define a reference position for each corner, namely we associate to each corner the average of its coordinates over all the images with the stage in its home position. Then we compute the angle of rotation of the stage corresponding to a given camera frame, t, using the Kabsch algorithm [54] (a minimal sketch of this estimate is given after the list of test configurations below). More in detail, we associate to each frame t the rotation matrix that minimizes the RMSD (root mean squared deviation), computed with the Kabsch algorithm, between the positions of the corners detected at time t and the reference positions. Finally, we compare the angle found with the Kabsch algorithm and the angle that we would associate to the same camera frame by interpolating the angular positions gathered by the stages.

We carried out this test with the stages performing periodic rotations in three different configurations, corresponding to different choices of the parameters:
1- slow. ϕ_max = 2°, v_max = 1°/s, a_max = 0.…°/s²;
2- moderate. ϕ_max = 10°, v_max = 10°/s, a_max = 10°/s²;
3- fast. ϕ_max = 18°, v_max = 36°/s, a_max = 72°/s²,

where v_max and a_max are the maximum speed and the maximum acceleration reached by the stages, and ϕ_max denotes the amplitude of the periodic rotation, i.e. the stage performs a periodic rotation between ϕ_max and −ϕ_max.

The results of these tests are shown in Fig. 8, where in the first column we plot the angle gathered by the stage in the three different tests, and in the second column we show the PDF (probability distribution function) of the error on the angle, ∆ϕ, defined as the difference between the interpolation of the angle measured by the stage and the angle measured via the Kabsch algorithm. As expected, we found that the error grows with the speed of the rotation, because of a decreasing accuracy of the interpolation, but in all cases we found an error smaller than … rad, and smaller than … rad for the slowest test, which can be considered negligible for all our practical purposes.
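The sketch announced above illustrates the Kabsch-based angle estimate, assuming corner coordinates that rotate rigidly in the image plane; it is a generic 2D implementation of the algorithm in [54], not the paper's code.

```python
# Minimal sketch of the Kabsch-based angle estimate: find the planar
# rotation that best maps the reference corner positions onto the corners
# detected at frame t, via SVD.
import numpy as np

def stage_angle(ref, cur):
    # ref, cur: (N, 2) arrays of corner coordinates (reference and frame
    # t), both centered on their centroid before fitting the rotation.
    a = ref - ref.mean(axis=0)
    b = cur - cur.mean(axis=0)
    H = a.T @ b
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T       # optimal rotation (Kabsch)
    return np.arctan2(R[1, 0], R[0, 0])      # rotation angle in rad
```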
7 SYSTEM ACCURACY EVALUATION: 3D TESTS

The question at the very core of all 3D reconstruction systems is: how accurate is the system in reconstructing the position of an object Q at a specific time t? Answering this question is not straightforward, especially if, as in our case, experiments are performed in the field, where the system cannot be mounted and calibrated once and for all. We believe that a fair answer can only be given by checking reconstructed quantities against reality.

This is what we actually do for our system in what we call 3D tests: with a laser range finder (Hilti Laser PD-E, accuracy … mm) we measure the distance between pairs of targets in the common field of view of the cameras, we reconstruct the position of the targets in our world reference frame, and from these positions we compute reconstructed target-to-target distances. Finally we compare reconstructed and measured distances and we compute the percentage error on the measured distances. We perform the 3D tests in two different fashions: i) static 3D test, where the cameras are set up in their home configuration and do not move during the data acquisition; ii) dynamic 3D test, where the cameras rotate during the data acquisition.

We evaluate the 3D reconstruction accuracy of the system, checking that the requirements described in Section 2 are fulfilled, performing the tests described in detail in Section 7.2, Section 7.2.1 and Section 7.2.2. In principle we should perform the tests exactly in the experimental configuration: camera baseline at … m, targets at a distance from the cameras in the range between … m and … m, and pitch angles of both cameras set to … rad. But due to logistic constraints we are forced to perform the tests in a slightly different configuration: i) we set the camera baseline at about … m, with targets at a distance from the cameras in the range between … m and … m; ii) we do not manage to have targets in the common field of view of the cameras for a pitch value of … rad, but we can achieve a maximum pitch of … rad. We take care of these two logistic limitations in the design of the test and in the data analysis, see Section 7.3.

To evaluate the accuracy of the calibration procedures we perform 3D tests in a special configuration where we can write the explicit coordinates of the reconstructed points. To this aim we set the pitch and roll angles of both cameras equal to 0, and we obtain the following explicit form of the Z coordinate of a 3D point, see Appendix D:

Z(t) = Ωd / (s(t) − (α + ϕ(t))Ω)    (8)

where d is the system baseline, i.e. the distance between the cameras, which we measure with the laser range finder, Ω is the camera focal length⁴, s(t) = u_L(t) − u_R(t) is the disparity, and α = α_R − α_L and ϕ(t) = ϕ_R(t) − ϕ_L(t) are the mutual orientations of the cameras due to the system home configuration and due to the rotation of the stages respectively.
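A worked sketch of eq. (8) is given below; all numerical values are illustrative, not the system's calibration values.

```python
# Worked sketch of eq. (8): with pitch and roll set to 0, the Z coordinate
# follows from the disparity, corrected by the mutual yaw of the cameras.
import numpy as np

def z_from_disparity(u_left, u_right, omega, d, alpha, phi_t):
    # s(t): disparity between the two image u-coordinates (in px).
    s = u_left - u_right
    # psi(t) = alpha + phi(t): mutual yaw, home configuration plus stages.
    psi = alpha + phi_t
    return omega * d / (s - psi * omega)     # eq. (8)

omega = 6300.0                   # focal length in px (illustrative)
d = 25.0                         # baseline in m (illustrative)
alpha = -0.30                    # alpha_R - alpha_L, home mutual yaw (rad)
print(z_from_disparity(u_left=350.0, u_right=180.0,
                       omega=omega, d=d, alpha=alpha, phi_t=0.0))
```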
From eq. (8) we obtain the explicit expression of the relative error on Z, δZ/Z:

δZ(t)/Z = δd/d + δΩ/Ω + (Z/(Ωd)) [ψ(t) δΩ + Ω δψ(t)]    (9)

where ψ(t) = α + ϕ(t), and δd, δΩ and δψ denote the errors on d, Ω and ψ. Note that here we are not considering the contribution due to the error on s(t), i.e. the error in the position of the targets on the images, because it is not relevant in the 3D test set-up, see Appendix A.

Denoting the distance between two targets as ∆Z, we obtain also that:

δ(∆Z)/∆Z = δd/d + δΩ/Ω + (2Z̄/(Ωd)) [ψ(t) δΩ + Ω δψ(t)]    (10)

where Z̄ is the mean Z coordinate of the two targets. We do not measure the absolute positions of the targets but their mutual distances, ∆R, hence in the 3D test we can only estimate the error on these distances, δ(∆R). In Appendix A we show that ∆R is proportional to ∆Z, which means that δ(∆R) is proportional to δ(∆Z). Therefore we can write the explicit expression of the relative error on target-to-target distances by substituting δ(∆Z)/∆Z with δ(∆R)/∆R in eq. (10), which gives:

δ(∆R)/∆R = δd/d + δΩ/Ω + (2Z̄/(Ωd)) [ψ(t) δΩ + Ω δψ(t)]    (11)

Eq. (11) shows that δ(∆R)/∆R is made of a constant term, which depends on the errors on d and on Ω, and of a linear term in Z̄, which depends on the errors on ψ and Ω. The idea now is to use the information of eq. (11) to detect potential sources of error in the system.

From the trend of δ(∆R)/∆R in Z̄ we can make a first discrimination between errors due to an incorrect measure of the baseline vs errors due to inaccuracies in Ω and ψ, as we show in Fig. 9, where we present the effect on a static 3D test of an error on d or of an error in α. The difference between the two results is evident: an error on d produces an increase of the errors constant in Z, while the error on α produces errors with a trend in Z.
4. For the sake of simplicity we are assuming the same value of the focal length for both cameras.
Fig. 9.
Static 3D test. The plots show |δ(∆R)|/∆R for each pair of targets as a function of their mean distance from the cameras, Z̄. The orange circles represent the result of the 3D test obtained with the original calibration parameters. The orange dashed line is the mean value of |δ(∆R)|/∆R. Red circles (left column) represent the results obtained by manually introducing an error of … m in the baseline length (δd/d = 0.…), while purple circles (right column) represent the results obtained by manually introducing an error of 0.… rad in the angle α. a. The error on d produces an increment of the error, constant for all the targets. A constant fit of the data (red dashed line) gives an estimate of δd/d equal to …, compatible with the experimental δd/d = 0.…. b. The error on α produces an increment of the error linear in Z. A linear fit of the data (purple dashed line) gives a slope equal to … m⁻¹, which corresponds to δα = 0.… rad, in perfect agreement with the experiment. c. When reducing the span of Z, the error due to d is still well estimated, with a constant fit that predicts an error δd/d equal to …. d. When reducing the span of Z, the error due to α cannot be detected and estimated properly. The constant fit (red dashed line) and the linear fit (purple dashed line) are both compatible with the data.

We stress here that it is possible to discriminate between the two situations only if the span in Z of the targets is large enough, see Fig. 9, where in the bottom panels we show how the results would have looked with a short span in Z. To further discriminate between an error in Ω and an error in ψ(t) we need dynamic information. To this aim we derive Z with respect to time and we obtain the following expression:

∂_t(δZ) = (Z²/(Ωd)) [∂_t ϕ(t) · δΩ]    (12)

which tells us that the evolution in time of the error on Z is quadratic in Z, with a coefficient that depends on the rotational speed, ∂_t ϕ, and on the error on the focal length, δΩ.

In Section 3.3.1 we mentioned that we need a two-step procedure for the calibration of the internal parameters, because of the low accuracy in the estimation of Ω with the standard calibration approach. We will use this last equation to show how to detect and how to quantify the error δΩ. Once we have corrected the error on Ω we can go back to eq. (11) and check for a potential error on the camera orientations, with the tests described in Section 7.2.

We check the accuracy of the standard calibration of Ω with the following 3D test: we put in rotation one camera at a time at a constant rotational speed (v = 6°/s). We check Ω of the left camera rotating only the left camera in the clockwise direction:

∂_t ϕ_L(t) = v and ∂_t ϕ_R(t) = 0    (13)

while we check Ω of the right camera rotating only the right camera in the counterclockwise direction:

∂_t ϕ_L(t) = 0 and ∂_t ϕ_R(t) = −v    (14)

Therefore in both tests ∂_t ϕ(t) = ∂_t ϕ_R(t) − ∂_t ϕ_L(t) = −v, and eq. (12) reads:

∂_t(δZ(t)) = −v (Z²/(Ωd)) δΩ.    (15)

Note that δZ is the reconstruction error, hence δZ(t) = Z_3D(t) − Z, where Z_3D is the reconstructed Z. This implies that ∂_t(δZ(t)) = ∂_t Z_3D(t) − ∂_t Z, but the targets are still, hence their position is constant in time and ∂_t Z = 0. Eq. (15) can then be written as:

∂_t(Z_3D(t)) = −v (Z²/(Ωd)) δΩ    (16)

which tells us that the derivative of Z_3D with respect to time is constant for each target and linearly depends on the speed of rotation and on the error in Ω.
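The sketch below illustrates how eq. (16) can be turned into an estimate of δΩ: fit the slope of Z_3D(t) for each still target, then regress the slopes against −vZ̄²/(Ωd). The data layout and variable names are assumptions.

```python
# Sketch of the focal-length check based on eq. (16): for each still
# target the slope of Z_3D(t) scales as -v * Z^2 / (Omega * d) * dOmega,
# so a linear fit of slope vs <Z_3D>^2 yields the focal-length error.
import numpy as np

def focal_error(t, z3d_per_target, v, omega, d):
    # t: (T,) frame times; z3d_per_target: list of (T,) Z_3D series.
    slopes, z2 = [], []
    for z in z3d_per_target:
        slopes.append(np.polyfit(t, z, 1)[0])   # dZ_3D/dt for one target
        z2.append(np.mean(z) ** 2)              # <Z_3D>_t squared
    # Least-squares fit through the origin of slope = x * dOmega,
    # with x = -v * <Z_3D>^2 / (omega * d).
    x = -v * np.array(z2) / (omega * d)
    return np.dot(x, slopes) / np.dot(x, x)     # estimated dOmega (px)

# The minimum-search variant used in the paper instead reruns the
# reconstruction for a range of trial Omega values and keeps the one for
# which |dZ_3D/dt| of all targets is minimal (no residual trend in time).
```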
We checked the evolution in time of Z_3D(t) and we found the linear trend shown in Fig. 10a and Fig. 10f, which is also the reason for the large error bars of δ(∆R)/∆R in Fig. 10b and Fig. 10g. Eq. (16) tells us more, because it shows that ∂_t Z_3D(t) is quadratic in Z, which means that targets at different distances from the cameras will have a linear trend in time with different slopes: the further apart the target, the higher the slope. With a linear fit of Z_3D(t) we computed ∂_t Z_3D(t) for each target, and we plot these quantities versus ⟨Z_3D⟩_t⁵, see the insets in Fig. 10c and Fig. 10h. From these last plots we estimated δΩ with a linear fit, and we found an error of about 42 px for the left camera (Ω_L = 6356.… px with the standard calibration and Ω_L = 6314.… px with the dynamic calibration) and an error of about 33 px for the right camera (Ω_R = 6333.… px with the standard calibration and Ω_R = 6300.… px with the dynamic calibration).

But we can estimate δΩ more precisely with a different strategy: we run again the analysis of the 3D test moving the value of Ω in the interval [5900 px, … px] and, for each value of Ω, we compute |∂_t Z_3D(t)| for each target. We found that all the targets have a well-defined minimum of |∂_t Z_3D(t)| occurring at the same value of Ω, see Fig. 10c and Fig. 10h. We then choose the Ω corresponding to this minimum as our new calibrated focal length, i.e. the dynamic Ω, highlighted with a dashed orange line in Fig. 10c and Fig. 10h. With this procedure we found an error on Ω for the left camera equal to … px and for the right camera equal to … px, compatible with the estimates obtained from the linear fit of ∂_t Z_3D vs ⟨Z_3D⟩_t. We checked that, using this dynamic Ω, Z_3D does not show a trend in t anymore, and we also found a reduction of the error bars of δ(∆R)/∆R, as shown in Fig. 10d, Fig. 10e, Fig. 10i and Fig. 10l.

We validated the dynamic calibration by performing two further 3D tests in different conditions.

5. ⟨Z_3D⟩_t is the average in time of Z_3D(t), and it is the most accurate estimate of Z that we can give, since we do not measure the absolute positions of the targets but the targets' mutual distances.
Fig. 10.
Improving the focal length calibration.
In the left box, data refer to the calibration improvement procedure, while in the right box to its validation.
Left box.
The top part refers to the calibration of the left camera and the bottom part to the calibration of the right camera. Data are collected with a dynamic 3D test with one camera at a time in rotation at a constant speed v = 6°/s. In the first column we show the results of the 3D test obtained with the standard Ω, i.e. Ω calibrated with the standard method. In the right column we show the same quantities obtained with the dynamic Ω, i.e. Ω calibrated with the dynamic procedure. a. d. f. and i. The plots show the reconstructed Z, Z_3D(t), for all the targets, each highlighted with a different color. Z_3D(t) is normalized by its mean in time to have all the targets on the same range. a. and f. Standard Ω: Z_3D(t) shows a linear trend in t. d. and i. Dynamic Ω: Z_3D(t) does not show any trend in t. b. e. g. and l. The plots show the mean in time of δ(∆R)/∆R for each pair of targets as a function of the pair's mean distance from the cameras, Z̄. Error bars are computed as standard deviations. b. and g. Standard Ω: large error bars reflect the high variability of the targets' Z_3D(t) due to their linear trend in t. e. and l. Dynamic Ω: error bars are in most cases smaller than the symbols and they reflect the absence of the trend in time of the targets' Z_3D(t). c. and h. The plots show the absolute value of the slope of the reconstructed Z, |∂_t Z_3D(t)|, as a function of Ω. At a fixed value of Ω the slope increases with the target distance from the camera, which is embedded in the color code, going from light purple and light blue for the closest targets to dark purple and dark blue for the furthest. All the targets present a well-defined minimum of the slope at the same value of Ω, highlighted with an orange dashed line, which corresponds to the dynamic Ω, while the standard Ω is highlighted with the red dashed line. In the inset we show the linear trend of ∂_t Z_3D with the average of Z_3D in time, ⟨Z_3D⟩_t, for the standard Ω. Right box.
We validate the dynamic calibration by comparing the absolute value of the relative error in the target-to-target distances obtained with the focal length from the standard calibration (red circles) and from the dynamic calibration (orange circles). m. We tested the dynamic calibration with a dynamic 3D test rotating both cameras simultaneously at a speed of 6°/s in the two opposite directions. The plot shows that with the dynamic calibration we obtain smaller relative errors and much smaller error bars than with the standard calibration; moreover, the trend in Z that is quite evident for the standard calibration becomes negligible with the dynamic calibration. n. We validate the dynamic calibration on a 3D test reproducing our experimental procedure, with both cameras rotating simultaneously and in the same direction. Here we do not appreciate a decrease of the error bars, because the effect of δΩ is negligible at the effective speed of 0°/s, but we still see that the overall errors get smaller.

We performed a first test rotating both cameras simultaneously at a constant speed of 6°/s but in opposite directions, in this way amplifying a potential error on Ω: we rotate the left camera in the clockwise direction, ∂_t ϕ_L(t) = −v, and the right camera in the counterclockwise direction, ∂_t ϕ_R(t) = v, hence the effective rotational speed ∂_t ϕ is equal to 12°/s. We also performed a second test to simulate the experimental set-up, rotating the cameras in the same direction at the same speed, ∂_t ϕ_L(t) = ∂_t ϕ_R(t) = v, with an effective rotational speed ∂_t ϕ(t) = 0°/s.

The results of these two tests are shown in Fig. 10m and Fig. 10n, where the red circles refer to the standard calibration and the orange circles to the dynamic calibration. As expected, the effect of the standard Ω is more evident in the test at ∂_t ϕ(t) = 12°/s, where we see large error bars of |δ(ΔR)|/ΔR and also a trend with Z̄, while for the effective speed of 0°/s the error bars are quite small. In both tests the dynamic Ω reduces |δ(ΔR)|/ΔR and makes the error bars of the test at 12°/s comparable with those of the test at 0°/s. These two factors, lower |δ(ΔR)|/ΔR and smaller error bars, confirm that the dynamic Ω is more accurate than the one obtained with the standard calibration.

From these tests we learn that, for an accurate calibration of the internal parameters, we first need to perform the standard calibration procedure described in Section 3.3.1, and then two dynamic 3D tests, each with only one camera at a time in rotation at a constant speed. From the linear fit of ∂_t Z_3D(t) versus ⟨Z_3D⟩_t we estimate the error on the focal length of the two cameras, which we use to correct the results obtained with the standard calibration approach. With this two-step calibration procedure we fulfil the requirement of time independence of the reconstruction error at the relatively low cost of performing two dynamic 3D tests, namely a few hours of work.
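The data-analysis half of this two-step procedure amounts to a pair of linear fits. A minimal sketch, assuming the tracks of the still targets are already reconstructed and stored as arrays (variable names are illustrative, and the conversion of the fitted coefficient into δΩ via Eq. (16) is not reproduced here):

```python
import numpy as np

def drift_vs_distance(t, z_tracks):
    """Per-target drift of the reconstructed depth.

    t        : (T,) array of frame times
    z_tracks : (N, T) array, Z_3D(t) for each of N still targets
    Returns the drift dZ_3D/dt and the time average <Z_3D>_t per target.
    """
    slopes = np.array([np.polyfit(t, z, 1)[0] for z in z_tracks])
    z_mean = z_tracks.mean(axis=1)
    return slopes, z_mean

# The linear fit of the drift against the mean distance gives the
# coefficient from which the focal-length error is read off (Eq. (16)).
# slopes, z_mean = drift_vs_distance(t, z_tracks)
# coeff, offset = np.polyfit(z_mean, slopes, 1)
```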
Fig. 11.
System accuracy. Left box.
The orange and the green boxplots represent the relative error in the 3D reconstruction as a function of Z̄ for two different sets of static 3D tests. Data from the two sets are collected mounting and unmounting the entire system in between, which justifies the differences in the Z̄ values of the two tests. Data within the same set are collected by repeating the alignment procedure with the fishing line (… times for the orange set and … times for the green one). The line inside each box corresponds to the median of the relative error in the 3D reconstruction for a single target-to-target distance, the two edges of the box correspond to the first and the third quartiles, and the two whiskers correspond to the minimum and the maximum values. The plot shows no trend in Z̄, hence no appreciable error on the angle α. The data show variability within the same test (quite large error bars), due to the alignment, and also variability between the two tests, due to the set-up procedure, but this does not affect the accuracy and the consistency of the 3D reconstruction, which always gives relative errors smaller than 0.012. Right box.
Data presented in the first and the second row are collected performing respectively static and dynamic 3D tests for different values of β. The dynamic tests are performed in the field configuration, with the cameras rotating simultaneously at the same speed and in the same direction. The plots show |δ(ΔR)|/ΔR for each pair of targets as a function of their mean distance from the cameras, Z̄. Static tests are performed shooting one single image, hence there are no error bars. For the dynamic tests, instead, we plot |δ(ΔR)|/ΔR averaged in time, with error bars, most of the time smaller than the symbols, representing standard deviations. We do not see any trend of the error with β, neither in the static nor in the dynamic tests. The comparison between static and dynamic tests at a fixed value of β shows relative errors of the same order and always smaller than ….

Field experiments are often performed in locations where the apparatus cannot be mounted once and for all, as happens for our experiment, which is carried out on the roof of a building where we are forced to mount and unmount the entire system on a daily basis. It is then important to design an easy-to-mount system and a consistent calibration procedure. We tested CoMo to evaluate the consistency of our mounting procedure and of the alignment of the cameras with the fishing line, described in Fig. 2.

To this aim we performed two sets of static 3D tests, mounting and unmounting the entire system between the two. In each set we repeated the alignment procedure several times, taking at every alignment a static picture of the targets. We then reconstructed the positions of the targets, computed the target-to-target distances and δ(ΔR)/ΔR, and finally evaluated the variability of the reconstruction error within each set of data and between the two sets.

The results of this test are shown in Fig. 11a, where we plot the relative error δ(ΔR)/ΔR of each pair of targets as a function of Z̄. The plot shows variability within the same test, due to the alignment procedure, and variability between different tests, but with relative errors always below 0.012. The absence in both tests of a trend in Z̄ shows that inaccuracies in the calibration of α are negligible and that the alignment technique is consistent, while the upper limit of 0.012 on the reconstruction error shows the consistency of our mounting procedure.

3D reconstruction accuracy in the field set-up

We evaluate the 3D reconstruction accuracy of the system by performing 3D tests again, this time with a set-up as similar as possible to the experimental one. In principle we should perform this 3D test exactly in the experimental configuration: camera baseline at … m, targets at distances from the cameras between … m and … m, and pitch angles of both cameras set to … rad. But due to logistic constraints we are forced to perform the tests in a slightly different configuration: i) we set the camera baseline at about … m, with targets at distances from the cameras between … m and … m; ii) we do not manage to have targets in the common field of view of the cameras at a pitch value of … rad, but we can achieve a maximum pitch of 0.15 rad.

We take care of these two logistic limitations in the design of the test and in the data analysis. In particular: i) in the 3D test the ratio Z/d is between … and …, while in the field this ratio is between … and ….
The factor Z/d is relevant if we find a trend with Z̄ in δ(ΔR)/ΔR, in which case, to estimate the experimental error, we have to renormalize the 3D reconstruction error found in the tests by a factor …; ii) we perform three series of tests at different pitch angles, β = 0 rad, β = 0.08 rad and β = 0.15 rad, to detect a potential trend of the error with β and, in that case, to predict the range of the reconstruction error in the field conditions, i.e. at β = … rad.

We perform the 3D test in the following way: for each of the three pitch values we first perform a static 3D test in the home configuration, and then we put both cameras in rotation as in the field, namely rotating in the same direction and with the same rotational speed. We perform the test rotating the cameras at a constant speed of … °/s, which is the maximum speed we use in the field.

The results are shown in the right box of Fig. 11: the first row shows the relative error for the three static tests and the second row the results of the dynamic 3D tests. In both cases we obtain excellent results, with relative errors smaller than … and without any trend in Z, and we did not find any trend of the error with β. We do not have to renormalize the relative error to account for the different values of Z/d in the test and in the field, because we do not see any trend of the relative error in Z; nor do we need to extrapolate the error to β = … rad, because there are no appreciable differences among the errors at different β.

The results of the static 3D test at β = 0 essentially reflect the accuracy of the camera alignment procedure. The comparison between static tests at different values of β shows that the introduction of a non-zero pitch angle produces a negligible error, because with different values of β we obtain errors of the same order. By a similar argument, the comparison between static and dynamic tests shows that the rotation introduced by the stages does not affect the accuracy of the external parameters calibration. Therefore the 3D tests show that the dominant source of error in the external parameters calibration is the alignment technique, and in particular the measurement of the cameras' yaw angles. From the results of the 3D tests we estimated this angular error to be smaller than … rad, confirming the high precision of our alignment procedure.
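All of these tests rest on the same accuracy metric: reconstructed target-to-target distances compared against the laser-measured ones. A minimal sketch of that bookkeeping, assuming the reconstructed positions and the measured distances are already available (variable names are illustrative):

```python
import numpy as np
from itertools import combinations

def relative_distance_errors(points_3d, measured):
    """Relative error |delta(Delta_R)| / Delta_R for every target pair.

    points_3d : (N, 3) array of reconstructed target positions (one frame)
    measured  : dict mapping a pair (i, j), i < j, to its laser-measured
                distance Delta_R
    Also returns each pair's mean distance from the cameras, Z-bar.
    """
    errors, z_bar = [], []
    for i, j in combinations(range(len(points_3d)), 2):
        reconstructed = np.linalg.norm(points_3d[i] - points_3d[j])
        reference = measured[(i, j)]
        errors.append(abs(reconstructed - reference) / reference)
        z_bar.append(0.5 * (points_3d[i][2] + points_3d[j][2]))
    return np.array(errors), np.array(z_bar)

# In a dynamic test this is applied frame by frame; the mean and the
# standard deviation in time of each pair's error give the points and
# error bars plotted against Z-bar.
```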
CONCLUSIONS

We presented a novel co-moving stereo camera system, CoMo, developed in the context of 3D tracking of large groups of targets moving in a wide, non-confined space. To overcome the limitation of standard static set-ups, where the size of the field of view is fixed by the position of the cameras and in most cases narrowed to achieve a sufficient resolution, we designed CoMo to follow the motion of the targets with a controlled and synchronized rotation of the cameras, driven by rotational stages (one for each camera).

The 3D reconstruction for a dynamic, wide-field system is rather demanding because the external parameters of the system have to be calibrated frame by frame, and they cannot be calibrated with standard methods, which are not accurate enough on wide-field data. We propose a novel technique for the calibration of the external parameters that separates their static component, corresponding to the system in the home configuration (rotational stages at the 0° position), from their dynamic component, corresponding to the rotation due to the stages. We calibrate the static component of the external parameters by measuring the position and the three angles of yaw, pitch and roll of the cameras in a common reference frame, and we combine this information with the frame-by-frame rotation gathered from the stages.

We validated this calibration approach by performing what we call 3D tests: we set up the system, acquire images of a set of still targets, and accurately measure with a laser distometer the distance between each pair of targets. From the collected images we reconstruct the positions of the targets and compute their mutual distances, which we compare with the measured ones. The results of the 3D tests show the consistency of the calibration method for the external parameters and the high accuracy of the system (3D reconstruction error below …).

3D tests represent a fair and objective method to evaluate the accuracy of a 3D system, but their real value is in the design phase of a 3D system because, as we showed in the manuscript, 3D tests are a powerful tool to detect potential sources of error, also providing a well-defined procedure to discriminate errors due to an incorrect measurement of the cameras' position from errors due to an incorrect measurement of the cameras' orientation. Finally, 3D tests are at the basis of the new method that we proposed to improve the standard calibration of the focal length, which we found to be inaccurate by performing dynamic 3D tests and noting an unexpected trend of the reconstructed position with time.

We carried out a first experimental campaign using CoMo to collect data on starling flocks, an emblematic example of targets moving in large groups in a non-confined space. To this aim we set up the apparatus on the roof of Palazzo Massimo alle Terme, where we are forced to mount and unmount the system every day. With this first campaign we proved that the system is easy to mount and easy to calibrate, and we confirmed that the design of CoMo considerably expands the time span of the acquired data.
ACKNOWLEDGMENTS

We thank Zachary Stamler for the fruitful and stimulating discussion about the calibration of the external parameters of the system. This work was supported by ERC grant RG.BIO (Grant No. 785932).

REFERENCES

[1] T. Bebie and H. Bieri, "A video-based 3D-reconstruction of soccer games," Computer Graphics Forum, vol. 19, pp. 391–400, 2000.
[2] O. Grau, G. A. Thomas, A. Hilton, J. Kilner, and J. Starck, "A robust free-viewpoint video system for sport scenes," in 2007 3DTV Conference, 2007, pp. 1–4.
[3] A. Kulshreshth, J. Schild, and J. J. LaViola Jr, "Evaluating user performance in 3D stereo and motion enabled video games," in Proceedings of the International Conference on the Foundations of Digital Games, 2012, pp. 33–40.
[4] S. Fleck, F. Busch, P. Biber, and W. Straßer, "3D surveillance: a distributed network of smart cameras for real-time tracking and its visualization in 3D," in 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006, pp. 118–118.
[5] S.-I. Yu, Y. Yang, X. Li, and A. G. Hauptmann, "Long-term identity-aware multi-person tracking for surveillance video summarization," arXiv preprint arXiv:1604.07468, 2016.
[6] P. Michel, J. Chestnutt, S. Kagami, K. Nishiwaki, J. Kuffner, and T. Kanade, "GPU-accelerated real-time 3D tracking for humanoid locomotion and stair climbing," in 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2007, pp. 463–469.
[7] L. Wen, Z. Lei, M.-C. Chang, H. Qi, and S. Lyu, "Multi-camera multi-target tracking with space-time-view hyper-graph," International Journal of Computer Vision, vol. 122, no. 2, pp. 313–333, 2017.
[8] J. Haeling, M. Necker, and A. Schilling, "Dense urban scene reconstruction using stereo depth image triangulation," in Proc. SPIE 11433, Twelfth International Conference on Machine Vision (ICMV 2019), 2020, p. 21.
[9] D. Murray and J. J. Little, "Using real-time stereo vision for mobile robot navigation," Autonomous Robots, vol. 8, no. 2, pp. 161–171, 2000.
[10] A. Broggi, C. Caraffi, R. I. Fedriga, and P. Grisleri, "Obstacle detection with stereo vision for off-road vehicle navigation," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) Workshops. IEEE, 2005, pp. 65–65.
[11] K. Konolige, M. Agrawal, R. C. Bolles, C. Cowan, M. Fischler, and B. Gerkey, "Outdoor mapping and navigation using stereo vision," in Experimental Robotics. Springer, 2008, pp. 179–190.
[12] M. Bitzidou, D. Chrysostomou, and A. Gasteratos, "Multi-camera 3D object reconstruction for industrial automation," vol. 397, 2012, pp. 1–8.
[13] S. Ghosh and J. Biswas, "Joint perception and planning for efficient obstacle avoidance using stereo vision," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 1026–1031.
[14] R. Usamentiaga and D. Garcia, "Multi-camera calibration for accurate geometric measurements in industrial environments," Measurement, vol. 134, pp. 345–358, 2019.
[15] K. Schmid, T. Tomic, F. Ruess, H. Hirschmüller, and M. Suppa, "Stereo vision based indoor/outdoor navigation for flying robots," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013, pp. 3955–3962.
[16] F. Marreiros, S. Rossitti, P. Karlsson, C. Wang, T. Gustafsson, P. Carleberg, and O. Smedby, "Superficial vessel reconstruction with a multiview camera system," Journal of Medical Imaging, vol. 3, 2016.
[17] H. Fernandes, P. Costa, V. Filipe, L. Hadjileontiadis, and J. Barroso, "Stereo vision in blind navigation assistance," in 2010 World Automation Congress. IEEE, 2010, pp. 1–6.
[18] C. Bert, K. G. Metheany, K. Doppke, and G. T. Chen, "A phantom evaluation of a stereo-vision surface imaging system for radiotherapy patient setup," Medical Physics, vol. 32, no. 9, pp. 2753–2762, 2005.
[19] T. Probst, K.-K. Maninis, A. Chhatkuli, M. Ourak, E. Vander Poorten, and L. Van Gool, "Automatic tool landmark detection for stereo vision in robot-assisted retinal surgery," IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 612–619, 2017.
[20] A. Attanasi, A. Cavagna, L. Del Castello, I. Giardina, A. Jelić, S. Melillo, L. Parisi, F. Pellacini, E. Shen, E. Silvestri et al., "GReTA: a novel global and recursive tracking algorithm in three dimensions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 99, 2015.
[21] D. H. Theriault, N. W. Fuller, B. E. Jackson, E. Bluhm, D. Evangelista, Z. Wu, M. Betke, and T. L. Hedrick, "A protocol and calibration method for accurate multi-camera field videography," Journal of Experimental Biology, vol. 217, no. 11, pp. 1843–1848, 2014.
[22] I. Watts, M. Nagy, R. I. Holbrook, D. Biro, and T. Burt de Perera, "Validating two-dimensional leadership models on three-dimensionally structured fish schools," Royal Society Open Science, vol. 4, no. 1, p. 160804, 2017.
[23] X. E. Cheng, Z.-M. Qian, S. H. Wang, N. Jiang, A. Guo, and Y. Q. Chen, "A novel method for tracking individuals of fruit fly swarms flying in a laboratory flight arena," PLoS ONE, vol. 10, no. 6, p. e0129657, 2015.
[24] H. Zou, Z. Gong, S. Xie, and W. Ding, "A pan-tilt camera control system of UAV visual tracking based on biomimetic eye," in 2006 IEEE International Conference on Robotics and Biomimetics, 2006, pp. 1477–1482.
[25] K. Fujimura, Y. Hyodo, and S. Kamijo, "Pedestrian tracking across panning camera network," in 2009 12th International IEEE Conference on Intelligent Transportation Systems, 2009, pp. 1–6.
[26] H. Chen, X. Zhao, and M. Tan, "A novel pan-tilt camera control approach for visual tracking," in Proceedings of the 11th World Congress on Intelligent Control and Automation, 2014, pp. 2860–2865.
[27] K. Zhao, U. Iurgel, M. Meuter, and J. Pauli, "An automatic online camera calibration system for vehicular applications," in 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), 2014, pp. 1490–1492.
[28] M. S. Al-Hadrusi, N. J. Sarhan, and S. G. Davani, "A clustering approach for controlling PTZ cameras in automated video surveillance," in 2016 IEEE International Symposium on Multimedia (ISM), 2016, pp. 333–336.
[29] D. D. Doyle, A. L. Jennings, and J. T. Black, "Optical flow background estimation for real-time pan/tilt camera object tracking," Measurement, vol. 48, pp. 195–207, 2014.
[30] R. Stolkin, A. Greig, and J. Gilby, "A calibration system for measuring 3D ground truth for validation and error analysis of robot vision algorithms," Measurement Science and Technology, vol. 17, no. 10, pp. 2721–2730, 2006.
[31] J. Salvi and X. Armangué Quintana, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, pp. 1617–1635, 2002.
[32] S. Fry, M. Bichsel, P. Müller, and D. Robert, "Tracking of flying insects using pan-tilt cameras," Journal of Neuroscience Methods, vol. 101, no. 1, pp. 59–67, 2000.
[33] Z. Liu, F. Li, X. Li, and G. Zhang, "A novel and accurate calibration method for cameras with large field of view using combined small targets," Measurement, vol. 64, pp. 1–16, 2015.
[34] F. Gu, H. Zhao, Y. Ma, P. Bu, and Z. Zhao, "Calibration of stereo rigs based on the backward projection process," Measurement Science and Technology, vol. 27, no. 8, p. 085007, 2016.
[35] B. Shan, W. Yuan, and Z. Xue, "A calibration method for stereovision system based on solid circle target," Measurement, vol. 132, pp. 213–223, 2019.
[36] J. Chaochuan, Y. Ting, W. Chuanjiang, F. Binghui, and H. Fugui, "An extrinsic calibration method for multiple RGB-D cameras in a limited field of view," Measurement Science and Technology, vol. 31, no. 4, p. 045901, 2020.
[37] J. Davis and X. Chen, "Calibrating pan-tilt cameras in wide-area surveillance networks," in Proceedings Ninth IEEE International Conference on Computer Vision, 2003, pp. 144–149, vol. 1.
[38] M. Machacek, M. Sauter, and T. Rösgen, "Two-step calibration of a stereo camera system for measurements in large volumes," Measurement Science and Technology, vol. 14, no. 9, pp. 1631–1639, 2003.
[39] Z. Song and R. Chung, "Use of LCD panel for calibrating structured-light-based range sensing system," IEEE Transactions on Instrumentation and Measurement, vol. 57, no. 11, pp. 2623–2630, 2008.
[40] Z. Wu and R. J. Radke, "Using scene features to improve wide-area video surveillance," in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 50–57.
[41] Z. Wang, Z. Wu, X. Zhen, R. Yang, J. Xi, and X. Chen, "A two-step calibration method of a large FOV binocular stereovision sensor for onsite measurement," Measurement, vol. 62, pp. 15–24, 2015.
[42] P. Cornic, C. Illoul, A. Cheminet, G. Le Besnerais, F. Champagnat, Y. Le Sant, and B. Leclaire, "Another look at volume self-calibration: calibration and self-calibration within a pinhole model of Scheimpflug cameras," Measurement Science and Technology, vol. 27, no. 9, p. 094004, 2016.
[43] Y. Wang, X. Wang, Z. Wan, and J. Zhang, "A method for extrinsic parameter calibration of rotating binocular stereo vision using a single feature point," Sensors, vol. 18, 2018.
[44] J. Zhang, H. Yu, H. Deng, Z. Chai, M. Ma, and X. Zhong, "A robust and rapid camera calibration method by one captured image," IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 10, pp. 4112–4121, 2019.
[45] N. Machicoane, A. Aliseda, R. Volk, and M. Bourgoin, "A simplified and versatile calibration method for multi-camera optical systems in 3D particle imaging," Review of Scientific Instruments, vol. 90, no. 3, p. 035112, 2019.
[46] R. Beschi, X. Feng, S. Melillo, L. Parisi, and L. Postiglione, "Stereo camera system calibration: the need of two sets of parameters," 2021. [Online]. Available: https://arxiv.org/abs/2101.05725
[47] K. H. Strobl and G. Hirzinger, "More accurate pinhole camera calibration with imperfect planar target," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011, pp. 1068–1075.
[48] A. Attanasi, A. Cavagna, L. Del Castello, I. Giardina, T. S. Grigera, A. Jelić, S. Melillo, L. Parisi, O. Pohl, E. Shen et al., "Information transfer and behavioural inertia in starling flocks," Nature Physics, vol. 10, no. 9, pp. 691–696, 2014.
[49] A. Cavagna, L. Del Castello, S. Dey, I. Giardina, S. Melillo, L. Parisi, and M. Viale, "Short-range interactions versus long-range correlations in bird flocks," Physical Review E, vol. 92, no. 1, p. 012705, 2015.
[50] A. Cavagna, I. Giardina, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, and V. Zdravkovic, "The STARFLAG handbook on collective animal behaviour: Part I, empirical methods," Animal Behaviour, vol. 76, pp. 217–236, 2008.