Rigid and non-rigid motion compensation in weight-bearing cone-beam CT of the knee using (noisy) inertial measurements
Jennifer Maier, Marlies Nitschke, Jang-Hwan Choi, Garry Gold, Rebecca Fahrig, Bjoern M. Eskofier, Andreas Maier
Jennifer Maier∗, Marlies Nitschke, Jang-Hwan Choi, Garry Gold, Rebecca Fahrig, Bjoern M. Eskofier, and Andreas Maier

Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Division of Mechanical and Biomedical Engineering, Ewha Womans University, Seoul, Republic of Korea
Department of Radiology, School of Medicine, Stanford University, Stanford, California, USA
Innovation, Advanced Therapies, Siemens Healthcare GmbH, Forchheim, Germany
Abstract
Involuntary subject motion is the main source of artifacts in weight-bearing cone-beam CT of the knee. To achieve image quality sufficient for clinical diagnosis, the motion needs to be compensated. We propose to use inertial measurement units (IMUs) attached to the leg for motion estimation. We perform a simulation study using real motion recorded with an optical tracking system. Three IMU-based correction approaches are evaluated, namely rigid motion correction, non-rigid 2D projection deformation and non-rigid 3D dynamic reconstruction. We present an initialization process based on the system geometry. With an IMU noise simulation, we investigate the applicability of the proposed methods in real applications. All proposed IMU-based approaches correct motion at least as well as a state-of-the-art marker-based approach. The structural similarity index and the root mean squared error between motion-free and motion-corrected volumes are improved by 24-35% and 78-85%, respectively, compared with the uncorrected case. The noise analysis shows that the noise levels of commercially available IMUs need to be improved by a factor of 10, which is currently only achieved by specialized hardware not robust enough for the application. The presented study confirms the feasibility of this novel approach and defines the improvements necessary for a real application.

Keywords:
CT reconstruction, Inertial measurements, Non-rigid motion compensation, Signal noise
Patients suffering from osteoarthritis (OA) experience joint pain due to an increased porosity of articular cartilage or even a complete loss of tissue [1]. This pain is especially severe in the frequently loaded knee joint. Since cartilage degeneration also affects the mechanical properties of the tissue, an analysis of the behavior under load can help to understand and analyze diseased tissue [24]. Contrast-agent enhanced CT imaging in a weight-bearing standing position is an established method to visualize the articular cartilage in the knee joint [8, 9]. It is realized by using a flexible robotic C-arm system scanning on a horizontal trajectory around the patient's knees [19]. The setup of such a scan can be seen in Fig. 1a. One drawback of this setting is that subjects tend to have a higher range of motion when standing compared with the supine lying position, which leads to image artifacts like blurring and double edges in the resulting reconstructions. Motion-corrupted reconstructions lose their diagnostic value and are unsuitable for further processing. Since it is not useful to prevent subject motion when aiming for a natural stance, the movement during the scan has to be estimated and corrected.

Previous approaches are either image-based or use an external signal or marker in order to correct for motion. Performing 2D/3D registration showed very good motion compensation capabilities, but required prior bone segmentations and is computationally expensive [2]. The same limitation holds for an approach based on a penalized image sharpness criterion [28]. By leveraging the epipolar consistency of CT scans, the translation but not the rotation of the knees during a CT scan was estimated [3]. Bier et al. [4] proposed to estimate motion by tracking anatomical landmarks in the projection images using a neural network. Until now, their

∗ [email protected]
approach was not applied for motion compensation and was only reliable if no other objects were present. An investigation on the practicality of using range cameras for motion-compensated reconstruction showed promising results on simulated data [5]. An established and effective method for motion compensation in weight-bearing imaging of the knee is based on small metallic markers attached to the leg and tracked in the projection images to iteratively estimate 3D motion [8, 9]. However, the process of placing the markers is tedious and they produce metal artifacts in areas of interest in the resulting images.

In C-arm CT, inertial measurement units (IMUs) containing an accelerometer and a gyroscope have until now been applied for navigation [15] and calibration [18] purposes. Our recent work was the first to propose the use of IMUs for motion compensation in weight-bearing cone-beam CT (CBCT) of the knee [22]. We evaluated the feasibility of using the measurements of one IMU placed on the shin of the subject for rigid motion compensation in a simulation study. However, since the actual movement during the scan is non-rigid, not all artifacts could be resolved with the rigid correction approach. For this reason, we now investigate non-rigid motion compensation based on 2D or 3D deformation using signals recorded by two IMUs placed on the shin and the thigh. Furthermore, a method to estimate the initial pose and velocity of the sensors is presented. These two parameters are needed for motion estimation and were assumed to be known in the aforementioned publication [22]. Another drawback of our previous publication is that we only simulated optimal IMU signals and neglected possible measurement errors.

Figure 1: (a) Setup of a weight-bearing C-arm cone-beam CT scan of the knees [22], (b) biomechanical model with virtual reflective markers on the legs (pink spheres) and IMUs on thigh and shin (green boxes).
In order to assess the applicability of our proposed methods in a real setting, and as a third contribution, we now analyze how sensor noise added to the optimal IMU signals influences the motion compensation capabilities.

In this article, we present a simulation study similar to the one in our previous publication; therefore, some content of Section 2.1 is closely related to Maier et al. [22]. Furthermore, the previously published rigid motion estimation approach is repeated for better comprehensibility.
The whole processing pipeline of the presented simulation study is shown in Fig. 2, where black font describes each processing step and green font the respective output. All steps shaded in gray relate to the simulation and are described in Section 2.1, while all steps shaded in green refer to the proposed data processing presented in Sections 2.2 to 2.6.

The simulation contains the following steps: The motion of standing subjects is recorded with an optical motion capture system and used to animate a biomechanical model to obtain the trajectories of hip, knees and ankles (2.1.1). These positions are then used in two ways: First, the lower body of a numerical phantom is deformed to mimic the subject motion and a motion-corrupted C-arm CBCT scan is simulated (2.1.2). Secondly, the signals of IMUs placed on the model's leg are computed (2.1.3).

In Section 2.2, measurement noise is added to the optimal sensor signals. These noisy signals are later used to analyze the influence of measurement errors on the motion correction with IMUs.

Then, the proposed IMU-based approaches for motion-compensated reconstruction of the motion-corrupted CT scan are described: From the IMU measurements, the position and orientation, i.e. the pose, of the IMUs over time are computed (2.3). For this step, the initial sensor pose and velocity need to be known and are
[Figure 2 replaces a flow diagram with the following steps and outputs: Data Acquisition → Reflective Marker Motion; Biomechanical Model and Inverse Kinematics → Trajectories of Hip, Knee & Ankle; IMU Simulation → Acceleration & Angular Velocity of IMU; Optional: Noise Simulation → Noisy Acceleration & Angular Velocity of IMU; XCAT Lower Body Spline Deformation → Deformed XCAT Model; X-Ray Forward Projection → Projection Images; Transformation Algorithm → IMU Poses; Initialization → Initial IMU Pose & Velocity; Rigid Projection Matrix Correction / 2D Projection Deformation / 3D Dynamic Reconstruction; Motion-compensated Reconstruction → Motion-corrected Volume.]
Figure 2: Processing pipeline presented in Section 2. Black font: processing steps, green font: respective output. The simulation study is presented in Section 2.1 (shaded in gray). Sections 2.2 to 2.6 (shaded in green) describe the proposed data processing.

estimated from the first two projection images (2.4). The computed poses are then used for three different motion correction approaches compared in this article. First, rigid motion matrices are computed from the IMU poses and used to adapt the projection matrices for reconstruction (2.5). Second, the projection images are non-rigidly deformed before 3D reconstruction (2.6.2). Third, the sensor poses are incorporated in the reconstruction algorithm for a 3D non-rigid deformation (2.6.3).
In order to create realistic simulations, real motion of standing persons is acquired. Seven healthy subjects are recorded in three settings of 20 seconds duration: holding a squat of 30 degrees and 60 degrees knee flexion, and actively performing squats. Seven reflective markers are attached to each subject's sacrum, to the right and left anterior superior iliac spine, to the right and left lateral epicondyle of the knee, and to the right and left malleolus lateralis. The marker positions are tracked with a 3D optical motion tracking system (Vicon, Oxford, UK) at a sampling rate of 120 Hz.

Subsequently, in the software OpenSim [12], the marker positions of the active squatting scan of each subject are used to scale a biomechanical model of the human lower body [14] to the subject's anthropometry. The model with attached virtual markers shown in pink is displayed in Fig. 1b.

The scaled model is then animated two times per subject by computing the inverse kinematics using the marker positions of the 30 degrees and 60 degrees squatting scans [27]. The inverse kinematics computation results in the generalized coordinates that best represent the measured motion. These generalized coordinates describe the complete model motion as the global position and orientation of the pelvis and the angles of all leg joints. Before further processing, jumps in the data that occur due to noise are removed, and the signals are filtered with a second-order Butterworth filter with a cutoff frequency of 6 Hz in order to remove system noise. Since the model is scaled to the subject's anthropometry, the generalized coordinates can be used to compute the trajectories of the hip, knee and ankle joints. These joint trajectories are the input to all further steps of the data simulation.
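The smoothing step can be illustrated as follows; this is not the authors' code, and the signal shapes and test data are assumptions (filtfilt applies the second-order Butterworth forward and backward, so the smoothed coordinates have no phase lag):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs, fc = 120.0, 6.0                     # sampling rate and cutoff in Hz
b, a = butter(2, fc / (fs / 2.0))       # 2nd-order Butterworth, normalized cutoff

def filter_coordinates(q):
    """Zero-phase filter each generalized coordinate (columns) over time (rows)."""
    return filtfilt(b, a, q, axis=0)

t = np.arange(0.0, 20.0, 1.0 / fs)      # 20 s recording, as in the protocol
sway = np.sin(2 * np.pi * 0.5 * t)      # slow squat-holding sway, made up
q = sway[:, None] + 0.05 * np.random.default_rng(0).standard_normal((t.size, 1))
q_smooth = filter_coordinates(q)
```

The 0.5 Hz test oscillation passes the 6 Hz cutoff essentially unchanged while the high-frequency noise is strongly attenuated.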
We generate a virtual motion-corrupted CT scan using the moving 4D extended cardiac-torso (XCAT) phantom [26]. The legs of the numerical phantom consist of the bones tibia, fibula, femur and patella, including bone marrow, and are surrounded by body soft tissue. All structures contained in the phantom have material-specific properties. Their shapes are defined by 3D control points spanning non-uniform rational B-splines. By changing the positions of these control points, the structures of the XCAT phantom can be non-rigidly deformed.

In the default XCAT phantom the legs are extended. To simulate a standing C-arm CT scan, the phantom needs to take on the squatting pose of the recorded subjects, which varies over time. For this purpose, the positions of the XCAT spline control points are changed based on the hip, knee, and ankle positions of the biomechanical model. The deformation process is described in detail by Choi et al. [8].

Then, a horizontal circular CT scan of the knees is simulated. As in a real setting, 248 projections are generated with an angular increment of 0.8 degrees between projections, corresponding to a sampling rate of 31 Hz. The virtual C-arm rotates on a trajectory with 1198 mm source-detector distance and 780 mm source-isocenter distance. The detector has a size of 620 × 480 pixels with an isotropic pixel size of 0.616 mm. In a natural standing position, the knees are too far apart to both fit on the detector, therefore the rotation center is placed in the center of the left leg of the deformed XCAT phantom. Then, forward projections are created as described in Maier et al. [21]. Since the aim of this study is to analyze the motion compensation capability of IMUs, CBCT artifacts other than motion are not included in the simulation.
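As a quick sanity check on the stated geometry (variable names are our own), these parameters imply a short-scan angular range just under 200 degrees and a scan duration of about 8 seconds:

```python
# Back-of-the-envelope check of the simulated scan geometry described above;
# values are taken from the text, variable names are our own.
n_proj = 248          # number of projections
ang_inc = 0.8         # angular increment in degrees
frame_rate = 31.0     # projections per second
sdd = 1198.0          # source-detector distance in mm
sid = 780.0           # source-isocenter distance in mm
pixel = 0.616         # detector pixel size in mm

angular_range = n_proj * ang_inc        # about 198 degrees: a short scan
scan_duration = n_proj / frame_rate     # about 8 seconds of subject sway
magnification = sdd / sid               # cone-beam magnification factor
pixel_at_iso = pixel / magnification    # effective pixel size at the isocenter
```

The roughly eight-second duration is what makes involuntary sway during the scan unavoidable.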
The trajectories of hip, knees and ankles computed using the biomechanical model are used to simulate the measurements of IMUs placed on the leg of the model. IMUs are commonly used for motion analysis in sports and movement disorders [16]. They are low-cost, small and lightweight devices that measure their acceleration and angular velocity on three perpendicular axes. Besides the motion signal, the accelerometer measures the earth's gravitational field distributed on its three axes depending on its orientation. We virtually place two such sensors on the shin, 14 cm below the left knee joint, and on the thigh, 25 cm below the hip joint, aligned with the respective body segment (Fig. 1b). In a future real application, sensors in these positions are visible in the projections as needed for initialization (2.4). At the same time, they are situated at a sufficient distance from the knee joint in the direction of the CBCT rotation axis such that their metal components do not cause artifacts in the region of interest.

The simulated acceleration a(t) and angular velocity ω(t) at time point t are computed as described by [6, 11]:

a(t) = R(t)^T (r̈_Seg(t) + R̈(t) p_Sen(t) − g),   (1)

[ω(t)]_× = R(t)^T Ṙ(t) = [ 0, −ω_z(t), ω_y(t); ω_z(t), 0, −ω_x(t); −ω_y(t), ω_x(t), 0 ],   (2)

ω(t) = (ω_x(t), ω_y(t), ω_z(t))^T.   (3)

All parameters required in these equations are obtained by performing forward kinematics of the biomechanical model. The 3 × 3 rotation matrix R(t) describes the orientation of the sensor at time point t in the global coordinate system; Ṙ(t) and R̈(t) are its first and second order derivatives with respect to time. The position of the segment the sensor is mounted on in the global coordinate system at time point t is described by r_Seg(t), with r̈_Seg(t) being its second order derivative. p_Sen(t) is the position of the sensor in the local coordinate system of the segment the sensor is mounted on.
The parameter g = (0, −9.81, 0)^T m/s² is the global gravity vector.

The IMU signal computation assumes a perfect IMU that can measure without the influence of errors. However, in a real application errors can have a significant influence, preventing an effective motion compensation. For example, Kok et al. [17] showed that the integration of a stationary IMU over ten seconds leads to errors in the orientation and position estimates of multiple degrees and meters, respectively.

The most prominent error sources in IMUs leading to these high deviations are random measurement noise and an almost constant bias [31]. In this study, we focus on the analysis of the unpredictable sensor noise. Commercially available consumer IMU devices have noise densities that are acceptable for larger motion analysis. An example of a commercially available sensor, the BMI160 (Bosch Sensortec GmbH, Reutlingen, Germany), has an output noise density of 180 µg/√Hz for the accelerometer and 0.008 °/s/√Hz for the gyroscope, and a root mean square (RMS) noise of 1.8 mg and 0.07 °/s at 200 Hz [7]. However, our data shows that the signals produced by a standing swaying motion have amplitudes roughly an order of magnitude below these RMS noise levels. This means that when measuring with an off-the-shelf sensor, the signal would be completely masked by noise. For this reason, we investigate the noise level improvement necessary to use IMUs for the task of motion compensation in weight-bearing CT imaging.

We simulate white Gaussian noise signals of different RMS levels and add them onto the simulated acceleration a(t) and angular velocity ω(t). Starting with the RMS values of the aforementioned Bosch sensor, the noise level is successively divided by powers of ten. The accelerometer and gyroscope noise levels are decreased independently. In the following, we will use the notation f_a and f_g for the exponents, i.e. the factors the RMS values are divided by are 10^f_a and 10^f_g, respectively.
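The IMU simulation of Eqs. (1)-(3) and the noise corruption can be sketched together. Given sensor orientations R(t) and segment positions over time, the ideal readings follow from finite differences, and white Gaussian noise of an RMS attenuated by 10^f is then added. Function names, the spinning test motion and the random seed are our own choices, not the paper's implementation:

```python
import numpy as np

g = np.array([0.0, -9.81, 0.0])        # global gravity vector in m/s^2
rng = np.random.default_rng(0)

def simulate_imu(R, r_seg, p_sen, dt):
    """R: (T,3,3) orientations, r_seg: (T,3) segment positions,
    p_sen: (3,) sensor offset in the segment frame, dt: sample interval."""
    Rdot = np.gradient(R, dt, axis=0)
    Rddot = np.gradient(Rdot, dt, axis=0)
    rddot = np.gradient(np.gradient(r_seg, dt, axis=0), dt, axis=0)
    # Eq. (1): a = R^T (r_seg'' + R'' p_sen - g)
    acc = np.einsum('tji,tj->ti', R, rddot + np.einsum('tij,j->ti', Rddot, p_sen) - g)
    # Eq. (2): [omega]_x = R^T R'; read the components off the skew-symmetric matrix
    sk = np.einsum('tji,tjk->tik', R, Rdot)
    omega = np.stack([sk[:, 2, 1], sk[:, 0, 2], sk[:, 1, 0]], axis=1)
    return acc, omega

def add_sensor_noise(signal, rms, f):
    """Return signal plus zero-mean white Gaussian noise of RMS rms / 10**f."""
    return signal + rng.normal(0.0, rms / 10.0 ** f, size=signal.shape)

# toy input: sensor spinning at 2 rad/s about the global z-axis, no translation
dt, w = 0.01, 2.0
t = np.arange(0.0, 1.0, dt)
c, s = np.cos(w * t), np.sin(w * t)
R = np.zeros((t.size, 3, 3))
R[:, 0, 0], R[:, 0, 1], R[:, 1, 0], R[:, 1, 1], R[:, 2, 2] = c, -s, s, c, 1.0
acc, omega = simulate_imu(R, np.zeros((t.size, 3)), np.zeros(3), dt)
noisy_acc = add_sensor_noise(acc, rms=0.018, f=1)
```

For the stationary translation part of this toy motion, the simulated accelerometer reads pure gravity (magnitude 9.81 m/s²) and the gyroscope reads the spin rate on its z-axis.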
The noisy IMU signals are then used to compute rigid motion matrices for motion compensation as explained in Sections 2.3 and 2.5. Note that the noise influence is evaluated independently of the IMU-based motion compensation methods. All motion compensation methods presented in the following sections are first evaluated on the noise-free signals. Afterwards, we perform rigid motion compensation with noisy IMU signals to investigate the influence of noise on the applicability of IMUs for motion compensation.

The following descriptions are based on Maier et al. [22] and are required for all IMU-based motion compensation approaches presented in this article. The IMU measures motion in its local coordinate system; however, motion in the global coordinate system of the CBCT scan is required for motion compensation. The orientation and position of the IMU in the global coordinate system at each frame t is described by the affine matrix

S(t) = [ R(t), r(t); 0^T, 1 ],   (4)

where R(t) is a 3 × 3 rotation matrix describing the orientation, r(t) is a 3 × 1 vector describing the position, and 0^T is a 1 × 3 zero row. The pose at the next frame is obtained by applying the global pose change ∆_g(t):

S(t + 1) = ∆_g(t) S(t).   (5)

This global change ∆_g(t) can be computed by transforming the local change in the IMU coordinate system ∆_l(t) to the global coordinate system using the current IMU pose:

∆_g(t) = S(t) ∆_l(t) S(t)^{-1}.   (6)

Thus, if the initial pose S(t = 0) is known, the problem is reduced to estimating the local pose change ∆_l(t) in the IMU coordinate system, which is described in the following paragraphs.

The gyroscope measures the orientation change over time ω(t) on the three axes of the IMU's local coordinate system, which can be directly used to rotate the IMU from frame to frame. The measured acceleration a(t), however, needs to be processed to obtain the position change over time. First, the gravity measured on the IMU's three axes is removed based on its global orientation.
For this purpose, the angular velocity ω(t) is rewritten to a 3 × 3 rotation matrix G(t) and used to update the global orientation of the sensor R(t). This orientation can then be used to obtain the gravity vector g(t) in the local coordinate system for each frame t:

R(t + 1) = R(t) G(t),   (7)

g(t) = R(t)^T g.   (8)

The sensor's local velocity v(t), i.e. its position change over time, is then computed as the integral of the gravity-free acceleration considering the sensor's orientation changes:

v(t + 1) = G(t)^T (a(t) + g(t) + v(t)).   (9)

Figure 3: (a) Disproportionate visualization of the initialization concept. The green box represents the sensor with its coordinate system plotted inside. The X-rays (blue) pass through the metal components and hit the detector (gray). (b) Visualization of the velocity initialization approach. Computing the pose S′(t = n) with an incorrect initial velocity v(t = 0) leads to a wrong translation t′ which is used for velocity initialization.

With these computations, the desired local pose change of the IMU ∆_l(t) for each frame t is expressed as an affine matrix containing the local rotation change and position change:

∆_l(t) = [ G(t), v(t); 0^T, 1 ].   (10)

Note that the initial pose S(t = 0) and velocity v(t = 0) need to be known or estimated in order to apply this transformation process.

In our previously published work, the initial pose and velocity of the IMU necessary for pose estimation in (5) and (9) were assumed to be known, which is not the case in a real setting [22]. Thies et al. [29] proposed to estimate the initial pose as an average sensor pose computed from the complete set of projection images. However, using the average position over the multi-second scan including subject motion leads to inaccurate motion compensation results.
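Taken together, Eqs. (4)-(10) define a small dead-reckoning loop. The sketch below integrates noise-free samples with an explicit time step dt and a Rodrigues-formula rotation update; these discretization choices, the function names and the test motions are our own assumptions, not the paper's implementation. The loop makes explicit that everything hinges on the initial pose S(0) and velocity v(0):

```python
import numpy as np

g = np.array([0.0, -9.81, 0.0])  # global gravity vector

def skew(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def rot_exp(w):
    """Rotation matrix for a rotation vector w (Rodrigues formula)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = skew(w / th)
    return np.eye(3) + np.sin(th) * k + (1.0 - np.cos(th)) * (k @ k)

def propagate(S0, v0, acc, gyro, dt):
    """Integrate IMU samples into a list of 4x4 global poses S(t)."""
    poses = [S0.copy()]
    R = S0[:3, :3].copy()   # global sensor orientation
    v = v0.copy()           # local sensor velocity
    for a, w in zip(acc, gyro):
        G = rot_exp(w * dt)                 # local rotation change
        g_local = R.T @ g                   # gravity in the sensor frame, Eq. (8)
        v = G.T @ (v + (a + g_local) * dt)  # velocity update, cf. Eq. (9)
        delta_l = np.eye(4)                 # local pose change, Eq. (10)
        delta_l[:3, :3] = G
        delta_l[:3, 3] = v * dt
        poses.append(poses[-1] @ delta_l)   # Eqs. (5)-(6): S(t+1) = S(t) delta_l(t)
        R = R @ G                           # global orientation update, Eq. (7)
    return poses
```

A stationary sensor, whose accelerometer reads exactly minus gravity, stays at its initial pose; a wrong initial velocity drifts linearly, which is precisely the effect the initialization in Section 2.4 exploits.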
We therefore present an initial pose estimation based only on the first projection image. By also incorporating the second projection image, the initial velocity can be estimated.
The pose of the IMU S(t) at frame t contains the 3D position of the origin r(t) and the three perpendicular axes of measurement u_x(t), u_y(t) and u_z(t) in the rotation matrix R(t). This coordinate system can be computed from any four non-coplanar points within the IMU if their geometrical relation to the sensor's measurement coordinate system is known. For simplicity, in this simulation study we assume that these four points are the IMU's origin r and the points x, y and z at the tips of the three axes' unit vectors u_x, u_y, and u_z. We also assume that the sensor has small, highly attenuating metal components at these four points, making their projected 2D positions easy to track on an X-ray projection image. Since the C-arm system geometry is calibrated prior to performing CT acquisitions, the 3D position of each 2D detector pixel and the 3D source position for each projection are also known. Then the searched points r(t), x(t), y(t), and z(t) are positioned on the straight lines between the source and the respective projected points (Fig. 3a).

In the considered case, the four points need to fulfill the following properties:

• The vectors u_x(t), u_y(t) and u_z(t) spanned by r(t), x(t), y(t) and z(t) must have unit length.
• The Euclidean distance between any two of the points x(t), y(t) and z(t) must be √2.
• The inner product of any two of the vectors u_x(t), u_y(t) and u_z(t) must be zero.
• The right-handed cross product of two of the vectors u_x(t), u_y(t) and u_z(t) must result in the third vector.

Solving the resulting non-linear system of equations defined by these constraints for the first projection at time point t = 0 yields the 3D positions of r(t = 0), x(t = 0), y(t = 0), and z(t = 0), and thereby the initial sensor pose S(t = 0).

The initial IMU velocity v(t = 0) is needed to compute the velocity update in (9).
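Since each fiducial point lies on a known source ray, the pose estimation above reduces to four unknown depths constrained by the listed properties (the √2 distances are implied by unit length plus orthogonality and are omitted here). A sketch with synthetic geometry, our own function names, and a small finite-difference Gauss-Newton loop in place of a library solver:

```python
import numpy as np

def gauss_newton(fun, x0, iters=60, eps=1e-6):
    """Minimize sum(fun(x)**2) with a finite-difference Gauss-Newton loop."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        f = fun(x)
        J = np.empty((f.size, x.size))
        for k in range(x.size):
            xp = x.copy()
            xp[k] += eps
            J[:, k] = (fun(xp) - f) / eps
        x = x - np.linalg.lstsq(J, f, rcond=None)[0]
    return x

def estimate_initial_pose(src, dirs, lam0):
    """src: (3,) source position; dirs: (4,3) unit ray directions toward the
    projected fiducials r, x, y, z; lam0: (4,) initial depth guesses."""
    def residuals(lam):
        pts = src + lam[:, None] * dirs            # candidate 3D points on the rays
        axes = pts[1:] - pts[0]                    # u_x, u_y, u_z
        res = [np.linalg.norm(axes[i]) - 1.0 for i in range(3)]          # unit length
        res += [axes[i] @ axes[j] for i, j in ((0, 1), (0, 2), (1, 2))]  # orthogonality
        res += list(np.cross(axes[0], axes[1]) - axes[2])                # right-handedness
        return np.array(res)
    lam = gauss_newton(residuals, lam0)
    pts = src + lam[:, None] * dirs
    S = np.eye(4)
    S[:3, :3] = (pts[1:] - pts[0]).T               # columns u_x, u_y, u_z
    S[:3, 3] = pts[0]                              # sensor origin r(t = 0)
    return S
```

With a reasonable starting guess for the depths (e.g. from the nominal sensor placement), the over-determined system converges to the unique right-handed solution.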
In the following paragraphs, we describe a process to estimate the initial velocity, which is illustrated in Fig. 3b.

The IMU acquires data with a higher sampling rate than the C-arm acquires projection images (120 Hz and 31 Hz, respectively). If the two systems are synchronized, the correspondence between the sampling time points t of the IMU and the projection image acquisition points i is known. The first projection image corresponds to the first IMU sample at time point t = i = 0 and is used to estimate the initial pose S(t = 0). The second projection image at i = 1 corresponds to the IMU sampling point t = n with n > 1 and is used to estimate the pose S(t = n).

Since each IMU pose can be computed from the previous one by applying the pose change between frames with (5) and (6), the IMU pose at frame t = n can also be expressed as

S(t = n) = S(t = 0) ∆_l(t = 0) ∆_l(t = 1) ... ∆_l(t = n − 1),   (11)

which can be rearranged to

S(t = 0)^{-1} S(t = n) = ∆_l(t = 0) ∆_l(t = 1) ... ∆_l(t = n − 1).   (12)

However, since v(t = 0) is not known, ∆_l(t = 0) and all subsequent local change matrices are also not known. Therefore, instead of the actual v(t = 0), we use the zero vector as initial velocity, introducing an error vector e:

v′(t = 0) = 0 = v(t = 0) + e.   (13)

This error is propagated and accumulated in the frame-by-frame velocity computation in (9), and for t >= 1 the resulting error-prone velocity is

v′(t) = v(t) + G(t − 1)^T G(t − 2)^T ... G(0)^T e.   (14)

These error-prone velocities v′(t) lead to incorrect pose change matrices ∆′_l(t) and thereby to an incorrect computation of S′(t = n):

∆′_l(t) = [ G(t), v′(t); 0^T, 1 ],   (15)

S(t = 0)^{-1} S′(t = n) = ∆′_l(t = 0) ∆′_l(t = 1) ... ∆′_l(t = n − 1).
(16)

Inserting (14) and expanding (16) shows that the incorrect initial velocity only has an effect on the translation of the resulting affine matrix:

S(t = 0)^{-1} S′(t = n) = S(t = 0)^{-1} S(t = n) + [ 0_{3,3}, n e; 0^T, 0 ].   (17)

In this equation, 0_{3,3} denotes a 3 × 3 zero matrix. If the translation of S(t = 0)^{-1} S′(t = n) is denoted as t′ and the translation of S(t = 0)^{-1} S(t = n) is denoted as t, this leads to:

t′ = t + n e.   (18)

The correct initial velocity v(t = 0) is computed as

v(t = 0) = −e = −(1/n) (t′ − t).   (19)

2.5 Rigid projection matrix correction

Under the assumption that the legs move rigidly during the CT scan, it is sufficient to use the measurements of only one sensor, placed e.g. on the shin, for motion estimation. The pose change matrices estimated in (6) and (10) can then be directly applied for motion correction. Note that the angular velocity and velocity are resampled to the CT acquisition frequency before pose change computation using the synchronized correspondences between C-arm and IMU.

An affine motion matrix M(i) containing the rotation and translation is computed for each projection i. The motion matrix for the first projection i = 0 is defined as the identity matrix M(i = 0) = I, i.e. the pose at the first projection is used as the reference pose. Each subsequent matrix is then obtained using the global pose change matrix computed from the sensor measurements:

M(i + 1) = M(i) ∆_g(i).   (20)

In order to correct for the motion during the CBCT scan, we then modify the projection matrices P(i) of the calibrated CT scan with the motion matrices M(i), resulting in motion-corrected projection matrices P̂(i):

P̂(i) = P(i) M(i).   (21)

The corrected projection matrices are then used for the volume reconstruction as described in Section 3.1.
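The accumulation in Eqs. (20)-(21) is a few lines of code. The sketch below assumes 3×4 projection matrices and a list of global pose changes; the function name and the placeholder inputs are ours:

```python
import numpy as np

def correct_projection_matrices(P, delta_g):
    """P: list of 3x4 projection matrices; delta_g: list of 4x4 global
    pose changes between consecutive projections (one fewer than P)."""
    M = np.eye(4)
    P_hat = [P[0] @ M]                 # M(0) = I: first pose is the reference
    for P_i, d in zip(P[1:], delta_g):
        M = M @ d                      # M(i+1) = M(i) delta_g(i), Eq. (20)
        P_hat.append(P_i @ M)          # corrected matrix, Eq. (21)
    return P_hat
```

Multiplying the motion matrix onto the projection matrix moves the estimated subject motion into the system geometry, so the back-projection rays follow the moving anatomy.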
Contrary to the assumption in Section 2.5, the leg motion during the scan is non-rigid, since the subjects are not able to hold exactly the same squatting angle for the duration of the scan. As a consequence, the motion cannot entirely be described by a rigid transformation. To address this issue, we propose a non-rigid motion correction using both IMUs placed on the model. Using the formulas presented in Section 2.3, we can compute the poses of the IMUs on tibia and femur. Since the placement of the IMUs on the segments relative to the joints is known, the IMU poses can be used to describe the positions of the ankle, knee and hip joint, a(t), k(t) and h(t), at each time point t.

These estimated joint positions are used to non-rigidly correct for motion during the scans. We propose two approaches that make use of Moving Least Squares (MLS) deformations in order to correct for motion [25, 33]. The first approach applies a 2D deformation to each projection image, and the second approach performs a 3D dynamic reconstruction where the deformation is integrated into the volume reconstruction.

The idea of MLS deformation is that the deformation of a scene is defined by a set of m control points. The original positions of the control points are denoted as p_j, and their deformed positions are q_j with j = 1, ..., m. For each pixel ν in the image or volume, the goal is to find its position in the deformed image or volume depending on these control points. For this purpose, the affine transformation f(ν) that minimizes the weighted distance between the known and estimated deformed positions should be found:

Σ_j ω_j |f(p_j) − q_j|².   (22)

This optimization is performed for each pixel individually, since the weights ω_j depend on the distance of the pixel ν to the control points p_j:

ω_j = 1 / |p_j − ν|².   (23)
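In the 3D rigid case, the minimizer of (22) has a well-known closed form via an SVD of the weighted covariance of the shifted control points (the Kabsch solution). A sketch with made-up control points; the function name and the explicit reflection guard are our additions:

```python
import numpy as np

def mls_rigid_3d(v, p, q, eps=1e-12):
    """v: (3,) voxel position; p, q: (m,3) original and deformed control points."""
    w = 1.0 / (np.sum((p - v) ** 2, axis=1) + eps)  # weights, cf. Eq. (23)
    p_star = w @ p / w.sum()                        # weighted centroids
    q_star = w @ q / w.sum()
    P, Q = p - p_star, q - q_star                   # shifted control points
    C = (w[:, None] * P).T @ Q                      # sum_j w_j p_hat_j q_hat_j^T
    U, _, Vt = np.linalg.svd(C)
    R = Vt.T @ U.T                                  # rotation V U^T
    if np.linalg.det(R) < 0:                        # guard against reflections
        Vt[-1] *= -1.0
        R = Vt.T @ U.T
    return R @ (v - p_star) + q_star
```

Because the weights grow without bound as v approaches a control point, the deformation interpolates the control points exactly while blending smoothly in between.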
(23)

The weighted centroids p* and q* and the shifted control points p̂_j = p_j − p* and q̂_j = q_j − q* are used in order to find the optimal solution of (22) in both the 2D and the 3D case:

p* = Σ_j ω_j p_j / Σ_j ω_j , (24)

q* = Σ_j ω_j q_j / Σ_j ω_j . (25)

According to Schaefer et al. [25], in the 2D image deformation case, the transformation minimizing (22) is described by:

f(ν) = |ν − p*| · (Σ_j q̂_j A_j) / |Σ_j q̂_j A_j| + q* , (26)

where

A_j = ω_j ( p̂_j ; −p̂_j^⊥ ) ( ν − p* ; −(ν − p*)^⊥ )^T . (27)

Finding the transformation that minimizes (22) in the 3D case requires the computation of a singular value decomposition, as explained by Zhu et al. [33]:

Σ_j ω_j p̂_j q̂_j^T = U Σ V^T . (28)

The optimal transformation is then described by:

f(ν) = V U^T (ν − p*) + q* . (29)

In our first proposed non-rigid approach, we deform the content of the 2D projection images in order to correct for motion. The initial pose of the subject is used as the reference pose, so the first projection image i = 0 is left unaltered. Each following projection image acquired at time point i is transformed by MLS deformation, using the estimated hip, knee and ankle joint positions h(i), k(i) and a(i) as control points as described in the following paragraph.

To obtain the 2D points needed for the projection image deformation, the 3D positions h(i), k(i) and a(i) are forward projected onto the detector using the system geometry. However, since the detector is too small to cover the whole leg of a subject, the projected positions of the hip and ankle would lie outside of the detector area. For this reason, 3D points situated closer to the knee, on the straight line between hip and knee and on the straight line between ankle and knee, are computed with α = 0.:

h′(i) = (1 − α) h(i) + α k(i) , (30)

a′(i) = (1 − α) a(i) + α k(i) . (31)

Then, for each projection i, the initial 3D reference positions a′(i = 0), k(i = 0) and h′(i = 0) are forward projected onto the detector, resulting in the 2D control points p_j(i) with j = 1, 2, 3. The 3D positions h′(i), k(i) and a′(i) at the time of projection acquisition i are forward projected to obtain q_j(i) with j = 1, 2, 3. Each projection image is then deformed by computing the transformation f(ν) according to (26) for each image pixel using these control points. Finally, the motion corrected 3D volume is reconstructed from the resulting deformed projection images as described in Section 3.1.

The second proposed non-rigid approach applies 3D deformations during volume reconstruction. In the typical back-projection process of CT reconstruction, the 3D position of each voxel of the output volume is forward projected onto the detector for each projection image i, and the value at the projected position is added to the 3D voxel value. For the proposed 3D dynamic reconstruction, this process is altered: before forward projecting the 3D voxel position onto the detector for readout, it is transformed using 3D MLS deformation. However, the readout value is added at the original voxel position.

For MLS deformation during reconstruction, the estimated positions of the hip, knee and ankle joints h(i), k(i) and a(i) are used. The reference pose is again the first pose at i = 0, and the 3D positions of hip, knee and ankle h(i = 0), k(i = 0) and a(i = 0) are used as control points p_j with j = 1, 2, 3. The 3D positions h(i), k(i) and a(i) are used as q_j(i) with j = 1, 2, 3. Note that the 3D positions p_j are the same for each projection, contrary to the 2D approach where they depend on the forward projection using the system geometry. Using these control points, the transformation f(ν) is computed during reconstruction for each voxel of the output volume and each projection according to (29), and is applied for deformation, resulting in a motion-compensated output volume.

Evaluation
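As an illustration of the closed-form solutions in (24)-(29), the rigid MLS transform of a single point can be sketched in NumPy. This is a sketch, not the authors' implementation: the inverse-distance weights ω_j = 1/|p_j − ν|^(2α) with α = 2 are an assumed form following common MLS practice, and the determinant check on the SVD solution is a standard safeguard against reflections that the text does not mention.

```python
import numpy as np

def _perp(v):
    """90-degree counter-clockwise rotation of a 2D vector."""
    return np.array([-v[1], v[0]])

def mls_weights(v, p, alpha=2.0, eps=1e-12):
    """Inverse-distance weights w_j = 1 / |p_j - v|^(2*alpha) (assumed form)."""
    return 1.0 / (np.linalg.norm(p - v, axis=1) ** (2 * alpha) + eps)

def mls_rigid_transform_2d(v, p, q, alpha=2.0):
    """Rigid MLS deformation of a 2D point v, eqs. (24)-(27).
    p, q: (n, 2) reference and deformed control points; v must not
    coincide with a control point."""
    w = mls_weights(v, p, alpha)
    p_star = w @ p / w.sum()                         # eq. (24)
    q_star = w @ q / w.sum()                         # eq. (25)
    p_hat, q_hat = p - p_star, q - q_star
    d = v - p_star
    N = np.stack([d, -_perp(d)])                     # rows: d, -d_perp
    f = np.zeros(2)
    for wj, ph, qh in zip(w, p_hat, q_hat):
        A = wj * np.stack([ph, -_perp(ph)]) @ N.T    # eq. (27)
        f += qh @ A
    return np.linalg.norm(d) * f / np.linalg.norm(f) + q_star  # eq. (26)

def mls_rigid_transform_3d(v, p, q, alpha=2.0):
    """Rigid MLS deformation of a 3D point v, eqs. (24)-(25) and (28)-(29)."""
    w = mls_weights(v, p, alpha)
    p_star = w @ p / w.sum()
    q_star = w @ q / w.sum()
    p_hat, q_hat = p - p_star, q - q_star
    H = (w[:, None] * p_hat).T @ q_hat               # sum_j w_j p_hat_j q_hat_j^T, eq. (28)
    U, _, Vt = np.linalg.svd(H)
    if np.linalg.det(Vt.T @ U.T) < 0:                # guard against reflections
        Vt[-1] *= -1
    R = Vt.T @ U.T                                   # V U^T
    return R @ (v - p_star) + q_star                 # eq. (29)
```

Both functions reproduce an exact rigid motion when the control points themselves move rigidly, which is the property exploited when the estimated joint positions translate and rotate between projections.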
All volumes are reconstructed by GPU-accelerated filtered back-projection in the software framework CONRAD [20]. The filtered back-projection pipeline includes cosine weighting, Parker weighting, a truncation correction and a Shepp-Logan ramp filter. The reconstructed volumes have a size of 512³ voxels with an isotropic spacing of 0.5 mm. In the case of rigid motion compensation, the motion compensated projection matrices P′ are used for reconstruction. In the case of 2D non-rigid motion compensation, the deformed projection images are reconstructed using the original projection matrices. In the case of 3D non-rigid motion compensation, the original projection matrices and projection images are used, but the back-projection process is adapted as described in Section 2.6.3.

For comparison, an uncorrected motion-corrupted volume is reconstructed. Furthermore, a motion-free reference is realized by simulating a CT scan in which the initial pose of the model is kept constant throughout the scan. The IMU-based motion compensation approaches are compared with a marker-based gold standard approach [9]. For this purpose, small highly attenuating metal markers placed on the knee joint are added to the projections and tracked for motion compensation as proposed by Choi et al. [9]. All volumes are scaled from 0 to 1 and registered to the motion-free reference reconstruction.

The image quality is compared against the motion-free reference by the structural similarity index measure (SSIM) and the root mean squared error (RMSE). The SSIM ranges from 0 (no similarity) to 1 (identical images) and considers differences in luminance, contrast and structure [30]. The metrics are computed on the whole reconstructed leg as well as on the lower leg and the upper leg separately.
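The comparison metrics can be sketched as follows. Note this is a simplified illustration: the SSIM here is computed over a single global window, whereas Wang et al. [30] average the statistic over local windows, so a windowed implementation (as found in common image libraries) will return somewhat different values.

```python
import numpy as np

def scale_unit(vol):
    """Scale a reconstructed volume to the range [0, 1] before comparison."""
    return (vol - vol.min()) / (vol.max() - vol.min())

def rmse(ref, test):
    """Root mean squared error between two equally sized volumes."""
    return float(np.sqrt(np.mean((ref - test) ** 2)))

def ssim_global(ref, test, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole volume (simplified variant of [30])."""
    mu_r, mu_t = ref.mean(), test.mean()
    cov = np.mean((ref - mu_r) * (test - mu_t))
    num = (2 * mu_r * mu_t + c1) * (2 * cov + c2)
    den = (mu_r ** 2 + mu_t ** 2 + c1) * (ref.var() + test.var() + c2)
    return float(num / den)
```

Identical volumes yield an SSIM of 1 and an RMSE of 0; in the study, the metrics are additionally evaluated on sub-volumes covering the shank and the thigh.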
The influence of noise on the motion correction is evaluated in two ways. First, the estimated motion is compared by decomposing the motion matrices resulting from the noise-free signal and from the different levels of noisy signals into three-axial translations and rotations. Each noisy result is then compared to the noise-free estimate. For comparison, we compute the RMSE between each axis of the noise-free and the noisy translations and rotations, and then average over the three axes. We only evaluate on one scan of one subject, but average over five independent repetitions of adding random white noise and computing motion matrices and RMSE.

Second, volumes reconstructed from noisy signal motion estimates are analyzed. Based on the RMSE results from the first part of the analysis, certain noise levels are chosen for rigid motion compensated reconstruction. Rigid motion matrices are computed from the noisy signals as described in Sections 2.3 and 2.5 and used for volume reconstruction as described above. For image quality comparison, the SSIM and RMSE are again computed.
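The first part of this analysis might look as follows in NumPy; the white-noise helper and the x-y-z Euler angle convention used for decomposing the rotation matrices are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def add_sensor_noise(signal, rms, factor_exp, rng):
    """Add white Gaussian noise with RMS value rms / 10**factor_exp,
    mimicking the decremental noise levels of the simulation."""
    return signal + rng.normal(0.0, rms / 10.0 ** factor_exp, size=signal.shape)

def decompose_rigid(M):
    """Split a 4x4 rigid motion matrix into a translation vector and
    Euler angles (R = Rz @ Ry @ Rx convention, assumed for illustration)."""
    t = M[:3, 3]
    R = M[:3, :3]
    rx = np.arctan2(R[2, 1], R[2, 2])
    ry = np.arcsin(-R[2, 0])
    rz = np.arctan2(R[1, 0], R[0, 0])
    return t, np.array([rx, ry, rz])

def axis_averaged_rmse(clean, noisy):
    """RMSE per axis between two (n, 3) trajectories, averaged over the axes."""
    return float(np.sqrt(np.mean((clean - noisy) ** 2, axis=0)).mean())
```

Applied per scan, the translations and rotations of the noise-free and noisy motion matrices are decomposed, compared axis-wise, and the resulting RMSE is averaged over the repetitions of the noise simulation.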
The proposed initialization method yields the correct initial pose and velocity for all scans, and all further computations are based on these estimates. Figure 4 shows axial slices through the tibia and the femur, and a sagittal slice of one example reconstruction. All proposed methods are able to compensate for motion equally as well as the marker-based reference approach, or even slightly better. Differences between the methods can only be seen in a detailed overlay of the motion compensated reconstruction with the motion-free reconstruction. In Fig. 5, details of the axial slice through the thigh are depicted at the femoral bone and at the skin-air border. The motion-free reconstruction is shown in red, and the motion compensated reconstructions of the rigid, non-rigid 2D and non-rigid 3D IMU methods are shown in green in the three columns. All pixels that occur in both overlaid images are depicted in yellow. It is noticeable that the rigid correction method fails to estimate the exact thigh motion, leading to an observable shift as a red or green halo at the bone interface and at the skin border. This is reduced for the non-rigid 2D correction and almost imperceptible for the non-rigid 3D correction.

This visual impression is confirmed by the SSIM and RMSE values in Table 1. All proposed methods achieve SSIM and RMSE values that are similar to or better than those of the reference marker-based method. Compared
Figure 4: Exemplary slices of a reconstructed volume. Rows: axial slice through shin, axial slice through thigh, sagittal slice. (a) Scan without motion, (b) uncorrected case, (c) marker-based reference method, (d) rigid IMU method, (e) non-rigid IMU 2D projection deformation, (f) non-rigid IMU 3D dynamic reconstruction. Motion artifacts can be reduced by all proposed methods in a similar manner as by the marker-based method.

Table 1: Average SSIM and RMSE values with standard deviation of the motion compensated reconstructions compared to the motion-free reconstruction, where all volumes were scaled from 0 to 1. The 30 degrees and 60 degrees squats are evaluated separately in the topmost two blocks, while the last block shows the average results over all scans. All measures were computed on the whole reconstructed part of the leg, and on the shank resp. thigh only.
[Table 1 body: SSIM and RMSE, each for Whole Leg, Shank and Thigh; the numeric entries are not legible in this copy apart from the uncorrected whole-leg SSIM of 0.777 for the 30 degrees squat.]

Figure 5: Details of an axial slice through the thigh. Rows: femoral bone, skin border. Overlay of the motion-free reference (red) and the result of the method (a) rigid IMU, (b) non-rigid 2D IMU, (c) non-rigid 3D IMU (green); overlapping pixels are shown in yellow.

Table 2: Mean RMSE between the noise-free and noisy estimation of translation [mm] / rotation [°]. All values are the result of averaging over five independent noise simulations of one scan. The rows and columns show the exponents f_a and f_g of the noise factor for the accelerometer resp. gyroscope. RMSE values that are below 1 for both translation and rotation are highlighted in bold.

[Table 2 body: the numeric entries and their exponents are not legible in this copy.]

with the uncorrected case, this denotes an improvement of 24-35% in the SSIM and of 78-85% in the RMSE values, respectively. Higher SSIM scores and lower RMSE values are achieved for the 30 degrees squat scans compared with the 60 degrees squat scans. When comparing the three proposed IMU methods, the results show a slight advantage of the non-rigid 3D approach over the other two IMU-based approaches.

The decremental signal noise analysis shows that the RMS noise of current commercially available IMUs would prevent a successful IMU motion compensation (Table 2, top left). While the resulting rotation estimate shows an average RMSE to the noise-free estimate of 1.45°, the value of the estimated translation is considerably larger (9461 mm). Deviations above 1 mm and 1° of the translation and rotation are expected to decrease the reconstruction quality considerably. For noisy acceleration and angular velocity, an average RMSE value below these thresholds was only achieved if the RMS noise value was decreased by a factor of 10^4 or 10^5. For this reason, in the further analysis the estimated motion matrices of these noise levels are used to perform a motion compensated reconstruction. The resultant reconstructions are shown in Fig. 6. While the image quality for f_a = 5 is similar to the motion-free case, streaking artifacts are visible when f_a = 4, independent of f_g = 4 or f_g = 5.

Figure 6: Comparison of noise-free and noisy rigid IMU compensation. Rows: axial slice through shin, axial slice through thigh, sagittal slice. (a) Noise-free IMU signal; in rows (b) f_a = 5, f_g = 5, (c) f_a = 5, f_g = 4, (d) f_a = 4, f_g = 5, (e) f_a = 4, f_g = 4, noise is added to the simulated acceleration and angular velocity. The RMS noise value is 1. mg/s resp. 0. °/s divided by 10^f_a resp. 10^f_g.

The quantitative analysis of the noisy results in Table 3 confirms this finding: the average SSIM and RMSE values are only slightly decreased resp. increased compared with the noise-free estimation if f_a = 5, but deteriorate markedly when f_a = 4.

With the presented simulation study, we have shown the feasibility and the limitations of using IMUs for motion compensated CT reconstruction. While all proposed methods are capable of reducing motion artifacts in the noise-free case, our noise analysis shows that the applicability in real settings is not yet fully realizable.

The presented initialization approach based on the system geometry and the first two projection images works well under the optimal conditions of a simulation. In a real setting, it is unlikely that the IMU will contain clearly distinguishable metal components at the IMU coordinate system, and they are unlikely to be resolved using current flat panel detectors. However, the presented approach can be applied with four arbitrary IMU points, assuming their relation to the origin and coordinate system is known. The IMU should then be positioned such that the projections of these points are well distinguishable in the two projection images required for initialization.

The results of all proposed IMU-based motion compensation methods are qualitatively and quantitatively equivalent to, or even improved compared with, the gold standard marker-based approach that estimates a rigid motion. For the marker-based approach, multiple tiny markers have to be placed individually and successively, and they need to be attached directly to the skin in order to limit soft tissue artifacts. For effective marker tracking, it should be ensured that they do not overlap in the projections. The metal also produces artifacts in the knee region. An advantage of our proposed methods is the need for only one or two IMUs on the leg.
Here, the only requirement is that the components used for initialization need to be visible in the first projection images. Since the shank and thigh are modeled as stiff segments, the sensors can be placed sufficiently far away from the knee joint in order not to cause metal artifacts that could hinder subsequent image analyses.

It is noticeable that all methods performed slightly better on the scans where subjects were asked to hold a squat of 30 degrees compared with those for the 60 degrees squat. This is likely because it is more challenging to hold the same pose at a lower squat; the motion in these cases has a higher range, leading to increased errors.

Table 3: Average SSIM and RMSE values with standard deviation of the rigid motion compensated reconstructions from noisy signals compared to the motion-free reconstruction over all scans. The noise factor is 10^−f_a for the simulated accelerometer resp. 10^−f_g for the gyroscope. All measures were computed on the whole reconstructed part of the leg, and on the shank resp. thigh only.

[Table 3 body: only the whole-leg SSIM values are legible in this copy: no noise 0.991; f_a = 5, f_g = 5: 0.989; f_a = 5, f_g = 4: 0.987; f_a = 4, f_g = 5: 0.926; f_a = 4, f_g = 4: 0.927.]

With the non-rigid 3D IMU approach, improved results are achieved compared with the rigid IMU approach, especially in the region of the thigh. Although this is only a small improvement, it may have a significant impact on further image analyses given the sub-millimeter range of the expected cartilage change under load [13]. However, the simple model of three moving joint positions and an affine deformation is considerably less complex than the XCAT spline deformation applied during projection generation, suggesting that further improvements can be achieved by using a more realistic model.

The non-rigid 2D IMU approach provides small improvements in visual results compared with the rigid approach (Fig. 5), but the quantitative evaluation shows similar SSIM and RMSE values. Although the non-rigid motion estimate might be more accurate, at the same time the image deformation introduces small errors, since X-rays measured at a deformed detector position would also have been attenuated by other materials.

It is notable that the noise has a larger effect on the processing of the accelerometer signal compared with that of the gyroscope signal (Table 2). On the one hand, the double integration performed on the acceleration leads to a quadratic error propagation. On the other hand, the noisy gyroscope signals used for gravity removal and velocity integration introduce additional errors that are accumulated during acceleration processing.

In our study, we focus only on signal noise as one of the most severe IMU measurement errors. In the future, similar simulations might be performed in order to determine further necessary specifications.

The noise level improvements that are required for a real application are in the range of 10^5 for the accelerometer and 10^4 for the gyroscope. Although recently developed accelerometers and gyroscopes achieve these low noise levels, they are designed to measure signals in the mg-range and are far too delicate for the application at hand [10, 23, 32]. If developments continue to progress rapidly, and a robust sensor with a low noise level and a high measurement range is developed, our method could be applied in a real setting.

Acknowledgments
This work was supported by the Research Training Group 1773 "Heterogeneous Image Systems", funded by the German Research Foundation (DFG), and by the AIT4Surgery grant funded by the Federal Ministry of Education and Research (BMBF, grant number 03INT506BA). Bjoern Eskofier gratefully acknowledges the support of the German Research Foundation (DFG) within the framework of the Heisenberg professorship programme (grant number ES 434/8-1). The authors acknowledge funding support from NIH 5R01AR065248-03 and NIH Shared Instrument Grant No. S10 RR026714 supporting the zeego@StanfordLab.
References

[1] Arden, N., Nevitt, M.C.: Osteoarthritis: Epidemiology. Best Pract Res Cl Rh (1), 3–25 (2006)
[2] Berger, M., Müller, K., Aichert, A., Unberath, M., Thies, J., Choi, J.H., Fahrig, R., Maier, A.: Marker-free motion correction in weight-bearing cone-beam CT of the knee joint. Med Phys (3), 1235–48 (2016)
[3] Bier, B., Aichert, A., Felsner, L., Unberath, M., Levenston, M., Gold, G., Fahrig, R., Maier, A.: Epipolar Consistency Conditions for Motion Correction in Weight-Bearing Imaging. In: BVM 2017. pp. 209–214 (2017)
[4] Bier, B., Aschoff, K., Syben, C., Unberath, M., Levenston, M., Gold, G., Fahrig, R., Maier, A.: Detecting anatomical landmarks for motion estimation in weight-bearing imaging of knees. In: Machine Learning for Medical Image Reconstruction. pp. 83–90 (2018)
[5] Bier, B., Ravikumar, N., Unberath, M., Levenston, M., Gold, G., Fahrig, R., Maier, A.: Range Imaging for Motion Compensation in C-Arm Cone-Beam CT of Knees under Weight-Bearing Conditions. J Imaging (1), 1–16 (2018)
[6] van den Bogert, A.J., Read, L., Nigg, B.M.: A method for inverse dynamic analysis using accelerometry. J Biomech (7), 949–954 (1996)
[7] Bosch Sensortec: BMI160 - Data sheet (11 2020), accessed: 2021-01-18
[8] Choi, J.H., Fahrig, R., Keil, A., Besier, T.F., Pal, S., McWalter, E.J., Beaupré, G.S., Maier, A.: Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. Part I. Numerical model-based optimization. Med Phys (9), 091905-1–091905-12 (2013)
[9] Choi, J.H., Maier, A., Keil, A., Pal, S., McWalter, E.J., Beaupré, G.S., Gold, G.E., Fahrig, R.: Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. II. Experiment. Med Phys (6), 061902-1–061902-16 (2014)
[10] Darvishia, A., Najafi, K.: Analysis and design of super-sensitive stacked (S3) resonators for low-noise pitch/roll gyroscopes. In: 2019 IEEE INERTIAL. pp. 1–4 (2019)
[11] De Sapio, V.: Advanced Analytical Dynamics: Theory and Applications. Cambridge University Press (2017)
[12] Delp, S.L., Anderson, F.C., Arnold, A.S., Loan, P., Habib, A., John, C.T., Guendelman, E., Thelen, D.G.: OpenSim: Open-source software to create and analyze dynamic simulations of movement. IEEE Trans Biomed Eng (11), 1940–1950 (2007)
[13] Glaser, C., Putz, R.: Functional anatomy of articular cartilage under compressive loading: quantitative aspects of global, local and zonal reactions of the collagenous network with respect to the surface integrity. Osteoarthritis Cartilage (2), 83–99 (2002)
[14] Hamner, S.R., Seth, A., Delp, S.L.: Muscle contributions to propulsion and support during running. J Biomech (14), 2709–2716 (2010)
[15] Jost, G., Walti, J., Mariani, L., Cattin, P.: A novel approach to navigated implantation of S-2 alar iliac screws using inertial measurement units. J Neurosurg: Spine (3), 447–453 (2016)
[16] Kautz, T., Groh, B.H., Hannink, J., Jensen, U., Strubberg, H., Eskofier, B.M.: Activity recognition in beach volleyball using a deep convolutional neural network. Data Min Knowl Discov (6), 1678–1705 (2017)
[17] Kok, M., Hol, J.D., Schön, T.B.: Using Inertial Sensors for Position and Orientation Estimation. Foundations and Trends in Signal Processing (1-2), 1–153 (2017)
[18] Lemammer, I., Michel, O., Ayasso, H., Zozor, S., Bernard, G.: Online mobile C-arm calibration using inertial sensors: a preliminary study in order to achieve CBCT. Int J Comput Assist Radiol Surg, 213–224 (2019)
[19] Maier, A., Choi, J.H., Keil, A., Niebler, C., Sarmiento, M., Fieselmann, A., Gold, G., Delp, S., Fahrig, R.: Analysis of Vertical and Horizontal Circular C-Arm Trajectories. In: Proc SPIE. vol. 7961, pp. 796123-1–8 (2011)
[20] Maier, A., Hofmann, H., Berger, M., Fischer, P., Schwemmer, C., Wu, H., Müller, K., Hornegger, J., Choi, J.H., Riess, C., Keil, A., Fahrig, R.: CONRAD - A software framework for cone-beam imaging in radiology. Med Phys (11), 111914 (2013)
[21] Maier, A., Hofmann, H., Schwemmer, C., Hornegger, J., Keil, A., Fahrig, R.: Fast Simulation of X-ray Projections of Spline-based Surfaces using an Append Buffer. Phys Med Biol (19), 6193–6210 (2012)
[22] Maier, J., Nitschke, M., Choi, J.H., Gold, G., Fahrig, R., Eskofier, B.M., Maier, A.: Inertial measurements for motion compensation in weight-bearing cone-beam CT of the knee. In: MICCAI. pp. 14–23 (2020)
[23] Masu, K., Machida, K., Yamane, D., Ito, H., Ishihara, N., Chang, T.F.M., Sone, M., Shigeyama, R., Ogata, T., Miyake, Y.: CMOS-MEMS based microgravity sensor and its application. ECS Trans (5), 91 (2020)
[24] Powers, C.M., Ward, S.R., Fredericson, M., Guillet, M., Shellock, F.G.: Patellofemoral Kinematics During Weight-Bearing and Non-Weight-Bearing Knee Extension in Persons With Lateral Subluxation of the Patella: A Preliminary Study. J Orthop Sports Phys Ther (11), 677–685 (2003)
[25] Schaefer, S., McPhail, T., Warren, J.: Image deformation using moving least squares. ACM Trans Graph (3), 533–540 (2006)
[26] Segars, W.P., Sturgeon, G., Mendonca, S., Grimes, J., Tsui, B.M.W.: 4D XCAT phantom for multimodality imaging research. Med Phys (9), 4902–4915 (2010)
[27] Seth, A., Hicks, J.L., Uchida, T.K., Habib, A., Dembia, C.L., Dunne, J.J., Ong, C.F., DeMers, M.S., Rajagopal, A., Millard, M., Hamner, S.R., Arnold, E.M., Yong, J.R., Lakshmikanth, S.K., Sherman, M.A., Ku, J.P., Delp, S.L.: OpenSim: Simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement. PLOS Comput Biol (7), 1–20 (2018)
[28] Sisniega, A., Stayman, J.W., Yorkston, J., Siewerdsen, J.H., Zbijewski, W.: Motion compensation in extremity cone-beam CT using a penalized image sharpness criterion. Phys Med Biol (9), 3712–3734 (2017)
[29] Thies, M., Maier, J., Eskofier, B., Maier, A., Levenston, M., Gold, G., Fahrig, R.: Automatic orientation estimation of inertial sensors in C-arm CT projections. Curr Dir Biomed Eng (1), 195–198 (2019)
[30] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process (4), 600–612 (2004)
[31] Woodman, O.J.: An introduction to inertial navigation. Tech. Rep. UCAM-CL-TR-696, University of Cambridge, Computer Laboratory (2007)
[32] Yamane, D., Konishi, T., Safu, T., Toshiyoshi, H., Sone, M., Machida, K., Ito, H., Masu, K.: A MEMS accelerometer for sub-mg sensing. Sensor Mater 31