[PDF] A Distance Oriented Kalman Filter Particle Swarm Optimizer Applied to Multi-Modality Image Registration

Abstract

In this paper we describe improvements to the particle swarm optimizer (PSO) made by inclusion of an unscented Kalman filter to guide particle motion. We demonstrate the effectiveness of the unscented Kalman filter PSO by comparing it with the original PSO algorithm and its variants designed to improve performance. The PSOs were tested firstly on a number of common synthetic benchmarking functions, and secondly applied to a practical three-dimensional image registration problem. The proposed methods displayed better performances for 4 out of 8 benchmark functions, and reduced the target registration errors by at least 2mm when registering down-sampled benchmark brain images. Our methods also demonstrated an ability to align images featuring motion related artefacts which all other methods failed to register. These new PSO methods provide a novel, efficient mechanism to integrate prior knowledge into each iteration of the optimization process, which can enhance the accuracy and speed of convergence in the application of medical image registration.

Full PDF

AAbstract — In this paper we describe improvements to the particle swarm optimizer (PSO) made by inclusion of an unscented Kalman filter to guide particle motion. We demonstrate the effectiveness of the unscented Kalman filter PSO by comparing it with the original PSO algorithm and its variants designed to improve performance. The PSOs were tested firstly on a number of common synthetic benchmarking functions, and secondly applied to a practical three-dimensional image registration problem. The proposed methods displayed better performances for 4 out of 8 benchmark functions, and reduced the target registration errors by at least 2mm when registering down-sampled benchmark brain images. Our methods also demonstrated an ability to align images featuring motion related artefacts which all other methods failed to register. These new PSO methods provide a novel, efficient mechanism to integrate prior knowledge into each iteration of the optimization process, which can enhance the accuracy and speed of convergence in the application of medical image registration.

Index Terms — global optimization, particle swarm, unscented Kalman filter, image registration I. I NTRODUCTION

Optimization is a key component in many practical scientific computing problems. It is used to search for the optimum value of a pre-defined fitness function of a measure within a problem space [1]. As a typical global optimization method, particle swarm optimization (PSO) has been paid significant attention during last few decades, as it is less prone to becoming trapped in local optima. Various improvements have been suggested to the original PSO algorithm to improve convergence and computation speed. However, neither the original PSO method nor its existing modifications derived any advantage from available prior knowledge about the problem space which may act as a critical role in specific applications. The goal of many optimization problems is not just searching for an optimal value of the fitness function. One typical example of this issue is presented by problem associated with image registration, for which the distance to the real global optima, rather than the value of the measurement function, is more important. This is because small fitness differences but large distances in the problem space actually represent large differences between image transformation parameters, which may in turn falsely indicate alignment between images. If prior knowledge about the content of the image is ignored in favour of the result of the value-oriented PSO, the optimization process may tend to converge to local optima that exhibit “ better ” measurement values. These local optima may be at a significant distance from the global optimum, thereby causing the image registration to “fail”. To deal with this special type of application, in this paper, we introduce a novel distance-oriented PSO, guided by an unscented Kalman filter (UKF) [1]. This method can encode prior knowledge about the distribution of a fitness function within the problem space, and tends to stretch the optimizer to converge at a point near the true global optimum. Image registration algorithms are often based on the premise that the magnitude of the chosen similarity metric is related to the magnitude of the error between the current spatial transform and the optimal spatial transform between the images [1, 2]. Assuming the distribution of the similarity metric function is approximately unimodal, we propose a customized UKF-PSO framework derived from the Bayesian perspective of the PSO [3]. The UKF-PSO algorithm iteratively estimates global optima with accumulated information about probability distributions of the similarity measurements. This leads to faster convergence, with improved robustness to local optima over a large search space. Another advantage of this approach is the ease with which multiple similarity metrics can be combined, by extension to a nested UKF-PSO (N-UKF-PSO) that removes the need to apply fixed weights to the different similarity metrics by adaptively adjusting the weighting during the convergence process of the Kalman filter. The proposed methods are compared to several popular PSO methods using some popular benchmark functions, as well as a publicly available medical image registration dataset. Both the UKF-

Distance Oriented Kalman Filter Particle Swarm Optimizer Chengjia Wang, Keith A. Goatman, James Boardman, Erin Beveridge, David Newby, and Scott Semple BHF Centre for Cardiovascular Science, University of Edinburgh, Edinburgh, Toshiba Medical Visualization Systems, Edinburgh, MRC Centre for Reproductive Health, University of Edinburgh, Edinburgh.

Fig. 1. Fitting a Gaussian function to the distribution of mutual information within three different searching ranges. The Gaussian function tends to give a more accurate estimation of the distribution within a smaller searching range.

SO and N-UKF-PSO display better robustness to local optima and better accuracies in the image registration experiments. In this paper, important previous work that attempts to solve similar image registration problems using the original or modified versions of PSO are briefly reviewed in section II. The theory of our UKF-PSO and N-UKFPSO methods are introduced in section III. Sections IV and V describe the details of UKF-PSO and N-UKF-PSO. Experiments performed on both benchmark functions and a publicly available image registration dataset are shown in sections VI and VII, and discussed in section VIII. II. R ELATED W ORKS

Both local and global optimization methods have been applied to solve image registration problems. Local optimization suffers from becoming trapped in local optima. The use of multi-resolution image pyramids can partially mitigate this, however, the global optimum may not be represented in the down-sampled problem spaces, in which case the optimizer will still converge to a local optima [4]. Among the global optimization methods, evolutionary computation plays an important role. For example, inspired by social and cooperative behavior, Kennedy and Eberhart [5] proposed the first PSO algorithm in the mid-1990s [6]. Since then a number of modified versions of PSO have been developed and applied to different image registration applications [4, 7]. Research efforts have concentrated on improving the convergence speed and robustness of the PSO when the problem spaces are very large and exhibit multiple local optima. These extensions of PSO methods use either alternative neighbourhood structures [8] or novel particle evolution strategies [6, 7, 9]. A widely used PSO using alternative particle evolution formulae is quantum behaved PSO (QPSO) [9]. The formulae were further redesigned in the revised QPSO (RQPSO), the diversity controlled RQPSO (DRQPSO) [10] and the chaotic search QPSO [11]. Another popular approach is to hybridize PSO with other optimization methods, for example Genetic algorithm [12] or Simplex [13]. Comparisons and reviews of the major PSO variants can be found in [3].

Wachowiak’s method provides a registration-specific prior knowledge approach [4], but requires precise initialization. Other methods that exploit prior knowledge include the Bare Bones PSO [14], Kalman Filter PSO [15], and Andras' Gaussian PSO, based on a Bayesian interpretation [3]. These methods either provide a probabilistic perspective of the particle status, or a mechanism to integrate prior knowledge. III. T HEORY D ERIVATION

For a fitness function 𝑓(𝒙) , an optimization process search in a problem space Ω for a 𝒙 that gives optimal value of 𝑓(𝒙) . For the problem targeted by this paper, image registration, 𝑓(𝒙) is a predefined similarity measure between images, and Ω is all the possible image transformations limited by degrees of freedom. The purpose of optimization is then formulated by: 𝒙 𝒐 = 𝑎𝑟𝑔max 𝒙𝜖Ω 𝑓(𝒙), (1) w here 𝒙 𝒐 is the optimal solution of 𝒙 , and the purpose of registration is to find 𝒙 𝒐 which gives the optimal image transformation parameters, or leads to the highest similarity of the images. However, due to the presence of local optima, 𝒙 𝒐 is often difficult to find. In this case, the returned 𝒙 should be as close as possible to 𝒙 𝒐 . The PSO simulates the social and cooperative behavior of a “ swarm ” of potential solutions, called particles [6]. Each potential solution corresponds to one position in problem space. Each particle explores the problem space at an individual random speed that is partially affected by combined knowledge about the up-to-date global and local optima. Searching for global optima in a D -dimension problem space with K particles at the 𝑡 th iteration of PSO, a solution represented by the position of the 𝑖 th particle is a D -element vector, 𝐱 𝑖 (𝑡) ={𝑥 𝑖1 (𝑡), x 𝑖2 (𝑡), ⋯ , x 𝑖𝐷 (𝑡)}, 𝑖 ∈ {1, 2, ⋯ , 𝐾} . In the original PSO method, a widely used formula for updating the speeds of the particles, 𝒗 𝑖 (𝑡 + 1) , is given by [3, 5, 6]: 𝒗 𝑖 (𝑡 + 1) = 𝜔𝒗 𝑖 (𝑡) + 𝑐 𝑝 𝑟 𝑝 (𝒙 𝑖𝑝 − 𝒙 𝑖 (𝑡))+ 𝑐 𝑔 𝑟 𝑔 (𝒙 𝑔 − 𝒙 𝑖 (𝑡)) (2) Fig. 2. Information available at the t th iteration of PSO: the hidden state 𝜃 represents an optimal estimation 𝒙 𝑔∗ of the true global optimum; the observed state 𝜉 is defined as the average position of all particles 𝒙̂ 𝑔 weighted by the measured fitness function 𝑓̂ of each particle. An estimation of the hidden state 𝒙 𝑔 is produced by fitting 𝑓̂ to a Gaussian function in each iteration of the optimization process. For the t th iteration, 𝒙 𝑔 (𝑡) can be obtained by combining 𝒙 𝑔 (𝑡 − 1) and 𝒙̂ 𝑔 (𝑡) . When solving the optimization problem using a linear Kalman filter, 𝒙 𝒈 (𝑡 − 1) is treated as the output of 𝑡𝑖𝑚𝑒 − 𝑢𝑝𝑑𝑎𝑡𝑒 stage, 𝜃̂ -, and 𝒙 𝒈 (𝑡) is the output of the 𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑚𝑒𝑛𝑡 − 𝑢𝑝𝑑𝑎𝑡𝑒 stage, 𝜃̂ . Fig. 3. Hidden Markov model: 𝜃 and 𝜉 are the hidden and observed states. Fig. 4. The non-linear state transition model ℱ used to evolve the optimal estimation 𝒙 𝑔∗ of the true global optimum. here 𝜔 is the inertia weight, 𝒙 𝑖𝑝 is the local best solution found by the i th particle, and 𝒙 𝑔 is the best up-to-date global optimum. 𝑐 𝑝 and 𝑐 𝑔 are acceleration constants that weight the attraction of local and global optima to each particle, and 𝑟 𝑝 and 𝑟 𝑔 are random generated numbers drawn from the uniform distribution over the range of (0,1) [6]. The updated particle positions are then given by [3, 5, 6]: 𝒙 𝑖 (𝑡 + 1) = 𝒙 𝑖 (𝑡) + 𝒗 𝑖 (𝑡 + 1) . (3) Equation (2) consists of three components: the previous velocity 𝒗 𝑖 (𝑡) , the cognition component 𝑐 𝑝 𝑟 𝑝 (𝒙 𝑖𝑝 − 𝒙 𝑖 (𝑡)) , and the social component 𝑐 𝑔 𝑟 𝑔 (𝒙 𝑔 − 𝒙 𝑖 (𝑡)) . The combination of these components is a compound velocity that moves the particles towards the local and global optima, while preventing any significant deviations from the particles' previous directions [6]. This mechanism makes a stepwise improvement in the algorithm convergence until all of the particles have moved into a small constrained area, or the global best position remains unchanged for a certain number of iterations. Other than the coefficients which appear in the PSO formula, the most common modifiable parameters are the swarm size (i.e. the number of particles), the searching range, and the maximum number of iterations. If 𝑓(𝒙) is complicated and presents multiple local optima which is common for image registration applications, PSO still suffers from premature convergence. Integration of prior knowledge of the problem space into the particle evolution formulae can improve the robustness of PSO. Andras [3] proposed a Gaussian PSO model based on a Bayesian interpretation. In this model, the evaluated fitness value, 𝑓̂(𝒙) [3] is given by: 𝑓̂(𝒙) = 𝑓(𝒙) + 𝜖 , (4) where 𝜖 is a noise distribution (typically zero-mean Gaussian) added to the noise-free fitness value [3]. Following Bayesian theory, likelihood is given in the form of a probability density function (PDF), 𝒫(𝒙) , defined over the search range. Given all 𝑓̂(𝒙 𝑖 (0)) , the PDF may be calculated using: where the posterior 𝒫(𝒙|𝑓̂(𝒙 𝑖 (0))) calculated in an iteration is used as the new 𝒫(𝒙) in the following iteration. The evolution of the particles can then be formulated by [3]: 𝒙 𝑖 (𝑡 + 1) = 𝒙 𝑖 (𝑡) + 𝛾 ∙ 𝜕𝜕𝑥 ln 𝒫 𝑡+1 (𝒙)| 𝒙=𝒙 𝑖 (𝑡) , (6) where 𝒫 𝑡 (𝒙) is the 𝒫(𝒙) calculated in the t -th iteration. The calculation of 𝒫 𝑡 (𝒙) can be performed based on the assumption that the evaluated fitness values of the particles are either co-dependent or independent, leading to two implementations of this Bayesian Gaussian PSO. The fitness function is assumed to be proportional to the probability of a point in the search range being the optimal solution. Thus in a registration problem, the similarity measure can be considered as a non-normalized probability. The probability distribution over the whole search range is interpolated using multiple Gaussian bases. Andras [3] provides a framework to integrate prior knowledge into image registration in the form of 𝒫(𝒙) [3]. In this paper, we use a simplified definition of

𝒫(𝒙) , based on prior knowledge specific to image registration. As a result, there is no need to calculate the probability distribution under different assumptions of dependences between particles, as

𝒫(𝒙) can be directly fitted using the evaluation of all particles. Target registration error (TRE) is often the ground truth metric of image registration problems. The TRE is zero for two perfectly aligned images. We generalize this, such that 𝒙 𝑜 is the optimal transformation represented as a point in the problem search space, that results in a TRE closest to zero. Over the whole search range, the similarity measure 𝑓(𝒙 𝑖 ) of a transformation represented by any particle is the distance measure ‖𝒙 𝑖 − 𝒙 𝑜 ‖ . Any other similarity measure can be considered as a monotonic mapping of this distance, 𝑲(‖𝒙 𝑖 −𝒙 𝑜 ‖) . We simply assume a form of Gaussian function for 𝐊 , 𝑓(𝒙 𝑖 ) = 𝑒𝑥𝑝 (− 𝛽2 ‖𝒙 𝑖 − 𝒙 𝑜 ‖ ) . (7) This assumption of prior knowledge indicates that

𝒫(𝒙) follows a Gaussian-like distribution with unknown expectation, 𝒙 𝑜 . The advantage of using this Gaussian form is that 𝒙 𝑜 equals to the expectation ∫ 𝒙𝒫(𝒙) 𝑑𝒙 over the whole problem space. In each iteration of the PSO, 𝒙 𝑜 is estimated by the optimum value 𝒙 𝑔 within the area searched by particles. Here, rather than directly selecting the optimum value from among all particles, the estimated global optimum 𝒙 𝑔 (𝑡) is calculated by the average of all 𝒙 𝑖 (𝑡) weighted by the normalized 𝑓̂(𝒙 𝑖 (𝑡)) defined in equation (4). According to (4), and the theory of the Bayesian interpretation of the PSO [3], 𝒫(𝒙) is thus modeled as:

𝒫(𝒙 𝑖 ) = 𝜎 ∙ (𝑒𝑥𝑝 (− 𝛽2 ‖𝒙 𝑖 − 𝒙 𝑔 ‖ ) + 𝜖) , (8) where 𝜎 is a normalization constant and 𝜖 is a zero-mean Gaussian noise with unknown standard deviation. Ignoring the noise 𝜖 , a reasonable estimation of 𝒫(𝒙) is: 𝒫̂ 𝑡 (𝒙 𝑖 ) = 𝜎 ∙ (𝑒𝑥𝑝 (− ‖𝒙 𝑖 − 𝒙̂ 𝑔 (𝑡)‖ ⁄ )) , (9) where the 𝒫̂ 𝑡 (𝒙 𝑖 ) is the estimation of 𝒫(𝒙) at 𝒙 𝑖 in the t th iteration. 𝒫̂ 𝑡 (𝒙) can be obtained by fitting a Gaussian function using all 𝑓̂(𝒙 𝑖 (𝑡)) . 𝛿̂ is the variance of this Gaussian function. The global optimum can be estimated by solving, 𝜕𝜕𝒙 𝒫̂ 𝑡 (𝒙) = 0 . (10) Although the assumed Gaussian form of 𝑓̂(𝒙) and

𝒫̂(𝒙 𝑖 ) cannot accurately capture the shape of the similarity measure for a large search range, it gives a reasonable estimation of the 𝒫(𝒙|𝑓̂(𝒙 𝑖 (0))) = 𝒫(𝑓̂(𝒙 𝑖 (0))|𝒙) ∙ 𝒫(𝒙)𝒫 (𝑓̂(𝒙 𝑖 (0))) , 𝑖 = 1, ⋯ , 𝐾 (5) lobal optima, and will improve as the search range contracts, as shown in Fig. 1. If the searching algorithm converges ideally, the Gaussian function becomes a Dirac delta function. Equation (10) can be solved by fitting the shape of ln 𝒫̂ 𝑡 (𝒙) using a quadratic least squares method, though this will introduce much greater computational complexity. The purpose of fitting the Gaussian function is to obtain an estimated global optimum 𝒙̂ 𝑔 (𝑡), and 𝛿̂ is not used in further optimization processes. We use the weighted mean of all particles obtained in each iteration to estimate the initial global optimum, i.e. 𝒙̂ 𝑔 (𝑡) = (∑ 𝒙 𝒊 (𝑡)𝑓̂(𝒙 𝒊 (𝑡)) 𝐾𝑖=1 ) (∑ 𝑓̂(𝒙 𝒊 (𝑡)) 𝐾𝑖=1 )⁄ . (11)

The estimation of the global optimum should move towards the true global optimum of the similarity measure as the search range contracts during the optimization process. One important assumption of (10) is that 𝑓̂(𝒙) ≥ 0 , which is easy to achieve by normalization. Specific to image registration problems, if the images are aligned by minimizing a difference measure, denoted as 𝑓̂ 𝑑 (𝒙) , we can convert it to a similarity measure by, 𝑓̂(𝒙) = 𝑒𝑥𝑝 (−𝜀 (𝑓̂ 𝑑 (𝒙))) (12) where 𝜀(∙) is a function of 𝑓̂ 𝑑 (𝒙) in the search range. In summary, during each iteration of the PSO, a noisy estimation of the global optimum 𝒙̂ 𝑔 (𝑡) can be obtained using (11). 𝒙̂ 𝑔 (𝑡) can then be improved during the evolutionary process of the PSO by combining information from all of the particles and all of the previous iterations. IV. T HE LDS-KFPSO M ETHOD 𝒙̂ 𝑔 calculated using (11) can replace 𝒙 𝑔 in the PSO formulae as it moves closer to the optimum of 𝑓̂(𝒙) . However, with integrated prior knowledge, the estimation of the PDF of 𝒙 𝑜 in the search range can be improved by accumulating the information obtained in previous iterations. This can be achieved through the dynamic Bayesian network (DBN) presented in Monson and Seppi's Kalman filter PSO [15], which is used to characterize the time-sensitive relationship between observable and hidden states. For image registration problems using swarm optimization, the global and local optima obtained in each iteration can be encoded as the observed state 𝜉 . Based on the theory in [15], the desired hidden state 𝜃 represents the ideal location and speed of a particle that leads to a better fitness of 𝒙 𝑔∗ . With the prior knowledge discussed above we can define 𝒙 𝑔∗ as the average of 𝒙 𝑖 , weighted by the noise-free fitness function, 𝑓(𝒙 𝒊 ) , or more directly define it as 𝒙 𝑔∗ = 𝒙 𝑜 . An estimation 𝜃̂ of the hidden state is given for each iteration. However, because the prior knowledge of registration problems is integrated and 𝒙̂ 𝑔 is calculated using equation (11), a much simpler DBN can be adopted here, using the raw information demonstrated in Fig. 2. After 𝑡 − 1 iterations, the hidden state is the ideal position 𝒙 𝑔∗ (𝑡) that is closer to 𝒙 𝑜 , or equals 𝒙 𝑜 . The observation 𝜉 can be directly defined as 𝒙̂ 𝑔 (𝑡) . Each iteration has a current estimation of the hidden state 𝒙 𝑔∗ (𝑡) based on this observation. To obtain this estimation, the relationship between 𝜃 and 𝜉 is depicted as an instance of the hidden Markov model (HMM), as shown in Fig. 3 [15]. The hidden state 𝜃 evolves over time, based on a state transition model ℱ , and influences the observable state through a known observation model ℋ . The transition model, ℱ , reflects how an estimated global optimum moves closer to locations of better fitness, and the observation model can then be described as a model of the influence of 𝒙 𝑔∗ (𝑡) upon 𝒙̂ 𝑔 (𝑡) . When defining 𝒙 𝑔∗ as the average of 𝒙 𝑖 weighted by 𝑓(𝒙 𝑖 ) , as shown in Fig. 4, ℱ can be specified such that the evolution of 𝒙 𝑔∗ depends on the movements of every particle. This assumes either a highly non-linear state transition process, or we may use 𝒙 𝑜 as the hidden state that assumes an identical state transition. In both cases, the observation model is an identical mapping. This influence of 𝒙 𝑔∗ on 𝒙̂ 𝑔 is inherently noisy, and the noise is used as a subjective uncertainty model of the accuracy of an observation [15]. Based on the prior knowledge being integrated, the current state is modeled by a Gaussian distribution with mean 𝒙 𝑔 and a variance that models how strong the likelihood is that 𝒙 𝑔 reflects 𝒙 𝑔∗ . The goal of the registration process is then to reduce the uncertainty of this likelihood over 𝒙 𝑔 to its lowest level, and thus give the most accurate prediction. Since this prediction is produced by combining the information from all particles and all previous iterations, it is applicable to different PSO methods with different velocity and position updating mechanisms. For the HMM described above, the Kalman filter [16] and its extensions [2, 17] can be regarded as solutions. When ℱ and ℋ are linear, and the HMM is therefore known as a linear dynamic system (LDS), the Kalman filter provides an efficient way to Fig. 5. Brief workflow of the unscented Kalman filter particle swarm optimizer (UKFPSO). ecursively estimate the state of this process while minimizing the mean square error [18]. The Kalman filter models the HMM as a predictor-corrector circle, where both the state-transition and observation are noisy processes with additive Gaussian noise. In our registration problem assuming a LDS in the prediction or time-update stage, a prediction of 𝒙 𝑔∗ (𝑡) is given by, 𝜃̂ − (𝑡) = 𝐅𝜃̂(𝑡 − 1) , (13) 𝚺 − (𝑡) = 𝐅𝚺(𝑡 − 1)𝐅 𝑇 + 𝚺 𝜃 , (14) where ℱ is the matrix representation of the state transition function, 𝜃 − (𝑡) and 𝚺 − (𝑡) are the mean and variance of predicted 𝒙 𝑔 (𝑡) respectively, and 𝚺 𝜃 is the covariance of the state-transition noise. Assuming 𝜃(𝑡) = 𝒙 𝑔∗ (𝑡) = 𝒙 𝑜 , 𝐅 is an identity matrix. Then in the correction , or measurement-update stage, the estimation of state is refined using the observation, 𝐊(𝑡) = (𝐅𝚺(𝑡 − 1)𝐅 𝑇 + 𝚺 𝜃 )𝐇 𝑇 𝐇(𝐅𝚺(𝑡 − 1)𝐅 𝑇 + 𝚺 𝜃 )𝐇 𝑇 + 𝚺 𝜉 , (15) 𝜃̂ = 𝜃̂ − (𝑡) + 𝐊(𝑡) (𝜉(𝑡) − 𝐇𝜃̂ − (𝑡)) , (16) 𝚺(𝑡) = (𝐈 − 𝐊(𝑡)𝐇)𝚺 − (𝑡) , (17) where the 𝐊(𝑡) is the Kalman gain in the t th iteration that is used to balance the influence of prediction and observation, 𝐇 is the observation matrix, which is identity, and 𝜃̂(𝑡) and 𝚺(𝑡) are the mean and variance of the estimation respectively. The estimate of global optimum is based on the following probability distribution [18],

𝒫(𝜃(𝑡)|𝜉(𝑡))~𝑁 (𝜃̂(𝑡), 𝚺(𝑡)) . (18)

This PSO model guided by Kalman filter (KF) under LDS assumption is named as LDS-KFPSO. V. T HE SPO-UKFPSO M ETHOD

When using the non-linear state transition model shown in Fig. 4, the HMM is not a LDS. In this case the non-linear extensions of the Kalman filter should be applied to deal with the non-linear state transition process 𝒙 𝑔∗ = ℱ(𝒙 𝑔∗ (𝑡 − 1)) . The extended Kalman filter (EKF) is the standard method for dealing with non-linear processes. However, it requires the calculation of a Jacobian matrix for ℱ(𝒙) [2], which is difficult for this complicated state transition function. Hence we propose the novel use of an unscented Kalman filter (UKF) [2]. Rather than estimate an arbitrary transition function as the EKF does, the UKF approximates a Gaussian probability distribution using standard vector and matrix operations based on a set of weighted sigma points, 𝜒(𝑡 − 1), 𝑗 = 1, ⋯ , 2𝐷 + 1 [19]. For the t th iteration in a D -dimensional problem space, the sample mean and covariance of the set of sigma points are 𝜃̂(𝑡 − 1) and 𝚺(𝑡 − 1) [19]. Specifically, the sigma points and their associated weights are selected by, 𝜒 𝑗 (𝑡 − 1) = { 𝜃̂(𝑡 − 1), 𝑗 = 0;𝜃̂(𝑡 − 1) + √(𝐷 + 𝜅)𝚺(𝑡 − 1),𝑗 = 1, ⋯ , 𝐷;𝜃̂(𝑡 − 1) − √(𝐷 + 𝜅)𝚺(𝑡 − 1), 𝑗 = 𝐷 + 1, ⋯ , 2𝐷 + 1; (19) 𝐖 𝑗 = { 𝜅 (𝐷 + 𝜅)⁄ , 𝑗 = 0,1 (2(𝐷 + 𝜅)), 𝑗 = 1, ⋯ , 2𝐷 + 1.⁄ (20) where 𝐖 𝑗 is the weight associated with the j th sigma point. Details of how to select the weighting parameter, 𝜅, can be found in [2] and [19]. In this work , we follow Uhlmann’s [19] recommendation that 𝜅 + 𝐷 = 3 . In the Kalman update stage each sigma point is instantiated through the state transition function by [19], 𝜒 𝑗 (𝑡|𝑡 − 1) = ℱ (𝜒 𝑗 (𝑡 − 1)) , (21) and then the mean of state prediction is calculated by [19]: 𝜃 − (𝑡) = ∑ 𝐖 𝑗 𝜒 𝑗 (𝑡|𝑡 − 1), (22) and the variance is given by [19], 𝚺 − (𝑡) = ∑ 𝐖 𝑗 = (𝜒 𝑗 (𝑡|𝑡 − 1) − 𝜃 − (𝑡)) ∙ (𝜒 𝑗 (𝑡|𝑡 − 1) − 𝜃 − (𝑡)) 𝑇 . (23) As the observation model is an identity function, we can still use the linear measurement update formulae of the original

Fig. 6 . Workflow of the unscented Kalman filter particle swarm optimizer with “shift particles observation” (SPO -UKFPSO). alman filter (given by equations (16-18)) in the correction stage to obtain 𝜃(𝑡) and

𝚺(𝑡) . Under this non-LDS assumption, since the uncertainty associated with the estimated global optimum is related to the distribution of particles, we can simply use either a sample, or all of the particles together with the estimated global optimum as the sigma points of UKF. This allows the number of sigma points to be greater than

2𝐷 + 1 , and makes integrating the UKF into the PSO more convenient. In addition to the traditional stopping criteria,

𝚺(𝑡) may be used as additional evidence of the convergence situation of the PSO. To sum up, the procedure of the PSO was combined with the predict-correct circle of the Kalman filter. For both LDS and non-LDS cases, our new UKF-PSO algorithm can be represented as shown in Fig. 5. The estimated global optimum, 𝒙̂ 𝑔 , will be affected by the relative location of the global optimum in the search range. The estimation is more accurate when the true global optimum is closer to the center of the search range. A slightly different observation can therefore be used to improve the estimated global optimum: in each iteration, after 𝒙̂ 𝑔 is calculated, all the particles are resampled to be 𝒙̃ 𝑖 , so that the searching range is centered on 𝒙̂ 𝑔 . Then a new average 𝒙̃ 𝑔 can be calculated as the observation, weighted by the new evaluations 𝑓̃(𝒙̃ 𝑖 ) . We name this model the “shift particles observation” UKFPSO (SPO -UKFPSO). In this case, the HMM will be different from the one used in the above UKFPSO method, with different definitions of 𝜃, 𝜃̂, 𝚺, ℱ, and 𝜉 . The workflow of the SPO-UKFPSO method is shown in Fig. 6. To apply the UKF guided PSO model to real image registration tasks, the choice of similarity measure also has a profound influence on the results. The chosen similarity measure has to follow the prior knowledge modeled by equation (8-10), which allows the problem to be solved as shown in Fig.1. For example, for a multi-modality registration problem, the sum of squared difference (SSD) of intensity is a poor choice. Therefore, we opt for the widely used mutual information (MI) instead. To register a reference image 𝝁 and a floating image 𝝂 , MI is calculated using their joint entropy 𝐻(𝝁, 𝝂) , and marginal entropies

𝐻(𝝁) and

𝐻(𝝂) , 𝑀𝐼(𝝁, 𝝂) = 𝐻(𝝁) + 𝐻(𝝂) − 𝐻(𝝁, 𝝂), (24) where MI makes registration a maximization problem. VI. T HE N ESTED

UKF-PSO Image registration can be performed using different types of similarity measures, as well as different features. In order to combine different features and measures we must assign a suitable weighting to each one, and normalize them to comparable scales. A benefit of the proposed model using prior knowledge, is that fitness values of any similarity measure are automatically normalized so as to be samples of a probability distribution, which maps all the measures to a uniform scale. As shown in Fig. 7, in the case where we have two similarity measures, 𝑓 (𝒙) and 𝑓 (𝒙) , the estimation of the global optimum output by a UKF using one measure can be intuitively considered as 𝜃 − of the second UKF associated with the other measure. The two UKFs share the same population of particles during the optimization process, which means that each particle obtains two fitness values in each iteration. The framework can be extended using multiple nested UKFs to allow any number of features or similarity measures to guide the optimization. VII. P ARTICLE S TATE E VOLUTION OF

UKF

GUIDED

PSO To sum up, the outputs of the KF or UKF in the three implementations of PSO above include the estimated hidden state 𝜃̂ , and a variance 𝚺, that reflects the estimation error. As discussed in sections III and IV, the accuracy of the estimation of the global optimum given by the weighted average (equation (11)) is dependent on the size of the search region, and the positioning of the true global optimum. Furthermore, the KF and its extensions generally behave like low-pass filters, which means high frequency information may be filtered out as well as the noise. In this case, a more reliable rapid model can be formulated by: 𝒗 𝑖 (𝑡 + 1) = 𝜔𝒗 𝑖 (𝑡) + 𝑐 𝑝 𝑟 𝑝 (𝒙 𝑖𝑝 − 𝒙 𝑖 (𝑡)) + 𝒄 𝑔 𝒓 𝑔 (𝒙 𝑔 − 𝒙 𝑖 (𝑡)) + 𝑐 𝜃 𝑟 𝜃 (𝜃̂ − 𝒙 𝑖 (𝑡)) , (25) where 𝑐 𝜃 is the acceleration constant weighting the attraction of the estimated hidden state output by the KF or UKF, and 𝑟 𝜃 is a randomly generated number drawn from the uniform distribution over the range (0, 1) . The component 𝑐 𝜃 𝑟 𝜃 (𝜃̂ −𝒙 𝑖 (𝑡)) introduced in equation (25) controls the influence of the estimated hidden state over the orientation of particles. The acceleration constants 𝑐 𝑝 , 𝑐 𝑔 and 𝑐 𝜃 need to be adjusted to Fig. 7. Workflow of nested unscented Kalman filter particle swarm optimizer (nested-UKFPSO). alance the influence of the personal optima 𝒙 𝑖𝑝 , the measured global optimum 𝒙 𝑔 , and the filtered optimum 𝜃̂ . Many methods initialise 𝑐 𝑝 and 𝑐 𝑔 as 2.0. In this work, 𝑐 𝑔 and 𝑐 𝜃 are initialized by letting 𝑐 𝑔 = 𝑐 𝜃 = 1, and during the particle evolution process they are adjusted by 𝑐 𝑔 = 𝑚𝑖𝑛(‖𝒙 𝑔 (𝑡) − 𝒙 𝑔 (𝑡 − 1)‖, 1.2) , (26) and 𝑐 𝑔 = 2 − 𝑐 𝜃 , (27) where 𝒙 𝜃 (𝑡) is the measured global optimum 𝒙 𝑔 obtained in the t th iteration. Particle positions are then updated using equation (3). VIII. E XPERIMENTS

The proposed PSO methods were evaluated on both general optimization and image registration problems. A few representative PSO methods previously used for registration are also chosen for comparison purposes. A. Benchmark Functions

The proposed PSO models were compared using some common benchmark functions widely used in the PSO literature [20], shown in table I. Since the optimization methods proposed in this paper are customized for image registration applications with the assumed prior knowledge described in section IV, we chose different types of benchmark functions, both single-objective and multi-objective, to comprehensively compare the power of the different PSO methods. As the nested UKFPSO method is specifically designed for image registration applications requiring multiple types of features or different types of similarity measures, it is not included in this benchmark function comparison. For image registration problems, it is more important to find a position that is closer to the real global optima in the search space than to search for a better value of the fitness function. The performances of the compared algorithms are therefore measured by the norm of the differences between their returned vectors and the ground truths of the benchmark functions. Since for most of the chosen benchmark functions the ground truth optima locate in the center of the search space, a weak optimization algorithm that tends to converge to the center of search space may obtain better results than others. To deal with this bias, while keeping the ground truth within the search space, we generated random shifts of the searching bounds, limited to be within 40% of the problem space. Besides the random shift of the search ranges, the algorithms were tested using a random problem dimension chosen between 2 to 30, and repeated for each algorithm 100 times for each benchmark function. The mean and standard deviation (STD) of each algorithm were calculated. The stop condition of the algorithms was either reaching 300 iterations or reduction of the variability of the particle positions around the global optima to be less than −6 . All of the algorithms were implemented in MATLAB (Mathworks, USA) with vectorized simulation of particle positions. Other than the particle position update mechanism, and some method specific parameters, all of the different implementations shared core code to ensure that the comparison was performed under similar circumstances. Accuracy, convergence speeds and the run times of each method were measured. Speeds were evaluated using the average number of iterations and function evaluations of each run, as well as the raw convergence time. For a general overview of the performances, the mean accuracy of each method over all benchmark functions was also calculated. TABLE I B

ENCHMARK F UNCTIONS

Function Name Ackley Griewank Modulus Sum Rastrigin 𝑓(𝒙) = −20 ∙ 𝑒𝑥𝑝 ( −0.2 ∙ √1𝐷 ∑ 𝑥 𝑑2𝐷𝑑=1 ) − 𝑒𝑥𝑝 (1𝐷 ∑ cos(2𝜋𝑥 𝑑 ) 𝐷𝑑=1 ) + 20 + 𝑒 𝑑2𝐷𝑑=1 − ∏ cos (𝑥 𝑑 √𝑑) 𝐷𝑑=1 + 1

60 + ∑|𝑥 𝑑 | 𝐷𝑑=1 𝑑2 − 10 ∙ cos(2𝜋𝑥 𝑑 )) 𝐷𝑑=1

Bounds [−30, 30] 𝐷 [−600, 600] 𝐷 [−5.12, 5.12] 𝐷 [−5.12, 5.12] 𝐷 Ground Truth (0, 0) 𝐷 (0, 0) 𝐷 (0, 0) 𝐷 (0, 0) 𝐷 Function Name Salomon Schwefel Rosenbrock Step 𝑓(𝒙) = 𝑑2𝐷𝑑=1 ) + 0.1√∑ 𝑥 𝑑2𝐷𝑑=1 𝑑 sin (√|𝑥 𝑑 |) 𝐷𝑑=1 ∑((𝑥 𝑑 − 1) + (𝑥 𝑑+1 − 𝑥 𝑑2 ) ∙ 100)

60 + ∑⌊𝑥 𝑑 ⌋ 𝐷𝑑=1

Bounds [−100, 100] 𝐷 [−500, 500] 𝐷 [−30, 30] 𝐷 [−5.12, 5.12] 𝐷 Ground Truth (0, 0) 𝐷 (420.968746, 420.968746) 𝐷 (1, 1) 𝐷 (−5.12, 5.12) 𝐷 The variable 𝒙 is a D -dimension vector with the form (𝑥 , 𝑥 , ⋯ , 𝑥 𝐷 ) . . Registering Benchmark Datasets

In order to evaluate the performances of the proposed PSO methods in real registration applications, we conducted a rigid registration experiment based on data from the multi-modality brain image datasets from the Retrospective Image Registration Evaluation (RIRE) Project [21]. The comparison includes the original PSO, the DRQPSO, the Bare Bones PSO, the Kalman filter PSO, LDS-KFPSO, SPO-UKFPSO and the nested UKFPSO methods. All methods use MI as the similarity measure, except for the nested UKFPSO, which used MI for measure 𝑓 (𝒙) and the gradient features proposed by Pluim et al. [22] were used as 𝑓 (𝒙) . We performed CT-MR_T2 and PET-MR_PD registration. The voxel size is for CT data, for MR_T2 and MR_PD data, and for PET data. As the purpose of this experiment is to compare the performance of different PSO methods in real image registration applications, rather than to obtain the absolute highest registration accuracy, we integrated the PSO methods into a very simple registration framework. For the sake of simplicity and efficiency, each slice of both the reference and floating volumes was down-sampled to 20% of the original in-plane resolution of the reference image along each dimension. The slice thickness of the floating volume was also interpolated to the slice thickness of the reference volume so that the optimization method only dealt with translation and rotation parameters. To allow further speed-up of the registration, we selected a cubic region of interest (ROI) in each volume by applying Otsu's histogram-based threshold selection method [23] to the normalized data. The RIRE project measures the accuracy of registration using TRE, calculated from multiple volumes of interest (VOIs). TRE is used as the measure of registration accuracy. The transformation parameters calculated from the resampled data are rescaled for transformation of the original volume. For each patient, 10 attempts at registration were completed, and in each run all methods use the same set of initialized particles that were generated by a MATLAB quasi-random number simulator. C. Registering Neonatal Datasets

To further compare the performance of our methods with the original PSO, we also conducted an experiment using neonatal data collected from a clinical trial performed at the Clinical Research Imaging Centre (CRIC), University of Edinburgh (UoE). This dataset has previously been used to evaluate the performance of the registration framework based on a rearranged histogram specification (RHS) and K-means binning [24]. We used images acquired at 38-

44 weeks’ postmenstrual age in natural sleep using a 3T Verio system (Siemens Healthcare Gmbh, Erlangen, Germany). Because of the neonatal age of the population being imaged, there is likely to be significant motion between acquisitions, which makes this dataset a good test of registration algorithms. Isotropic anatomical data were acquired with a range of contrasts, selected to facilitate the development of volumetric brain segmentation algorithms for the main study. Data from 10 patients were aligned using a rigid-body transform, calculated within a user-positioned ROI on volumes with an isotropic voxel size of . Transformation matrices were obtained from data down-sampled to half original resolution. Performance was evaluated by TREs, calculated from 1908 pairs of corresponding landmarks (18 on each volume), manually placed by a clinical expert. The accuracy of the LDS-KFPSO and the nested-UKFPSO are compared with the results from our earlier work based on the original PSO [24]. IX. R ESULTS A. Benchmark Functions

Table II shows the average minimization error of the different algorithms for each benchmark function (the STD of each run is shown within parenthesis). Table III summarizes the overall performances of the different algorithms. As shown in table II, the original PSO gave the best result for the Step function. The Bare Bones PSO performed better for the Griewank, Modulus Sum and Salomon functions. The proposed LDS-KFPSO method converged to positions that are closer to the true global optima for the Ackley Schewefel and Rosenbrock functions. For the majority of the benchmark functions, the proposed LDS-KFPSO and SPO-UKFPSO returned the best performances, or performances comparable to Bare Bones PSO. The Step function is a special case among all the benchmark functions, as it is increases monotonically, and the global optimum is located around the upper bound of the search range. In registration applications, this may happen when the true global optimum is not included in the search space. As expected, in this case, the LDS-UKFPSO and SPO-UKFPSO methods gave worse results.

TABLE

II P

ERFORMANCES OF THE

PSO M ETHODS A PPLIED TO THE C HOSEN B ENCHMARK F UNCTIONS

Function Original PSO QPSO RQPSO DRQPSO Chaotic PSO Bare bones PSO Kalman filter PSO LDS-KFPSO SPO-UKFPSO

Ackley 8.891(5.4) 6.859(4.1) 7.319(5.0) 5.991(4.4) 10.23(5.8) 9.070(5.5) 6.140(3.4)

Salomon 0.536(0.5) 4.897(4.6) 0.620(0.8) 0.932(1.2) 15.01(11.9) ased on the results shown in table III, due to the simplicity of its position update model, the implementation of chaotic QPSO has the fastest convergence time, but worst accuracy. In comparison, the LDS-KFPSO and SPO-UKFPSO take slightly longer to complete each iteration, but both required fewer iterations than other methods. In particular, the LDS-KFPSO used the least number of function evaluations, and had the shortest run time to achieve the best optimization results. The SPO-UKFPSO provided greater accuracy compared to LDS-KFPSO and converged quicker than most of the other methods. B. RIRE Data

The TREs for the CT-MR_T2 and PET-MR_PD registrations are shown in table IV. All three proposed PSO methods returned better results than the other methods in terms of mean and median TRE. Due to the combined features and similarity measures it utilizes, the nested-UKFPSO gave better results amongst the three proposed PSO models. For the Bare Bones PSO and Kalman filter PSO, since these methods feature a more deterministic position update mechanism, they display better convergence speed than the original PSO and DRQPSO. However, the original PSO and DRQPSO were highly sensitive to particle initialization, and gave the greatest variability in each run of the experiment. C. Neonatal Data

Fig. 8 displays the results of successfully registering the T2-w dark fluid and T1-w MRPAGE neonatal images using the UKFPSO methods. Registration of this particular dataset was only achieved using the UKFPSO method, previous methods had failed to register the shown example. The quantitative evaluation of these registration results are shown in table V. The LDS-KFPSO and nested UKFPSO therefore not only gave smaller TREs than the original PSO, but also successfully aligned one particular problematic dataset that our previous method failed to register [24]. X. C ONCLUSION

In this paper, we have described three new UKF-guided registration-oriented optimization implementations. The new PSO-based methods were evaluated using benchmark functions and by registering two medical image cohorts. Compared to the selected PSO algorithms, the UKF-guided PSO methods achieved more accurate registration results, and displayed better robustness to the presence of local optima. The convergence speed is comparable to the QPSO when minimizing benchmark functions, and is comparable to the original PSO algorithm when registering medical images. This new type of UKF-based PSO algorithm provides an efficient mechanism to encode prior knowledge of the search

TABLE

IV E

VALUATION OF THE

PSO M ETHODS A PPLIED TO

RIRE D ATA

Modality Function Original PSO DRQPSO Bare bones PSO Kalman filter PSO LDS-KFPSO SPO-UKFPSO SPO-UKFPSO

CT-MR_T2 Mean 6.2158 4.5297 10.3678 5.5092 3.5898 1.7407

Median 6.2047 4.4752 12.0473 5.6158 3.5980 1.8617

STD 2.2740 0.8503 3.6121 1.0642 1.0607 0.6932

Run Time 112.33s 97.69s 92.40s

Median 3.1755 3.5118 3.7254 6.1185 3.1472 3.1971

STD 1.0313 6.2657

III E

VALUATION OF THE

PSO M ETHODS A PPLIED TO THE C HOSEN B ENCHMARK F UNCTIONS

Function Original PSO QPSO RQPSO DRQPSO Chaotic PSO Bare bones PSO Kalman filter PSO LDS-KFPSO SPO-UKFPSO

Error Per Function 1.9917 3.0105 1.5960 1.4380 7.1051 1.8225 2.2148

Number of Iterations Per Run 148.85 88.29 65.373 138.01 46.31 144.96 93.02 (b) (c) (d)

Fig. 9. Registration results obtained using original particle swarm optimizer (PSO), the linear dynamic system Kalman filter PSO (LDS-KFPSO) and the nested unscented Kalman filter PSO (UKFPSO): (a) before registration; (b) registered using original PSO; (c) registered using LDS-KFPSO; (d) registered using nested UKFPSO. The registration is performed to align the T2 weighted dark fluid and T1 MRPAGE images. TABLE

V S

TATISTICS OF T ARGET R EGISTRATION E RRORS (TRE) Original PSO LDS-KFPSO nested-UKFPSO Mean 3.25 2.80

Median 1.88 1.93

STD 3.41 1.71

Number of Failures 1 0 Average Run Time 92.48s 100.19s 144.83s Errors measured in millimeter (mm). (b) (b) (c) (d)

Fig. 8. Registration results obtained using original particle swarm optimizer (PSO), the linear dynamic system Kalman filter PSO (LDS-KFPSO) and the nested unscented Kalman filter PSO (UKFPSO): (a) before registration; (b) registered using original PSO; (c) registered using LDS-KFPSO; (d) registered using nested UKFPSO. The registration is performed to align the T2 weighted dark fluid and T1 MRPAGE images visualized in overlapped red and green color channels. pace into the optimization process, without requiring manually assigned weights for each feature included in that prior knowledge. Unlike other PSO methods, the proposed methods update the probabilistic distribution of the whole search space, rather than storing the distribution for each particle. This process iteratively moves the particles close to the global optimum, especially in the early stage of PSO, thus leading to quicker convergence. Furthermore, the mechanism that updates the knowledge of the search space can also be applied to other population based optimization methods, for example, other swarm intelligence methods. Thus, it has great potential for application in a variety of medical image registration problems. R

EFERENCES [1] B. Zitová and J. Flusser, "Image registration methods: a survey,"

Image and Vision Computing, vol. 21, pp. 977-1000, 2003. [2] S. J. Julier and J. K. Uhlmann, "New extension of the Kalman filter to nonlinear systems," in

AeroSense'97 , ed, 1997, pp. 182-193. [3] P. Andras, "A Bayesian interpretation of the particle swarm optimization and its kernel extension.,"

PloS one, vol. 7, pp. e48710--e48710, 2011. [4] M. Wachowiak and R. Smolíková, "An approach to multimodal biomedical image registration utilizing particle swarm optimization,"

Evolutionary …, vol. 8, pp. 289-301, 2004. [5] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in

Proceedings of the sixth international symposium on micro machine and human science vol. 1, ed, 1995, pp. 39-43. [6] R. Thangaraj, M. Pant, A. Abraham, and P. Bouvry, "Particle swarm optimization: Hybridization perspectives and experimental illustrations,"

Applied Mathematics and Computation, vol. 217, pp. 5208-5226, 2011. [7] S. Meshoul and M. Batouche, "A novel quantum behaved particle swarm optimization algorithm with chaotic search for image alignment,"

Evolutionary

Computation (CEC), …, vol. 4785468, pp. 10-15, 2010. [8] J. Kennedy and R. Mendes, "Neighborhood topologies in fully informed and best-of-neighborhood particle swarms,"

IEEE Transactions on Systems Man and Cybernetics Part C Applications and Reviews, vol. 36, p. 515, 2006. [9] W. Fang, J. Sun, Y. Ding, X. Wu, and W. Xu, "A Review of Quantum-behaved Particle Swarm Optimization,"

IETE Technical Review, vol. 27, p. 336, 2010. [10] D. Zhou, J. Sun, C.-H. Lai, W. Xu, and X. Lee, "An improved quantum-behaved particle swarm optimization and its application to medical image registration,"

International Journal of Computer Mathematics,

Optik - International Journal for Light and Electron Optics, vol. 123, pp. 1955-1960, 2012. [12] J. Robinson, S. Sinton, and Y. Rahmat-Samii, "Particle swarm, genetic algorithm, and their hybrids: optimization of a profiled corrugated horn antenna," in

Antennas and Propagation Society International Symposium, 2002. IEEE vol. 1, ed, 2002, pp. 314-317. [13] S.-K. S. Fan, Y.-c. Liang, and E. Zahara, "Hybrid simplex search and particle swarm optimization for the global optimization of multimodal functions,"

Engineering Optimization, vol. 36, pp. 401-418, 2004. [14] J. Kennedy, "Bare bones particle swarms,"

Proceedings of the 2003 IEEE Swarm Intelligence Symposium. SIS'03 (Cat. No.03EX706), vol. 729, 2003. [15] C. K. Monson and K. D. Seppi, "Bayesian optimization models for particle swarms," in

Proceedings of the 7th annual conference on Genetic and evolutionary computation , ed, 2005, pp. 193-200. [16] R. E. Kalman, "A new approach to linear filtering and prediction problems,"

Journal of Fluids Engineering, vol. 82, pp. 35-45, 1960. [17] G. A. Einicke and L. B. White, "Robust extended Kalman filtering,"

IEEE Transactions on Signal Processing, vol. 47, pp. 2596-2599, 1999. [18] R. E. Kalman, "An Introduction to Kalman Filter,"

University of North Carolina at Chapel Hill, Department of Computer Science, TR, pp. 41-95, 1995. [19] J. Uhlmann, S. Julier, and H. F. Durrant-Whyte, "A new method for the non linear transformation of means and covariances in filters and estimations,"

IEEE Transactions on automatic control, vol. 45, 2000. [20] M. Jamil and X.-S. Yang, "A literature survey of benchmark functions for global optimisation problems,"

International Journal of Mathematical Modelling and Numerical Optimisation, vol. 4, pp. 150-194, 2013. [21] J. West et al. , "Comparison and evaluation of retrospective intermodality brain image registration techniques.,"

Journal of computer assisted tomography, vol. 21, pp. 554-66, 1996. [22] J. P. W. Pluim, J. B. A. Maintz, and M. a. Viergever, "Image registration by maximization of combined mutual information and gradient information,"

IEEE Transactions on Medical Imaging, vol. 19, pp. 809-814, 2000. [23] N. Otsu, "A threshold selection method from gray-level histograms,"

Automatica, vol. 11, pp. 23-27, 1975. [24] C. Wang et al. , "Automatic multi-parametric MR registration method using mutual information based on adaptive asymmetric k-means binning," in

Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on , ed, 2015, pp. 1089-1092., ed, 2015, pp. 1089-1092.