Unsupervised Multi Class Segmentation of 3D Images with Intensity Inhomogeneities
Jan Henrik Fitschen∗, Katharina Losch†, Gabriele Steidl∗

November 10, 2018
Abstract
Intensity inhomogeneities in images cause problems in gray-value based image segmentation, since the varying intensity often dominates over the gray-value differences of the image structures. In this paper we propose a novel biconvex variational model that includes the intensity inhomogeneities to tackle this task. We combine a total variation approach for multi class segmentation with a multiplicative model to handle the inhomogeneities. In our model we assume that the image intensity is the product of a smoothly varying part and a component which resembles important image structures such as edges. Therefore, in addition to the total variation of the label assignment matrix, we penalize a quadratic difference term to cope with the smoothly varying factor. A critical point of the resulting biconvex functional is computed by a modified proximal alternating linearized minimization method (PALM). We show that the assumptions for the convergence of the algorithm are fulfilled. Various numerical examples demonstrate the very good performance of our method. Particular attention is paid to the segmentation of 3D FIB tomographical images, which served as the motivation for our work.
Intensity inhomogeneities often occur in real-world images, mainly due to different spatial lighting and deficiencies of imaging devices. For example, in MRI, imperfections in the radio-frequency coils or problems associated with acquisition sequences cause intensity changes. The motivation for this paper was the task of segmenting 3D images stemming from focused ion beam (FIB) tomography. While classical X-ray tomography often does not reach the required material resolution, FIB tomography enables to investigate the 3D morphology of structures on a scale down to several nanometers.

∗ Department of Mathematics, University of Kaiserslautern, Germany, {fitschen,steidl}@mathematik.uni-kl.de
† Fraunhofer ITWM, Kaiserslautern, Germany

Figure 1: One slice of FIB data with varying illumination and two 3D segmentation results using a model not considering the illumination. (a) Exemplary slice of a 3D dataset with intensity inhomogeneities. (b) Segmentation without considering intensity inhomogeneities, with cluster centers chosen such that the left part is segmented correctly. (c) Segmentation without considering intensity inhomogeneities, with cluster centers chosen such that the right part is segmented correctly.

The material is successively removed by a focused ion beam, and after every section the surface is displayed by scanning electron microscopy. Several hundred of these serial slices finally form a 3D image. A typical slice of a 3D FIB tomography of aluminum with silicon carbide (SiC) particles (larger black parts) and copper aggregations (small white parts) is shown in Fig. 1a. The segmentation has to distinguish between the particles, the aggregates and the surrounding aluminum matrix. However, due to the intensity inhomogeneities, a segmentation based on the gray-values gives a very bad result. Fig.
1b and 1c show segmentation results for a 3D FIB data set using a supervised gray-value based segmentation method without considering the illumination. There, we have to choose one cluster center for each of the three classes, i.e., in total three gray-values that are close to the classes aluminum, SiC and copper, respectively. Unfortunately, the cluster centers cannot be chosen appropriately for all parts of the image, so that either too many or not enough particles are detected. Therefore the segmentation of such images has to take the intensity inhomogeneities into account.

There are several techniques for illumination correction in the literature. These methods could be used in a preprocessing step before applying a segmentation algorithm of choice. In particular in MRI, intensity corrections were proposed by simple homomorphic filtering [12, 21] and polynomial, resp., spline surface fitting approaches [16, 31, 42, 17]. Many spatial illumination correction methods for natural images take hypotheses about the Human Visual System (HVS) into account. In particular, the perceptual work about the Retinex model [22] has found wide acceptance. It states that the HVS does not perceive an absolute lightness but rather a relative, local one. This phenomenon is called lateral inhibition. For example, Fig. 2 shows the experiment of the checkerboard shadow illusion of Adelson [1]. Although the squares A and B have the same gray-value, the perceived intensities are different. In the Retinex model, the light intensity F perceived by the observer or camera is considered to be the product of the reflectance R of the objects in the scene and the amount of source illumination L falling on the objects, see also [18, p. 51],

F(x) = R(x) L(x),    (1)

where R(x) ∈ (0,
1) and L(x) ∈ (0, +∞). While we assume that the reflectance inherits the structures of the objects, e.g. edges, we consider the illumination as spatially smooth; in particular it should not have sharp edges, which represent the image structures. Taking the logarithm in (1) we obtain

f(x) = r(x) + l(x),

where f = log F, r = log R and l = log L. Retinex based variational or PDE based approaches for illumination correction can be found for example in [28, 30, 33, 34].

In MRI the observed intensity is often modeled similarly as in (1) by the product of a structural part R and a so-called gain factor L. In this paper we also follow the multiplicative intensity model.

In contrast to a two-step procedure, we consider simultaneous segmentation and intensity inhomogeneity estimation. This avoids the computational burden of two separate procedures and has moreover the advantage of being able to use intermediate information from the segmentation while performing the update. Besides statistical methods such as the computationally extensive EM approach in [43], variational algorithms were proposed in [2, 3, 25, 26, 29, 36, 45]. We consider the latter approaches in more detail in the next section.

Variational segmentation models have shown a very good performance and flexibility in many applications. Level set methods and convex segmentation models which penalize the (nonlocal) discrete total variation (TV) of a relaxed label assignment matrix have been successfully applied [5, 11, 15, 23, 24, 37, 40, 44].

Figure 2: Result for the "checkerboard" image. (a) Original image. (b) Segmentation result. (c) Computed illumination.

In this paper, we combine the TV based segmentation method with the multiplicative intensity model. This results in a biconvex non-smooth functional, which has to be optimized with respect to the label assignment matrix, the cluster centers and the smoothly varying intensity factor.
Then we obtain both a segmented image and an estimation of the intensity inhomogeneities. We compute a critical point of the corresponding functional by applying a slight modification of the proximal alternating linearized minimization method (PALM) by Bolte et al. [10].

The paper is organized as follows: In Section 2 we review variational segmentation models which compensate for the varying intensities while segmenting the image. Then we introduce our model. The modified PALM is explained in Section 3. Section 4 contains numerical examples. The paper finishes with conclusions in Section 5.
Let G := {1, . . . , n₁} × . . . × {1, . . . , n_d} be a d-dimensional image grid. Here we will deal with d = 2 and d = 3. Let n = n₁ · · · n_d be the number of image pixels. We consider images F : G → ℝ which we want to segment into K classes. We introduce a so-called label assignment matrix

u := (u_k(j))_{j∈G, k∈{1,...,K}}.

Ideally, we would like to have for a fixed pixel j ∈ G that u_k(j) = 1 if it belongs to class k and u_k(j) = 0 otherwise. However, in the subsequent variational approach this would lead to optimization tasks which are NP-hard to solve. Therefore it is common to relax the assumptions on the label assignment matrix and require only that u_k(j) ∈ [0,
1] is the probability that pixel j belongs to class k. Since every pixel should be assigned to one of the classes, we enforce for every j ∈ G that

∑_{k=1}^K u_k(j) = 1.
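As an illustrative aside (not part of the paper's method), the relaxed constraint can be seen in a few lines of code: soft assignments built from gray-value distances to hypothetical cluster centers land in the probability simplex, and hard labels can be recovered by a maximum over the classes. A minimal NumPy sketch:

```python
import numpy as np

def soft_assignment(f, c):
    """Toy soft label assignment from gray-values f (n,) and centers c (K,).
    Each row of the result lies in the probability simplex."""
    d = (f[:, None] - c[None, :]) ** 2          # squared distances |f(j) - c_k|^2
    w = np.exp(-d)                              # smaller distance -> larger weight
    return w / w.sum(axis=1, keepdims=True)     # normalize each row to sum 1

f = np.array([0.1, 0.5, 0.9])                   # three pixel gray-values
c = np.array([0.0, 1.0])                        # two hypothetical cluster centers
u = soft_assignment(f, c)
labels = u.argmax(axis=1)                       # hard labels by the maximum
```

Every row of u is nonnegative and sums to one by construction; the exponential weighting is one of many possible choices.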
In other words, (u_k(j))_{k=1}^K is an element of the probability simplex

△_K := { v = (v_k)_{k=1}^K ∈ [0, 1]^K : ∑_{k=1}^K v_k = 1 }.

From the label assignment matrix the segmentation can be obtained by assigning, e.g., the label

k̂ := arg max_{k=1,...,K} u_k(j)

to j ∈ G.

A variational model is composed of a data term, which incorporates information about the given image F, and penalizing/regularizing terms, which contain prior information on the image. Let C := (C₁, . . . , C_K)^T denote the unknown vector of the centers of the K gray-value clusters. For segmentation problems a typical data term is given by

∑_{k=1}^K ∑_{j∈G} (u_k(j))^p |F(j) − C_k|²,    (2)

where p ≥
1. For p = 1 this data term appears in the K (or C) means approach, while p > 1 is used in the fuzzy K-means method [7, 8, 20, 32]. Roughly speaking, the data term indicates the following: If F(j) is close to the center C_k, then the corresponding summand in (2) remains small for a larger u_k(j). Thus the probability of F(j) being in the class k is high. If F(j) is far away from the center C_k, then the corresponding term in (2) becomes only small if the probability u_k(j) that F(j) belongs to class k is small. Clearly, other distance measures between F(j) and C_k than the squared absolute value can be chosen. Moreover, the approach can be generalized to color images or more general image feature vectors F than gray-values. In this paper we restrict our attention to gray-value images. The above data term does not take intensity inhomogeneities into account. However, as shown in Fig. 1, it is often not possible to find appropriate class centers if the illumination varies. A remedy consists in incorporating the smoothly varying intensity part L into the model and considering

E_data(u, C, L) := ∑_{k=1}^K ∑_{j∈G} (u_k(j))^p |F(j) − L(j) C_k|²,    (3)

p ≥
1, or its logarithmic counterpart

E_data(u, c, l) := ∑_{k=1}^K ∑_{j∈G} (u_k(j))^p |f(j) − l(j) − c_k|²,    (4)

where f = log F, c = log C and l = log L. From (4) we get the illumination by L = exp(l) and the class centers by C = exp(c). The functional (3) is triconvex, i.e., fixing two of the variables u, C, L, the functional is convex in the third variable. In contrast, the functional (4) is only biconvex in u and v := (c, l). This means if we fix u, then the functional is convex in v and conversely.

Within the penalizers, pixel differences are often used since they indicate smooth or higher frequency image parts such as edges. By ∇F we denote the d-dimensional discrete gradient of F, where we use forward differences in each direction together with mirror boundary conditions. For the concrete matrix structure of the gradient operator we refer to [40]. Similarly, ∇²F is the discrete Hessian of F. Throughout this paper, ‖A‖ denotes the square root of the sum of the squared entries of a multidimensional array A.

Let us briefly review existing variational models for image segmentation which take intensity inhomogeneities into account. In [36] the following model with data term (3) with p = 2 was suggested:

arg min_{u,C,L} { E_data(u, C, L) + γ₁ ‖∇L‖² + γ₂ ‖∇²L‖² }    (5)

subject to (u_k(j))_{k=1}^K ∈ △_K, j ∈ G, where γ_i > 0, i = 1, 2. The model penalizes L via two terms, but does not take the edges of the label assignment matrix u into account. Therefore it is not robust to noise. Moreover, it has the drawback that there does not exist a minimizer. This can simply be seen by setting L := F/r and C_k := r with some constant r > 0: the data term vanishes and the functional value reduces to γ₁ r⁻² ‖∇F‖² + γ₂ r⁻² ‖∇²F‖².
For r → +∞ we obtain a minimizing sequence of the functional which does not converge.

Another segmentation model for MRI with data term (4) was proposed by Ahmed et al. [2]:

arg min_{u,c,l} { E_data(u, c, l) + λ ∑_{k=1}^K ∑_{j∈G} (u_k(j))^p ∑_{i∈N_j} (f(i) − l(i) − c_k)² }

subject to (u_k(j))_{k=1}^K ∈ △_K, j ∈ G, and 0 < ∑_{j∈G} u_k(j) < n, k = 1, . . . , K. Here λ > 0, and the second term takes the neighborhood N_j of the j-th pixel into account in order to respect the lateral inhibition. However, as in the previous model, no penalizer for the label assignment matrix is used, which makes the model sensitive to noise. A similar model which penalizes the simplex constraint was suggested in [29]. Unfortunately, the model formulation in [29] is mathematically not sound and the optimization procedure does not fit to the model.

Li et al. [25, 26] considered a variational level set model, which reads in the continuous setting as

arg min_{φ,C,L} { ∫ ∑_{k=1}^K e_k(x) M_k(φ(x)) dx + λ ∫ |∇H(φ)| dx + γ ∫ (|∇φ| − 1)² dx }    (6)

with

e_k(x) := ∫ K_σ(y − x) |F(x) − L(y) C_k|² dy.

Here K_σ is a truncated Gaussian kernel with standard deviation σ, H a smoothed Heaviside function and M_k is a membership function. The model was applied to two-dimensional medical images. Note that the functional is not convex in φ for fixed C and L. A more general level set approach is considered in [45]. Furthermore, a slightly different approach was also proposed in [3].

In this paper we consider the model

arg min_{u,c,l} { E_data(u, c, l) + ∑_{k=1}^K λ_k ‖ |∇u_k| ‖₁ + γ ‖∇l‖² }    (7)

subject to (u_k(j))_{k=1}^K ∈ △_K, j ∈ G, where p = 1 in the data term. Here λ_k > 0, γ > 0, and u_k := (u_k(j))_{j∈G} is the k-th label assignment matrix. We have not found this model in the literature.
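To make the discrete objects in (7) concrete, the following sketch evaluates the energy for a 1D toy signal with hypothetical parameters and p = 1; in one dimension the forward-difference gradient and the isotropic TV reduce to simple absolute differences:

```python
import numpy as np

def grad1d(x):
    """Forward differences with mirror boundary (last difference is 0)."""
    return np.append(x[1:] - x[:-1], 0.0)

def energy(u, c, l, f, lam, gamma):
    """Discrete energy of model (7) in 1D with p = 1:
    data term + TV of each label channel + quadratic smoothness of l."""
    K = len(c)
    data = sum((u[:, k] * (f - l - c[k]) ** 2).sum() for k in range(K))
    tv = sum(lam[k] * np.abs(grad1d(u[:, k])).sum() for k in range(K))
    smooth = gamma * (grad1d(l) ** 2).sum()
    return data + tv + smooth

# toy signal: two classes observed under a linearly varying log-illumination
n = 8
l = np.linspace(-0.1, 0.1, n)            # smooth (log-)illumination part
c = np.array([0.0, 1.0])                 # hypothetical log cluster centers
truth = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f = l + c[truth]                         # noiseless observation f = l + c_k
u_good = np.eye(2)[truth]                # correct hard label assignment
u_bad = np.full((n, 2), 0.5)             # uninformative assignment
lam = [0.01, 0.01]

e_good = energy(u_good, c, l, f, lam, 1.0)
e_bad = energy(u_bad, c, l, f, lam, 1.0)
```

A correct labeling pays only the small TV and smoothness costs, while the uninformative assignment pays the full data term, so e_good is much smaller than e_bad.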
Using the indicator function of a set S,

ι_S(x) := { 0 if x ∈ S, +∞ otherwise },

we can express the constraint "subject to (u_k(j))_{k=1}^K ∈ △_K, j ∈ G" by adding ι_{△_K^n}(u) to the functional. Hence, the functional in (7) can be written as

E(u, c, l) = E_data(u, c, l) + ∑_{k=1}^K λ_k ‖ |∇u_k| ‖₁ + γ ‖∇l‖² + ι_{△_K^n}(u).    (8)
As in (5), we penalize a non-smooth illumination l, but just by one quadratic first order difference term. The matrix u_k is penalized by its discrete total variation (TV) [39], where |∇u_k| denotes the length of the discrete gradient of u_k and the ℓ₁ norm is taken over the grid points j ∈ G. This term encourages smooth edges of the segmented objects and makes the model more robust to noise. We allow the choice of different parameters λ_k to better segment objects of different size. The functional (7) is biconvex in u and (c, l) and possesses a minimizer, as the following proposition states.

Proposition 1.
The functional E(u, c, l) defined by (8) has a minimizer.

Proof. It suffices to show that there exists a bounded infimal sequence, since E(u, c, l) is lower semi-continuous and bounded from below. Let ((u, c, l)^r)_{r∈ℕ}, (u, c, l)^r ∈ △_K^n × ℝ^K × ℝ^n, be an infimal sequence, i.e.,

E(u^r, c^r, l^r) → inf_{(u,c,l)} E(u, c, l) as r → ∞.

Without loss of generality, we can assume the following: If there exists a subsequence (u_k^{r_i})_{i∈ℕ} for some k ∈ {1, . . . , K} with

∑_{j∈G} u_k^{r_i}(j) → 0 as i → ∞,    (9)

then the whole sequence converges to zero, in other words

∑_{j∈G} u_k^r(j) → 0 as r → ∞.    (10)

If this was not the case, we could restrict the following analysis to the sequence ((u, c, l)^{r_i})_{i∈ℕ} and possibly repeat the procedure for the next k fulfilling (9). Note that it is not possible that (10) holds for all k ∈ {1, . . . , K}, since u^r ∈ △_K^n.

Since the sequence is assumed to be an infimal sequence, we immediately obtain by the constraint on u^r that (u^r)_{r∈ℕ} is bounded.

Next we consider l. Due to the mirror extension at the boundary, we have that

ker ∇ = { a 1_n : a ∈ ℝ },

where 1_n is the vector consisting of n entries 1, i.e., the kernel consists of constant images. We decompose l into two parts, l^r = l_ker^r 1_n + l_{ker⊥}^r, where l_{ker⊥}^r ∈ ker(∇)^⊥ and l_ker^r 1_n ∈ ker(∇) with l_ker^r ∈ ℝ. Using again that the sequence is an infimal one, we see due to the term γ‖∇l‖² that (l_{ker⊥}^r)_{r∈ℕ} is bounded.

Now define the (possibly empty) set

K₀ := { k : ∑_{j∈G} u_k^r(j) → 0 as r → ∞ } ⊂ {1, . . . , K}

and fix k̂ ∉ K₀. We show that a bounded infimal sequence is given by ((ũ, c̃, l̃)^r)_{r∈ℕ}, where

ũ_k^r := 0 if k ∈ K₀;  u_k^r + ∑_{i∈K₀} u_i^r if k = k̂;  u_k^r otherwise,

c̃_k^r := 0 if k ∈ K₀;  c_k^r + l_ker^r otherwise,

l̃^r := l^r − l_ker^r 1_n = l_{ker⊥}^r.
We immediately obtain that (ũ^r)_{r∈ℕ} is bounded, as (u^r)_{r∈ℕ} is bounded. Further, (l̃^r)_{r∈ℕ} = (l_{ker⊥}^r)_{r∈ℕ} is bounded. Next, we show that (c̃^r)_{r∈ℕ} is bounded. From the data term (4), we conclude that

( ∑_{j∈G} u_k^r(j) |f(j) − l^r(j) − c_k^r|² )_{r∈ℕ}

is bounded for all k = 1, . . . , K. Let k ∉ K₀. By our assumption that (9) implies (10), we know that there also does not exist a subsequence (r_i)_{i∈ℕ} with ∑_{j∈G} u_k^{r_i}(j) → 0 as i → ∞. Thus, for k ∉ K₀, there must exist a sequence of pixels (j_k^r)_{r∈ℕ} such that u_k^r(j_k^r) ≥ ε > 0 for r sufficiently large. Hence,

( |f(j_k^r) − l^r(j_k^r) − c_k^r| )_{r∈ℕ} = ( |f(j_k^r) − l_{ker⊥}^r(j_k^r) − c̃_k^r| )_{r∈ℕ}

is bounded. Consequently, (c̃_k^r)_{r∈ℕ} is bounded for all k ∉ K₀. Together with c̃_k^r = 0 for k ∈ K₀, we obtain that (c̃^r)_{r∈ℕ} is bounded. Thus, the whole sequence ((ũ, c̃, l̃)^r)_{r∈ℕ} is bounded.

It remains to show that it is an infimal one. Note that adding l_ker^r to c_k^r and subtracting it from l^r(j) does not change the objective value. By construction, the constraint ũ^r ∈ △_K^n is still fulfilled. Thus, the only change in the objective value arises for the summands belonging to k ∈ K₀ and k̂. But since ũ_k̂^r − u_k̂^r → 0 as r → ∞ and

∑_{j∈G} u_k^r(j) → 0 as r → ∞, k ∈ K₀,

the objective value does not change in the limit, such that the sequence ((ũ, c̃, l̃)^r)_{r∈ℕ} is still an infimal sequence. This finishes the proof.

Remark 2. Alternatively to (7), we could deal with the non-logarithmic model

arg min_{u,C,L} { E_data(u, C, L) + ∑_{k=1}^K λ_k ‖ |∇u_k| ‖₁ + γ ‖∇L‖² }

subject to (u_k(j))_{k=1}^K ∈ △_K, j ∈ G. We prefer to work with the biconvex functional (7), although we have obtained similar numerical results by minimizing the above triconvex functional.
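As a numerical aside: the constraint u ∈ △_K^n acts per pixel, and the Euclidean projection onto the probability simplex is a standard building block when such constraints are handled algorithmically. A sketch of the classical sort-based projection (not taken from the paper):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto { x : x >= 0, sum(x) = 1 }
    via the classical sort-based scheme (O(K log K) per pixel)."""
    u = np.sort(v)[::-1]                         # sort in decreasing order
    css = np.cumsum(u) - 1.0                     # cumulative sums minus target 1
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]     # last index with positive gap
    theta = css[rho] / (rho + 1.0)               # shift enforcing the constraints
    return np.maximum(v - theta, 0.0)

p = project_simplex(np.array([0.7, 0.6, -0.2]))  # -> [0.55, 0.45, 0.0]
```

The projection clips the negative entry and shifts the remaining mass so that the result is nonnegative and sums to one; a vector already in the simplex is returned unchanged.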
In this section we deal with the minimization of our functional (8). First we mention that fixing (c, l) we get

E_{(c,l)}(u) := ∑_{k=1}^K ∑_{j∈G} u_k(j) |f(j) − l(j) − c_k|² + ∑_{k=1}^K λ_k ‖ |∇u_k| ‖₁ + ι_{△_K^n}(u),

which is convex, but not strictly convex. Fixing u we obtain the convex functional

E_u(c, l) := ∑_{k=1}^K ∑_{j∈G} u_k(j) |f(j) − l(j) − c_k|² + γ ‖∇l‖².

This functional becomes strictly convex if we assume that we have only nonempty classes, i.e., ∑_{j∈G} u_k(j) ≥ ε > 0 for all k, and constrain the mean value of l to be zero. In other words,

Ẽ_u(c, l) := ∑_{k=1}^K ∑_{j∈G} u_k(j) |f(j) − l(j) − c_k|² + γ ‖∇l‖² + ι_{0}(⟨1, l⟩)    (11)

is strictly convex in c and l separately due to the quadratic term. It is furthermore jointly convex in (c, l) by the term ι_{0}(⟨1, l⟩).

For an algorithm which computes in an alternating way

u^{(r)} ∈ arg min_u E_{(c^{(r)}, l^{(r)})}(u),
(c^{(r+1)}, l^{(r+1)}) ∈ arg min_{(c,l)} E_{u^{(r)}}(c, l),

one can prove similarly as in [6] that the resulting sequence {u^{(r)}, c^{(r)}, l^{(r)}}_r has a subsequence which converges to a partial minimizer of (8). For an alternating convex search algorithm for general biconvex problems we refer to [19].

In this paper we want to apply an algorithm which ensures the convergence of the whole sequence {u^{(r)}, c^{(r)}, l^{(r)}}_r. To this end we need the definition of the proximal mapping. For a proper, lower semi-continuous, convex function h : ℝ^N → (−∞, +∞] and λ >
0, the proximal operator prox_{λh} : ℝ^N → ℝ^N is defined by

prox_{λh}(x) := arg min_{y∈ℝ^N} { (λ/2) ‖x − y‖² + h(y) }.

Indeed, the minimizer of the right-hand side exists and is uniquely determined, see, e.g., [38].

Recently, Bolte, Sabach and Teboulle [10] considered problems of the form

arg min_{x=(x₁,...,x_m)} { ∑_{i=1}^m h_i(x_i) + H(x) },    (12)

where h_i : ℝ^{N_i} → (−∞, +∞], i = 1, . . . , m, are proper, lower semi-continuous functions and H : ℝ^{N₁} × . . . × ℝ^{N_m} → ℝ is continuously differentiable. Further, ∑_{i=1}^m h_i(x_i) + H(x) needs to be a Kurdyka-Łojasiewicz (KL) function. For a discussion of KL functions, we refer to [4, 9, 10]. Here, we only want to emphasize that semi-algebraic functions are KL and our setting fits the examples for semi-algebraic functions given in [10, Sect. 5]. The authors suggested Algorithm 1 for such problems. Under certain assumptions it was shown that the sequence {x^{(r)}}_r converges to a critical point of the functional in (12).

Theorem 3.
Let h_i : ℝ^{N_i} → (−∞, +∞], i = 1, . . . , m, be proper, lower semi-continuous functions and H : ℝ^{N₁} × . . . × ℝ^{N_m} → ℝ a continuously differentiable function such that ∇H is Lipschitz continuous on bounded sets. Let h_i, i = 1, . . . , m, and ∑_{i=1}^m h_i + H be bounded from below. Further, let ∑_{i=1}^m h_i(x_i) + H(x) be a KL function. Assume that the sequence of iterates {x^{(r)}}_r produced by PALM is bounded. Further suppose that for i = 1, . . . , m, the partial gradients ∇_{x_i} H(x) are globally Lipschitz for fixed (x₁, . . . , x_{i−1}, x_{i+1}, . . . , x_m), with Lipschitz constant L_i(x₁, . . . , x_{i−1}, x_{i+1}, . . . , x_m), where

L_i( x₁^{(r)}, . . . , x_{i−1}^{(r)}, x_{i+1}^{(r)}, . . . , x_m^{(r)} ) ≤ λ_i^+

for all r ∈ ℕ and some λ_i^+ ∈ ℝ. Then, for

τ_{i,r} ≥ L_i( x₁^{(r)}, . . . , x_{i−1}^{(r)}, x_{i+1}^{(r)}, . . . , x_m^{(r)} ),  i = 1, . . . , m,

the sequence {x^{(r)}}_r converges to a critical point of the functional in (12).

Algorithm 1: PALM for m blocks
Input: functions h_i : ℝ^{N_i} → (−∞, +∞], i = 1, . . . , m, H : ℝ^{N₁} × . . . × ℝ^{N_m} → ℝ, step-sizes τ_{i,r}, i = 1, . . . , m, r = 0, 1, . . .
Initialization: x_i^{(0)} ∈ ℝ^{N_i}, i = 1, . . . , m
For r = 0, 1, . . . do
  x_i^{(r+1)} ∈ prox_{τ_{i,r} h_i} ( x_i^{(r)} − (1/τ_{i,r}) ∇_{x_i} H(x₁^{(r+1)}, . . . , x_{i−1}^{(r+1)}, x_i^{(r)}, . . . , x_m^{(r)}) ),  i = 1, . . . , m.

For minimizing (8) we apply PALM with m = 3 blocks and the following setting:

h₁(u) := ∑_{k=1}^K λ_k ‖ |∇u_k| ‖₁ + ι_{△_K^n}(u),    (13)
h₂(c) := 0,
h₃(l) := ι_{0}(⟨1, l⟩),
H(u, c, l) := ∑_{j,k} u_k(j) (f(j) − l(j) − c_k)² + γ ‖∇l‖².
More precisely,

H(u, c, l) + h₁(u) + h₂(c) + h₃(l) = E(u, c, l) + ι_{0}(⟨1_n, l⟩),

i.e., compared with (8) we add the term h₃, which enforces the logarithmic illumination to have mean value zero, see (11). Note that adding a constant C ∈ ℝ to l(j) for all j ∈ G and subtracting this constant from the c_k, k = 1, . . . , K, does not change the value of the functional (8). Therefore we have an ambiguity in the solution consisting of a constant function. By introducing h₃ we enforce that l has mean zero, i.e., we fix the constant to C := −⟨1, l⟩/n such that the ambiguity is removed. Of course it would also be possible to use PALM with m = 2 blocks by handling (c, l) together. However, this leads to a joint estimate for the Lipschitz constant of the partial gradient of H with respect to (c, l), which is more restrictive than the Lipschitz constants from the separate gradients with respect to c and l, see also the proof of Corollary 4.

In PALM we need the following partial gradients of H:

∇_u H(u, c, l) = ( (f(j) − l(j) − c_k)² )_{j,k},    (14)
∇_c H(u, c, l) = ( 2 ∑_j u_k(j) (c_k + l(j) − f(j)) )_k,
∇_l H(u, c, l) = ( 2 ∑_k u_k(j) (l(j) + c_k − f(j)) )_j + 2γ ∇*∇ l.

Based on these gradients, PALM reads for our setting as follows:
Algorithm 2: PALM for (13)
Input: step-sizes τ_{i,r} ∈ ℝ_{>0}, i = 1, 2, 3, r = 0, 1, . . .
Initialization: u^{(0)} ∈ ℝ^{nK}, c^{(0)} ∈ ℝ^K, l^{(0)} ∈ ℝ^n
For r = 0, 1, . . . do
  u^{(r+1)} = prox_{τ_{1,r} h₁} ( u^{(r)} − (1/τ_{1,r}) ∇_u H(u^{(r)}, c^{(r)}, l^{(r)}) ),
  c^{(r+1)} = c^{(r)} − (1/τ_{2,r}) ∇_c H(u^{(r+1)}, c^{(r)}, l^{(r)}),
  l^{(r+1)} = arg min_l { ι_{0}(⟨1, l⟩) + (1/2) ‖ l^{(r)} − (1/τ_{3,r}) ∇_l H(u^{(r+1)}, c^{(r+1)}, l^{(r)}) − l ‖² }.

The first proximum prox_{τ_{1,r} h₁} is not given analytically, but can be computed by several methods. Here we use the primal dual algorithm proposed in [14]. The second step is just a gradient descent step. The last proximum is a projection of a := l^{(r)} − (1/τ_{3,r}) ∇_l H(u^{(r+1)}, c^{(r+1)}, l^{(r)}) onto the hyperplane {l : ⟨1, l⟩ = 0} and can be computed by subtracting from a its mean value.

Corollary 4.
For the functions h_i, i = 1, 2, 3, and H defined in (13) and

τ_{1,r} > 0,  τ_{2,r} ≥ 2n,  τ_{3,r} ≥ 2 + 8dγ,  r = 0, 1, . . . ,

Algorithm 2 converges to a critical point of (8).

Proof.
We have to check that the assumptions of Theorem 3 are fulfilled. It is easy to check that H(u, c, l) + h₁(u) + h₂(c) + h₃(l) is a semi-algebraic function and therefore a KL function, see the examples in [10, Sect. 5]. The functions h₁, h₂, h₃ are lower semi-continuous and bounded from below. Since H is twice continuously differentiable, its gradient is Lipschitz on bounded sets, and by u ∈ △_K^n the function is also bounded from below.

Let us consider the partial gradients of H given by (14). Since ∇_u H(u, c, l) does not depend on u, we immediately get L₁(c, l) = 0. For the gradient with respect to c, it follows with u ∈ △_K^n that

‖∇_c H(u, c¹, l) − ∇_c H(u, c², l)‖ = ‖ ( 2 ∑_j u_k(j) (c_k¹ − c_k²) )_k ‖ ≤ 2n ‖c¹ − c²‖.

Finally, ∇_l H(u, c, l) is Lipschitz, since

‖∇_l H(u, c, l¹) − ∇_l H(u, c, l²)‖ = ‖ ( 2 ∑_k u_k(j) (l¹(j) − l²(j)) )_j + 2γ ∇*∇ (l¹ − l²) ‖ ≤ C(u) ‖l¹ − l²‖,

where C(u) := 2 max_{k,j} {|u_k(j)|} + 2γ ‖∇*∇‖. The d-dimensional Laplacian ∇*∇ has a spectral norm smaller than 4d. This can be seen as follows. In 1D one simply has the forward difference matrix with mirror boundary conditions

∇ = D_n ∈ ℝ^{n,n},

where D_n has entries −1 on the diagonal and 1 on the first superdiagonal, except for a zero last row. The eigenvalues of ∇*∇ = D_n^T D_n are 4 sin²(πj/(2n)), j = 0, . . . , n −
1, see, e.g., [41]. Hence, ‖∇*∇‖ <
4. In d dimensions, after reshaping the d-dimensional image into a vector, the gradient becomes

∇ = (D₁; . . . ; D_d),  D_i := I_{b_i} ⊗ D_{n_i} ⊗ I_{a_i},

where a_i := ∏_{j=1}^{i−1} n_j, b_i := ∏_{j=i+1}^d n_j and ⊗ denotes the tensor (Kronecker) product of matrices. Then

∇*∇ = ∑_{i=1}^d I_{b_i} ⊗ D_{n_i}^T D_{n_i} ⊗ I_{a_i}.

Since the eigenvalues of A ⊗ B are given by the products of the eigenvalues of A and B, one obtains ‖∇*∇‖ ≤ ∑_{i=1}^d ‖D_{n_i}^T D_{n_i}‖ ≤ 4d. Since u^{(r)} ∈ △_K^n, we see that L₃(u^{(r)}, c^{(r)}) ≤ 2 + 8dγ.

Finally, the sequence of iterates produced by the algorithm is bounded by the following arguments. By adding the constraint to l, i.e., ⟨1, l⟩ = 0, the resulting functional becomes coercive. Together with the decrease property for the PALM iterates [10, Lemma 3.3], one immediately gets the boundedness.

In this section we demonstrate the performance of our algorithm. The algorithm was implemented in MATLAB with the following general parameter setting:

• Lipschitz constants in PALM: According to the Lipschitz constants we can choose any τ_{1,r} >
0. In practice, τ_{1,r} needs to be chosen rather small to achieve an acceptable convergence speed; here we set τ_{1,r} to a small fixed value. Further, we set τ_{2,r} := n. Although this choice does not fulfill the restrictions given by the Lipschitz constants, we observed numerical convergence. In particular, compared to a smaller stepsize that fulfills the Lipschitz condition, we got the same results in a faster way. Finally, we set τ_{3,r} := 2 + 8dγ according to the Lipschitz condition.

• Initialization: The illumination l is initialized as a very smooth version of the input image, obtained by convolving the input image with a Gaussian kernel of large standard deviation σ, displayed in the captions of the images. The codebook c is initialized with a rough initial guess, and u with constant entries 1/K. We observed that the initialization of u and c is not important. For reasonable initial values the results were almost equal.

• Iteration number: In all our experiments we performed 2000 (outer) iterations of PALM. In each outer iteration, we applied 50 (inner) iterations of the primal dual method for computing the proximal operator in the first step of PALM. If we applied the inner PDHG iterations on their own, usually more than 50 iterations would be necessary. Here, we observed that 50 inner iterations are sufficient, since the initialization is already close to the solution after some outer iterations.

• Regularization parameters: If not stated otherwise, the other parameters, i.e., λ_k, k = 1, . . .
, K, and γ were chosen according to the best visual impression and are stated in the captions of the corresponding figures.

We compare our approach with the following three other methods:

M1) Model (6) proposed in [26] for 2D images, where we used the program package of the authors available at .

M2) The pure segmentation model

arg min_u { ∑_{k=1}^K ∑_{j∈G} u_k(j) |f(j) − c_k|² + λ ∑_{k=1}^K ‖ |∇u_k| ‖₁ }

subject to u(j) ∈ △_K, j ∈ G, proposed, e.g., in [40]. This model does not take care of illumination changes. We use this method for comparison to show that it is necessary to include the illumination in the model.

M3) A two-step procedure. In the first step we use the initialization described above to estimate the illumination and correct the image by dividing it by the illumination. In the second step we apply model M2) to the corrected image.

We start with an artificial 2D image as a ground truth experiment. Next we apply our algorithm to 2D medical images, in particular CT images, since this kind of images was used to test the algorithm M1) in [26]. Finally, we consider 3D FIB tomography data, which was the motivation for this work. Since some of the images have pixels with value zero, we add a small constant to avoid problems when taking the logarithm of such images.

Artificial 2D Images
We start with the artificial 2D image with varying illumination in Fig. 3. The figure shows the noise-free case. A three class segmentation by M2) leads to completely wrong results. Our model is able to segment this image exactly and extract the proper illumination l. Similar results can be obtained by the method M1) [26].

Figure 3: From left to right: original image, segmented image by M2) (without illumination correction), segmented image with our method (λ₁ = λ₂ = 0.…, γ = 100, σ = 30) and the computed illumination by our model.

Figure 4: From left to right: noisy image (s = 0.…), result of the proposed method (λ = 0.…), of the proposed method with fixed c₃ (λ = 0.
5) and M1).Next, we add Gaussian noise of standard deviation s and compare the seg-mentation of our model and M1). For our model we optimized the TVparameter λ = λ = λ for each noise-level by a grid search according tothe smallest number of wrongly assigned pixels, and took γ = 100 as in thenoise-free case. In the model (6) from [26] we set σ = 7 which also providedthe best result in the noise-free case. As in our model, the parameter λ was optimized by a grid search for each noise-level. Additionally, we op-timized the parameter γ by a grid search. Fig. 5 shows the percentage ofwrongly assigned pixels depending on s for our model and M1). Up to thenoise level s = 0 .
004 the proposed method clearly outperforms M1). For s = 0 .
005 however, one class vanishes due to the noise so that the number ofwrongly assigned pixels increases significantly. To avoid this the third center c , which belongs to the vanishing class, can be fixed. Then the proposedmethod outperforms M1) also for the higher noise levels. Fig. 4 depicts theresults for the noise level s = 0 . Medical Images
Next we present results for different medical images; the images shown in Fig. 6 were taken from [25, 27].

The first row shows results for an X-ray image of bones. The result of our model is compared to the result of M1) with the parameters proposed in the paper [26], i.e., σ = 4, γ = 1 and the value of λ suggested there. The white part corresponds to positive values of the resulting level set function of method M1).

Figure 5: Percentage of wrongly assigned pixels depending on the noise level using the model M1), the proposed method, and the proposed method with fixed third entry of the codebook.

Furthermore, we depict the result of the two-step method M3) to show that correcting the illumination only in a preprocessing step before the segmentation is not sufficient. The methods M1) and M3) perform equally well, where each one shows different slight artifacts. The proposed model gives the best segmentation result; in particular, the left bone and the upper part of the right bone are segmented correctly.

The second and third rows show results for two CT angiography images of vessels. It is clearly necessary to incorporate the illumination, since parts of the images differ widely in their brightness. For the first vessel image the parameters proposed in [26] were used; for the second vessel we set σ = 10, γ = 10. The first vessel segmented by our method shows slightly thicker structures, so in the image center the segmentation method M1) gives better results. In the lower part, M1) leads to too thin structures, and here our method performs better. M3) leads to the worst results in this example: many thin structures are not corrected and some additional artifacts are visible. For the second vessel all three methods give good results; however, M1) and M3) show some small artifacts that are not visible with the proposed method.
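The error measure used for the comparisons in Fig. 5, the percentage of wrongly assigned pixels, and the grid search over the TV weight can be sketched as follows. This is an illustrative sketch only: the function names and the toy nearest-codebook labeling are stand-ins, not the implementation used in the paper.

```python
import numpy as np

def wrong_pixel_percentage(labels, ground_truth):
    """Percentage of pixels whose label differs from the ground truth."""
    labels = np.asarray(labels)
    ground_truth = np.asarray(ground_truth)
    return 100.0 * np.mean(labels != ground_truth)

def grid_search_tv_weight(segment, image, ground_truth, lambdas):
    """Pick the TV weight with the smallest wrong-pixel percentage.

    `segment(image, lam)` stands in for one run of a segmentation
    method with TV weight `lam`; it returns a label image.
    """
    errors = {lam: wrong_pixel_percentage(segment(image, lam), ground_truth)
              for lam in lambdas}
    best = min(errors, key=errors.get)
    return best, errors[best]

if __name__ == "__main__":
    # Toy stand-in for a segmentation method: assign each pixel to the
    # nearest codebook entry (no TV term, `lam` is ignored here).
    codebook = np.array([0.0, 0.5, 1.0])

    def toy_segment(img, lam):
        return np.argmin(np.abs(np.asarray(img)[..., None] - codebook), axis=-1)

    truth = np.array([[0, 0, 2], [1, 1, 2]])
    noisy = np.array([[0.05, 0.10, 0.90], [0.45, 0.55, 1.02]])
    best_lam, err = grid_search_tv_weight(toy_segment, noisy, truth, [0.01, 0.1])
    print(best_lam, err)
```

Since `np.mean` over a boolean array gives the fraction of mismatches, the error is returned in the range [0, 100], matching the axis of Fig. 5.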
3D FIB Tomography Images
Next we are interested in the segmentation of 3D images stemming from FIB tomography. We consider 3D data of an aluminum matrix composite. More precisely, the material consists of three phases which we want to segment: the first phase is aluminum, the second consists of silicon carbide (SiC) particles, and the third consists of copper aggregations. The SiC particles are darker than the surrounding aluminum matrix, while the copper aggregations are visible as small bright spots.

In Figs. 7 and 8, two 3D data sets of this material with different particle sizes are examined. The necessity of taking the illumination into account when segmenting such data was already illustrated in Fig. 1.

Figure 6: From left to right: original image, segmented image by our method, results by M1) and M3). First row: X-ray image of bones (λ = λ = 0., γ = 25, σ = 20); second row: CTA image of a vessel (λ = λ = 0., γ = 25, σ = 20); third row: CTA image of a vessel (λ = λ = 0., γ = 50, σ = 10).

Figure 7: Segmentation of 3D FIB tomography data with three classes by our method (λ = 0., λ = 0., λ = 0.): (a) exemplary slice, (b) segmented slice, (c) 3D visualization of the SiC segment, (d) 3D visualization of the copper segment.

We chose a smaller TV parameter for the copper aggregations (λ) and a larger one for the particles (λ) to segment both correctly. Furthermore, we fixed c, which corresponds to the copper aggregations, to avoid that this small segment vanishes, similarly as in Fig. 4.

We proposed a novel biconvex model for the segmentation of images with intensity inhomogeneities that combines a total variation based approach for segmentation with a multiplicative model for the intensity inhomogeneities. This results in a simultaneous segmentation and intensity correction algorithm without preprocessing. For the minimization of the resulting functional we used the PALM algorithm, for which we could prove convergence.

Figure 8: Segmentation of 3D FIB tomography data with three classes by our method (λ = 0., λ = 0., λ = 0.): (a) exemplary slice, (b) segmented slice, (c) 3D visualization of the SiC segment, (d) 3D visualization of the copper segment.

A question for future work is the relation of the relaxed label assignments to binary ones u_k(j) ∈ {0, 1}, see [35]. The incorporation of the illumination into a segmentation framework with additional linear operators such as blur, see [13], also appears to be useful. Finally, we want to handle other than gray-valued images.

Acknowledgement.
Funding by the German Research Foundation (DFG) within the Research Training Group 1932 "Stochastic Models for Innovations in the Engineering Sciences", project area P3, is gratefully acknowledged. The authors thank K. Schladitz (Fraunhofer ITWM, Kaiserslautern) and F. Balle, T. Beck and S. Schuff (Department of Mechanical and Process Engineering, University of Kaiserslautern) for fruitful discussions. We are thankful to T. Löber (Nano Structuring Center Kaiserslautern) for creating the FIB/SEM data of the aluminum matrix composite used in Figures 1, 7 and 8.
References

[1] E. H. Adelson. Checkershadow illusion. Available at http://persci.mit.edu/gallery/checkershadow, 1995.
[2] M. Ahmed, S. Yamany, N. Mohamed, A. Farag, and T. Moriarty. A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data. IEEE Transactions on Medical Imaging, 21(3):193–199, 2002.
[3] H. Ali, N. Badshah, K. Chen, and G. A. Khan. A variational model with hybrid images data fitting energies for segmentation of images with intensity inhomogeneity. Pattern Recognition, 51:27–42, 2016.
[4] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457, 2010.
[5] E. Bae, J. Yuan, and X.-C. Tai. Global minimization for continuous multiphase partitioning problems using a dual approach. CAM Report 09-75, UCLA, 2009.
[6] J. Bezdek, R. Hathaway, M. Sabin, and W. Tucker. Convergence theory for fuzzy c-means: Counterexamples and repairs. IEEE Transactions on Systems, Man, and Cybernetics, 17(5):873–877, 1987.
[7] J. C. Bezdek. A convergence theorem for the fuzzy isodata clustering algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(1):1–8, 1980.
[8] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
[9] J. Bolte, A. Daniilidis, O. Ley, and L. Mazet. Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Transactions of the American Mathematical Society, 362(6):3319–3363, 2010.
[10] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1-2):459–494, 2014.
[11] X. Bresson and T. F. Chan. Non-local unsupervised variational image segmentation models. Technical report, 2008.
[12] B. H. Brinkmann, A. Manduca, and R. A. Robb. Optimized homomorphic unsharp masking for MR grayscale inhomogeneity correction. IEEE Transactions on Medical Imaging, 17(2):161–171, 1998.
[13] X. Cai, R. Chan, and T. Zeng. A two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding. SIAM Journal on Imaging Sciences, 6(1):368–390, 2013.
[14] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[15] T. Chan and L. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10(2):266–277, 2001.
[16] B. M. Dawant, A. P. Zijdenbos, and R. A. Margolin. Correction of intensity variations in MR images for computer-aided tissue classification. IEEE Transactions on Medical Imaging, 12(4):770–781, 1993.
[17] S. Gilles, M. Brady, J. Declerck, J.-P. Thirion, and N. Ayache. Bias field correction of breast MR images. In Visualization in Biomedical Computing, pages 153–158. Springer, 1996.
[18] C. Gonzalez and R. E. Woods. Digital Image Processing. Pearson, Prentice Hall, 2008.
[19] J. Gorski, F. Pfeuffer, and K. Klamroth. Biconvex sets and optimization with biconvex functions – a survey and extensions. Mathematical Methods of Operations Research, 66(3):373–407, 2007.
[20] Y. He, M. Y. Hussaini, J. Ma, B. Shafei, and G. Steidl. A fuzzy c-means method with total variation regularization for image segmentation. Pattern Recognition, 45(9):3463–3471, 2012.
[21] B. Johnston, M. S. Atkins, B. Mackiewich, and M. Anderson. Segmentation of multiple sclerosis lesions in intensity corrected multispectral MRI. IEEE Transactions on Medical Imaging, 15(2):154–169, 1996.
[22] E. H. Land and J. J. McCann. Lightness and retinex theory. Journal of the Optical Society of America, 61(1):1–11, 1971.
[23] J. Lellmann, F. Becker, and C. Schnörr. Convex optimization for multi-class image labeling with a novel family of total variation based regularizers. In International Conference on Computer Vision, 2009.
[24] J. Lellmann, J. Kappes, F. Becker, and C. Schnörr. Convex multi-class image labeling with simplex-constrained total variation. In X.-C. Tai, K. Mørken, M. Lysaker, and K.-A. Lie, editors, Scale Space and Variational Methods, volume 5567 of Lecture Notes in Computer Science, pages 150–162. Springer, 2009.
[25] C. Li, R. Huang, Z. Ding, C. Gatenby, D. Metaxas, and J. Gore. A variational level set approach to segmentation and bias correction of images with intensity inhomogeneity. In D. Metaxas, L. Axel, G. Fichtinger, and G. Székely, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2008, volume 5242 of Lecture Notes in Computer Science, pages 1083–1091. Springer Berlin Heidelberg, 2008.
[26] C. Li, R. Huang, Z. Ding, J. Gatenby, D. Metaxas, and J. Gore. A level set method for image segmentation in the presence of intensity inhomogeneities with application to MRI. IEEE Transactions on Image Processing, 20(7):2007–2016, 2011.
[27] C. Li, C.-Y. Kao, J. Gore, and Z. Ding. Implicit active contours driven by local binary fitting energy. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–7, 2007.
[28] J. Liang and X. Zhang. Retinex by higher order total variation L¹ decomposition. Journal of Mathematical Imaging and Vision, 52(3):345–355, 2015.
[29] L. Ma, R. C. Staunton, and W. Wang. Modified fuzzy c-means image segmentation algorithm for use with uneven illumination patterns. Pattern Recognition, 40(11):3005–3011, 2007.
[30] W. Ma, J.-M. Morel, S. Osher, and A. Chien. An L1-based variational model for retinex theory and its application to medical images. In IEEE Conference on Computer Vision and Pattern Recognition, pages 153–160, 2011.
[31] C. R. Meyer, P. H. Bland, and J. Pipe. Retrospective correction of intensity inhomogeneities in MRI. IEEE Transactions on Medical Imaging, 14(1):36–41, 1995.
[32] S. Miyamoto and Y. Agusta. Efficient algorithms for Lp fuzzy c-means and their termination properties. In Proc. 5th Conference of the International Federation of Classification Societies, pages 255–258, Kobe, Japan, 1996.
[33] J. Morel, A. Petro, and C. Sbert. A PDE formalization of retinex theory. IEEE Transactions on Image Processing, 19(11):2825–2837, 2010.
[34] M. K. Ng and W. Wang. A total variation model for retinex. SIAM Journal on Imaging Sciences, 4(1):345–365, 2011.
[35] M. Nikolova, S. Esedoglu, and T. F. Chan. Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal on Applied Mathematics, 66(5):1632–1648, 2006.
[36] D. L. Pham and J. L. Prince. An adaptive fuzzy c-means algorithm for image segmentation in the presence of intensity inhomogeneities. Pattern Recognition Letters, 20(1):57–68, 1999.
[37] T. Pock, A. Chambolle, D. Cremers, and H. Bischof. A convex relaxation approach for computing minimal partitions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 810–817, 2009.
[38] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
[39] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60(1):259–268, 1992.
[40] B. Shafei and G. Steidl. Segmentation of images with separating layers by fuzzy c-means and convex optimization. Journal of Visual Communication and Image Representation, 23(4):611–621, 2012.
[41] G. Strang and S. MacNamara. Functions of difference matrices are Toeplitz plus Hankel. SIAM Review, 56:525–546, 2014.
[42] M. Tincher, C. Meyer, R. Gupta, and D. Williams. Polynomial modeling and reduction of RF body coil spatial inhomogeneity in MRI. IEEE Transactions on Medical Imaging, 12(2):361–365, 1993.
[43] W. Wells, W. Grimson, R. Kikinis, and F. Jolesz. Adaptive segmentation of MRI data. IEEE Transactions on Medical Imaging, 15(4):429–442, 1996.
[44] C. Zach, D. Gallup, J.-M. Frahm, and M. Niethammer. Fast global labeling for real-time stereo using multiple plane sweeps. In Vision, Modeling, and Visualization Workshop, 2008.
[45] K. Zhang, L. Zhang, K. M. Lam, and D. Zhang. A level set approach to image segmentation with intensity inhomogeneity.