Going beyond p-convolutions to learn grayscale morphological operators
Alexandre Kirszenberg, Guillaume Tochon, Élodie Puybareau, and Jesús Angulo

EPITA Research and Development Laboratory (LRDE), Le Kremlin-Bicêtre, France
[email protected]
Centre for Mathematical Morphology, Mines ParisTech, PSL Research University, France
[email protected]
Abstract. Integrating mathematical morphology operations within deep neural networks has been subject to increasing attention lately. However, replacing standard convolution layers with erosions or dilations is particularly challenging because the min and max operations are not differentiable. Relying on the asymptotic behavior of the counter-harmonic mean, p-convolutional layers were proposed as a possible workaround to this issue, since they can perform pseudo-dilation or pseudo-erosion operations (depending on the value of their inner parameter p), and very promising results were reported. In this work, we present two new morphological layers based on the same principle as the p-convolutional layer while circumventing its principal drawbacks, and demonstrate their potential interest in further implementations within deep convolutional neural network architectures.

Keywords: morphological layer, p-convolution, counter-harmonic mean, grayscale mathematical morphology.
1 Introduction

Mathematical morphology deals with the non-linear filtering of images [15]. The elementary operations of mathematical morphology amount to computing the minimum (for the erosion) or maximum (for the dilation) of all pixel values within a neighborhood of some given shape and size (the structuring element) of the pixel under study. By combining those elementary operations, one can define more advanced (but still non-linear) filters, such as openings and closings, which have many times proven successful at various image processing tasks such as filtering, segmentation or edge detection [18]. However, deriving the optimal combination of operations and the design (shape and size) of their respective structuring elements is generally done in a tedious and time-consuming trial-and-error fashion. Thus, delegating the automatic identification of the right sequence of operations and their structuring elements to some machine learning technique is an appealing strategy.
On the other hand, artificial neural networks are composed of units (or neurons) connected to each other and organized in layers. The output of each neuron is expressed as the linear combination of its inputs weighted by trainable weights, potentially mapped by a non-linear activation function [6]. Convolutional neural networks (CNNs) work in a similar fashion, replacing neurons with convolutional filters [8].

Because of the similarity between their respective operations, there has been increasing interest in past years in integrating morphological operations within the framework of neural networks, and two major lines of research have emerged. The first one, tracing back to the end of the 80s, replaces the multiplication and addition of linear perceptron units with addition and maximum [4, 14, 20], resulting in so-called non-linear morphological perceptrons [19] (see [3, 21] for recent works in this domain). The second line, mainly motivated by the rise of deep CNNs, explores the integration of elementary morphological operations in such networks to automatically learn their optimal shape and weights, the major issue being that the min and max operations are not differentiable. A first workaround is to replace them by smooth differentiable approximations, making them suited to the conventional gradient descent learning approach via back-propagation [8]. In their seminal work, Masci et al. [10] relied on some properties of the counter-harmonic mean [2] (CHM) to provide p-convolutional (PConv) layers, the value of the trainable parameter p dictating which of the elementary morphological operations the layer ultimately mimics. The CHM was also used as an alternative to the standard max-pooling layer in classical CNN architectures [11]. LogSumExp functions (also known as multivariate softplus) were proposed as replacements of the min and max operations to learn binary [17] and grayscale [16] structuring elements. An alternative approach was followed in [5, 12]: the non-linear morphological operations remained unchanged, and the backpropagation step was instead adapted to handle them in the same way the classical max-pooling layer is handled in standard CNNs. Finally, morphological operations were recreated and optimized as combinations of depthwise and pointwise convolutions with depthwise pooling [13].

Looking at all recently proposed approaches (apart from [10], all other aforementioned works date back to no earlier than 2017) and the diversity of their evaluation (image classification on the MNIST database [9], image denoising and restoration, edge detection and so on), it seems to us that the magical formula for integrating morphological operations within CNNs has yet to be derived. For this reason, we would like to draw attention in this work back to the PConv layer proposed in [10]. As a matter of fact, very promising results were reported but never investigated further. Relying on the CHM framework, we propose two possible extensions to the PConv layer, for which we demonstrate potential interest in further implementations within deep neural network architectures.

In Section 2, we review the work on p-convolutions [10], presenting their main advantages, properties and limitations. In Section 3, we propose two new morphological layers, namely the LMorph layer (also based on the CHM) and the SMorph layer (based on the regularized softmax). Both proposed layers are compatible with grayscale mathematical morphology and non-flat structuring elements. In Section 4, we showcase results from our implementations, and compare these results to those of the p-convolution layer. Finally, Section 5 provides a conclusion and some perspectives from our contributions.
2 The p-convolution layer

In this section, we detail the notion of p-convolution as presented in [10].
2.1 Grayscale mathematical morphology

In mathematical morphology, an image is classically represented as a 2D function f : E → R, with x ∈ E being the pixel coordinates in the 2D grid E ⊆ Z² and f(x) ∈ R being the pixel value. In grayscale mathematical morphology, i.e. when both the image f and the structuring element b are real-valued (and not binary), the erosion f ⊖ b and dilation f ⊕ b operations can be written as:

(f ⊖ b)(x) = inf_{y ∈ E} { f(y) − b(x − y) }   (1)
(f ⊕ b)(x) = sup_{y ∈ E} { f(y) + b(x − y) }   (2)

This formalism also encompasses the use of flat (binary) structuring elements, which are then written as

b(x) = { 0 if x ∈ B; −∞ otherwise }   (3)

where B ⊆ E is the support of the structuring function b.

2.2 The p-convolution

Let p ∈ R. The counter-harmonic mean (CHM) of order p of a given non-negative vector x = (x_1, ..., x_n) ∈ (R⁺)ⁿ with non-negative weights w = (w_1, ..., w_n) ∈ (R⁺)ⁿ is defined as

CHM(x, w, p) = (Σ_{i=1}^n w_i x_i^p) / (Σ_{i=1}^n w_i x_i^{p−1})   (4)

The CHM is also known as the Lehmer mean [2]. Asymptotically, one has lim_{p→+∞} CHM(x, w, p) = sup_i x_i and lim_{p→−∞} CHM(x, w, p) = inf_i x_i.

The p-convolution of an image f at pixel x for a given (positive) convolution kernel w : W ⊆ E → R⁺ is defined as:

PConv(f, w, p)(x) = (f ∗_p w)(x) = (f^{p+1} ∗ w)(x) / (f^p ∗ w)(x) = (Σ_{y ∈ W(x)} f^{p+1}(y) w(x − y)) / (Σ_{y ∈ W(x)} f^p(y) w(x − y))   (5)

where f^p(x) denotes the pixel value f(x) raised to the power p, W(x) is the spatial support of the kernel w centered at x, and the scalar p controls the type of operation to perform.
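For concreteness, equations (1) and (2) can be sketched in NumPy as follows; the function names, the loop-based implementation and the edge-padding convention are our own choices, not part of the original formulation:

```python
import numpy as np

def grayscale_dilation(f, b):
    """Non-flat grayscale dilation (eq. 2) of a 2D image f by a
    structuring function b (odd-sized 2D array), with edge padding."""
    kh, kw = b.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), mode='edge')
    bf = b[::-1, ::-1]  # flipped so the index matches b(x - y) in eq. (2)
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.max(fp[i:i + kh, j:j + kw] + bf)
    return out

def grayscale_erosion(f, b):
    """Non-flat grayscale erosion (eq. 1): min of f(y) - b(x - y)."""
    kh, kw = b.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), mode='edge')
    bf = b[::-1, ::-1]
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.min(fp[i:i + kh, j:j + kw] - bf)
    return out
```

With a flat structuring function (b = 0 on its support), these reduce to the plain local maximum and minimum filters.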
Based on the asymptotic properties of the CHM, the morphological behavior of the PConv operation with respect to p has notably been studied in [1]. More precisely, when p > 0 (resp. p < 0), the PConv operation behaves as a pseudo-dilation (resp. pseudo-erosion). When p → +∞ (resp. −∞), the largest (resp. smallest) pixel value in the local neighborhood W(x) of pixel x dominates the weighted sum (5), and PConv(f, w, p)(x) acts as a non-flat grayscale dilation (resp. a non-flat grayscale erosion) with the structuring function b(x) = (1/p) log(w(x)):

lim_{p→+∞} (f ∗_p w)(x) = sup_{y ∈ W(x)} { f(y) + (1/p) log(w(x − y)) }   (6)
lim_{p→−∞} (f ∗_p w)(x) = inf_{y ∈ W(x)} { f(y) − (1/p) log(w(x − y)) }   (7)

In practice, equations (6) and (7) hold true for |p| > 10. The flat structuring function (3) can be recovered by using constant weight kernels, i.e., w(x) = 1 if x ∈ W and w(x) = 0 if x ∉ W, together with |p| ≫ 0. As stated in [10], the PConv operation is differentiable, thus compatible with gradient descent learning approaches via back-propagation.
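A minimal NumPy sketch of the p-convolution (5), together with the input rescaling it requires (see Section 2.3), may look as follows; the helper names and the edge-padding convention are our own assumptions:

```python
import numpy as np

def pconv(f, w, p):
    """p-convolution (eq. 5): CHM of the f-values in each window,
    weighted by the kernel w. f must be strictly positive."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), mode='edge')
    wf = w[::-1, ::-1]  # matches w(x - y) in eq. (5)
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            patch = fp[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch ** (p + 1) * wf) / np.sum(patch ** p * wf)
    return out

def rescale(f):
    """Rescaling of eq. (8): maps f affinely into [1, 2]."""
    return 1.0 + (f - f.min()) / (f.max() - f.min())
```

With a flat kernel (w = 1 on its support) and a large positive p, pconv closely approximates the local maximum, in line with equation (6).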
2.3 Limitations

In order for the PConv layer to be defined for all its possible input parameters, w and f must be strictly positive. Otherwise, the following issues can occur:
– if f(x) contains null values and p is negative, f^p(x) is not defined;
– if f(x) contains negative values and p is a non-null, non-integer real number, f^p(x) can contain complex numbers;
– if w(x) or f^p(x) contain null values, (f^p ∗ w)(x) can be null, making the ratio (5) undefined.

As such, before feeding an image to the p-convolution operation, it first must be rescaled between [1, 2]:

f_r(x) = 1 + (f(x) − min_{x∈E} f(x)) / (max_{x∈E} f(x) − min_{x∈E} f(x))   (8)

Moreover, if several PConv layers are concatenated one behind the other (to achieve (pseudo-)opening and closing operations for instance), a rescaling must be performed before each layer. Particular care must also be taken with the output of the last PConv layer, since it must also be rescaled to ensure that the range of the output matches that of the target. This is done by adding a trainable scale/bias 1×1×1 convolution layer at the end of the network.

Last but not least, a notable drawback of the
PConv layer when it comes to learning a specific (binary or non-flat) structuring element is that the latter tends to be hollow and flattened out in the center (see the results presented in Section 4).

3 The LMorph and SMorph layers
As exposed in Section 2.3, the PConv layer has a few edge cases and drawbacks. For this reason, we now propose two new morphological layers, based upon the same fundamental principle as the PConv operation, with the intent of making them compatible with general grayscale mathematical morphology.

3.1 The LMorph operation
Our main objective is still to circumvent the non-differentiability of the min and max functions by replacing them with smooth and differentiable approximations. In Section 2, we presented the CHM, whose asymptotic behavior is exploited by the PConv layer [10]. Relying on this behavior once more, we now propose the following LMorph (for Lehmer-mean based Morphological) operation:

LMorph(f, w, p)(x) = (Σ_{y ∈ W(x)} (f(y) + w(x − y))^{p+1}) / (Σ_{y ∈ W(x)} (f(y) + w(x − y))^p)   (9)

where w : W → R⁺ is the structuring function and p ∈ R. Defined as such, we can identify LMorph(f, w, p) with the CHM defined by equation (4): all weights w_i (resp. entries x_i) of equation (4) correspond to 1 (resp. f(y) + w(x − y)) in equation (9), from which we can deduce the following asymptotic behavior:

lim_{p→+∞} LMorph(f, w, p)(x) = sup_{y ∈ W(x)} { f(y) + w(x − y) } = (f ⊕ w)(x)   (10)
lim_{p→−∞} LMorph(f, w, p)(x) = inf_{y ∈ W(x)} { f(y) + w(x − y) } = (f ⊖ (−w))(x)   (11)

By changing the sign of p, one can achieve either a pseudo-dilation (if p > 0) or a pseudo-erosion (if p < 0). Figure 1 shows the output of the LMorph function with a given non-flat structuring element for different values of p. In practice, |p| > 20 is sufficient to reproduce a non-flat grayscale dilation or a non-flat grayscale erosion. Note however that the applied structuring function is −w in the case of an erosion.

Relying on the CHM like the PConv layer brings over some shared limitations: the input image f must be positive and rescaled following equation (8), and the structuring function w must be positive or null.
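The LMorph operation (9) translates almost directly into code. Here is a loop-based NumPy sketch; the function name and the edge-padding convention are ours:

```python
import numpy as np

def lmorph(f, w, p):
    """LMorph operation (eq. 9): Lehmer mean of f(y) + w(x - y) over
    the window W(x). f must be rescaled to [1, 2] and w must be >= 0."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), mode='edge')
    wf = w[::-1, ::-1]  # matches w(x - y) in eq. (9)
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            v = fp[i:i + kh, j:j + kw] + wf
            out[i, j] = np.sum(v ** (p + 1)) / np.sum(v ** p)
    return out
```

For large positive p this approximates the dilation of equation (10); for p ≪ 0 the same function approximates the erosion by −w, as in equation (11).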
3.2 The SMorph operation

Deriving a morphological layer based on the asymptotic behavior of the CHM has a major drawback in that the input must be rescaled within the range [1, 2]. To alleviate this constraint, we instead rely on the α-softmax function [7], which is defined as:

S_α(x) = (Σ_{i=1}^n x_i e^{α x_i}) / (Σ_{i=1}^n e^{α x_i})   (12)

for some x = (x_1, ..., x_n) ∈ Rⁿ and α ∈ R.

Fig. 1: Top row: input image, non-flat structuring element, target dilation, target erosion. Middle row: LMorph pseudo-dilation for increasing values of p. Bottom row: LMorph pseudo-erosion for increasing values of |p|.

In fact, S_α has the desired properties that lim_{α→+∞} S_α(x) = max_i x_i and lim_{α→−∞} S_α(x) = min_i x_i. This function is less restrictive than the CHM since it does not require the elements of x to be strictly positive. A major benefit is that it is no longer necessary to rescale its input.

Exploiting this property, we define in the following the SMorph (standing for Smooth Morphological) operation:

SMorph(f, w, α)(x) = (Σ_{y ∈ W(x)} (f(y) + w(x − y)) e^{α(f(y) + w(x − y))}) / (Σ_{y ∈ W(x)} e^{α(f(y) + w(x − y))})   (13)

where w : W → R plays the role of the structuring function. We can see from the properties of S_α that the following holds true:

lim_{α→+∞} SMorph(f, w, α)(x) = (f ⊕ w)(x)   (14)
lim_{α→−∞} SMorph(f, w, α)(x) = (f ⊖ (−w))(x)   (15)

As such, just like the PConv and LMorph layers, the proposed SMorph operation can alternate between a pseudo-dilation (α >
0) and a pseudo-erosion (α < 0), with α ≫ 0 (resp. α ≪ 0) recovering the grayscale dilation (resp. erosion). Figure 2 shows the output of the SMorph function with a given non-flat structuring element for different values of α. We can see that, as |α| increases, the operation better and better approximates the target operation.

Fig. 2: Top row: input image, non-flat structuring element, target dilation, target erosion. Middle row: SMorph pseudo-dilation for increasing values of α. Bottom row: SMorph pseudo-erosion for increasing values of |α|.

Fig. 3: 7 × 7 target structuring elements: cross3, cross7, disk2, disk3, diamond3, complex.

4 Results

In the following, we evaluate the ability of the proposed LMorph and SMorph layers to properly learn a target structuring element, and compare with the results obtained by the
PConv layer. To do so, we apply in turn a dilation ⊕, an erosion ⊖, a closing • and an opening ◦ to all 60000 digits of the MNIST dataset [9], with the target structuring elements displayed in Figure 3. For the dilation and erosion (resp. closing and opening), each network is composed of a single (resp. two) morphological layer(s) followed by a scale/bias 1×1×1 convolution layer. For the PConv and LMorph networks, the image also has to be rescaled in the range [1, 2] before passing through the morphological layer. We train all networks with a batch size of 32, optimizing for the mean squared error (MSE) loss with the Adam optimizer, the starting learning rate η being set to decrease by a factor of 10 when the loss plateaus for 5 consecutive epochs. Convergence is reached when the loss plateaus for 10 consecutive epochs. For the PConv layer, the filter is initialized with 1s and p = 0. For LMorph, the filter is initialized with a folded normal distribution (standard deviation σ) and p = 0. For the SMorph layer, the filter is initialized with a centered normal distribution with standard deviation σ = 0.01 and α = 0. In all instances, the training is done simultaneously on the weights and the parameter p or α.

Fig. 4: Learned structuring elements (with corresponding p or α at convergence) for PConv, LMorph and SMorph layers on dilation ⊕ and erosion ⊖ tasks.

In order to assess the performance of the morphological networks for all scenarios (one scenario being one morphological operation among ⊕, ⊖, • and ◦, and one target structuring element among those presented in Figure 3), we computed the root mean square error (RMSE) between the filter learned at convergence and the target filter. The loss at convergence as well as the value of the parameter p or α also serve as quantitative criteria.

Table 1: MSE loss at convergence and RMSE between the learned structuring element displayed in Figure 4 and the target, for PConv, LMorph and SMorph layers on dilation ⊕ and erosion ⊖ tasks. Best (lowest) results are in bold.
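As an illustration of this training protocol, the following self-contained sketch learns a 1-D SMorph layer by gradient descent on the MSE loss. We substitute numerical central differences for back-propagation and a toy 1-D signal for MNIST, so every name and hyperparameter here is our own simplification:

```python
import numpy as np

def smorph1d(f, w, alpha):
    """1-D SMorph (eq. 13) with edge padding."""
    k = len(w)
    fp = np.pad(f, k // 2, mode='edge')
    wf = w[::-1]  # matches w(x - y)
    out = np.empty_like(f, dtype=float)
    for i in range(len(f)):
        v = fp[i:i + k] + wf
        m = v.max() if alpha >= 0 else v.min()  # softmax stabilization shift
        e = np.exp(alpha * (v - m))
        out[i] = np.sum(v * e) / np.sum(e)
    return out

def mse_loss(params, f, target, k):
    w, alpha = params[:k], params[k]
    return np.mean((smorph1d(f, w, alpha) - target) ** 2)

def numerical_grad(fun, x, eps=1e-5):
    """Central-difference gradient, standing in for back-propagation."""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (fun(xp) - fun(xm)) / (2 * eps)
    return g

# Toy problem: recover a 3-tap structuring function from a target dilation.
rng = np.random.RandomState(0)
f = rng.rand(64)
b_true = np.array([0.0, 0.3, 0.1])
fp = np.pad(f, 1, mode='edge')
target = np.array([(fp[i:i + 3] + b_true[::-1]).max() for i in range(64)])

# Train w and alpha jointly, as in the protocol above.
params = np.concatenate([0.01 * np.abs(rng.randn(3)), [0.0]])
loss_before = mse_loss(params, f, target, 3)
for _ in range(200):
    params = params - 0.1 * numerical_grad(lambda q: mse_loss(q, f, target, 3), params)
loss_after = mse_loss(params, f, target, 3)
```

Since the target is a dilation, α is expected to drift toward positive values during training, mirroring the sign behavior reported for the full networks.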
Figure 4 gathers the structuring elements learned by the PConv, LMorph and SMorph layers for dilation and erosion, along with the value of their respective parameter. Looking at the sign of the parameter, all three networks succeed at finding the correct morphological operation. The magnitude of the parameter at convergence also confirms that the operation applied by all networks can be considered a true dilation or erosion (and not simply a pseudo-dilation or pseudo-erosion). However, looking at the shape of the learned structuring element, it is clear that the PConv layer suffers from the hollow effect mentioned in Section 2.3, while both the LMorph and SMorph layers accurately retrieve the target structuring element. This is confirmed by the RMSE values between those structuring elements and their respective targets, as presented in Table 1. More particularly, LMorph always achieves the lowest RMSE value for dilation tasks, while SMorph succeeds better on erosion. In any case, the loss at convergence of the SMorph network is almost consistently lower than that of the LMorph network by one or two orders of magnitude, and by two to three with respect to the PConv network.

Figure 5 displays the structuring elements learned by the
PConv, LMorph and SMorph layers for all six target structuring elements on the closing and opening operations. This time, since each network comprises two morphological layers (and a scale/bias 1×1×1 convolution layer), we expect the two morphological layers to converge to values of p or α of opposite signs once training converges.

Fig. 5: Learned structuring elements (with corresponding p or α value for each layer) for PConv, LMorph and SMorph layers on closing • and opening ◦ tasks.

As can be seen in Figure 5, the PConv network always succeeds at learning the right morphological operation: the first (resp. second) layer converges to p > 0 (resp. p < 0) for the closing, and the opposite behavior for the opening. However, quite often |p| < 10, indicating that the layer applies a pseudo-dilation or pseudo-erosion only. In addition, the learned structuring element again suffers from the hollow effect for the opening, and does not find the correct shape for the closing. The LMorph network succeeds in learning the correct operation and shape for the closing operation with large target structuring elements (all but cross3 and disk2). For the opening operation however, it consistently fails at retrieving the shape of the target structuring element. This counter-performance is so far unexplained. The SMorph network also struggles with small structuring elements for both the opening and closing, but perfectly recovers large ones. The edge case of small target structuring elements could come from the scale/bias 1×1×1 convolution layer, or from p or α not converging toward the correct sign domain.

Table 2 presents the MSE loss at convergence and the RMSE value between the learned filters and the target structuring elements for the closing and opening scenarios. Except for the aforementioned edge case, the SMorph layer consistently achieves the lowest loss value and RMSE for opening, while the best results for closing are obtained either by the LMorph or the SMorph layer. Overall, apart from small structuring elements for closing or opening operations, the proposed SMorph layer outperforms its
PConv and LMorph counterparts.

Table 2: MSE loss at convergence and RMSE between the learned structuring elements displayed in Figure 5 and the target, for PConv, LMorph and SMorph layers on closing • and opening ◦ tasks. Best (lowest) results are in bold.

Last but not least, it should also be noted that the SMorph layer is numerically more stable: as a matter of fact, raising to the power of p in the PConv and LMorph layers induces floating-point accuracy issues faster.
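To make this comparison concrete, here is a loop-based NumPy sketch of the SMorph operation (13); subtracting the per-window extremum before exponentiating (a standard softmax stabilization, our choice) avoids the overflow that raising to large powers of p can cause in PConv and LMorph:

```python
import numpy as np

def smorph(f, w, alpha):
    """2-D SMorph (eq. 13). Unlike PConv and LMorph, f may contain
    negative values and needs no [1, 2] rescaling."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), mode='edge')
    wf = w[::-1, ::-1]  # matches w(x - y) in eq. (13)
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            v = fp[i:i + kh, j:j + kw] + wf
            # shift by the window extremum so all exponents are <= 0
            m = v.max() if alpha >= 0 else v.min()
            e = np.exp(alpha * (v - m))
            out[i, j] = np.sum(v * e) / np.sum(e)
    return out
```

With a flat (zero) structuring function and |α| large, this recovers the local maximum (α ≫ 0) or minimum (α ≪ 0) of a signed input, which the CHM-based layers cannot process directly.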
5 Conclusion

We present two new morphological layers, namely LMorph and SMorph. Similarly to the PConv layer of Masci et al. [10], the former relies on the asymptotic properties of the CHM to achieve grayscale erosion and dilation. The latter instead relies on the α-softmax function to reach the same goal, thus sidestepping some of the limitations shared by the PConv and LMorph layers (namely, being restricted to strictly positive inputs rescaled in the range [1, 2]). Our experiments show that the SMorph layer overall outperforms both the PConv and LMorph layers. Future work includes investigating the edge cases uncovered for both proposed layers, as well as integrating them into more complex network architectures and evaluating them on concrete image processing applications.
References
1. Angulo, J.: Pseudo-morphological image diffusion using the counter-harmonic paradigm. In: International Conference on Advanced Concepts for Intelligent Vision Systems. pp. 426–437. Springer (2010)
2. Bullen, P.S.: Handbook of Means and Their Inequalities, vol. 560. Springer Science & Business Media (2013)
3. Charisopoulos, V., Maragos, P.: Morphological perceptrons: geometry and training algorithms. In: International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing. pp. 3–15. Springer (2017)
4. Davidson, J.L., Ritter, G.X.: Theory of morphological neural networks. In: Digital Optical Computing II. vol. 1215, pp. 378–388. International Society for Optics and Photonics (1990)
5. Franchi, G., Fehri, A., Yao, A.: Deep morphological networks. Pattern Recognition 102, 107246 (2020)
6. Hassoun, M.H., et al.: Fundamentals of Artificial Neural Networks. MIT Press (1995)
7. Lange, M., Zühlke, D., Holz, O., Villmann, T.: Applications of Lp-norms and their smooth approximations for gradient based learning vector quantization. In: ESANN. pp. 271–276 (2014)
8. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
9. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist (1998)
10. Masci, J., Angulo, J., Schmidhuber, J.: A learning framework for morphological operators using counter-harmonic mean. In: International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing. pp. 329–340. Springer (2013)
11. Mellouli, D., Hamdani, T.M., Ayed, M.B., Alimi, A.M.: Morph-CNN: a morphological convolutional neural network for image classification. In: International Conference on Neural Information Processing. pp. 110–117. Springer (2017)
12. Mondal, R., Dey, M.S., Chanda, B.: Image restoration by learning morphological opening-closing network. Mathematical Morphology - Theory and Applications (1), 87–107 (2020)
13. Nogueira, K., Chanussot, J., Dalla Mura, M., Schwartz, W.R., dos Santos, J.A.: An introduction to deep morphological networks. arXiv preprint arXiv:1906.01751 (2019)
14. Ritter, G.X., Sussner, P.: An introduction to morphological neural networks. In: Proceedings of 13th International Conference on Pattern Recognition. vol. 4, pp. 709–717. IEEE (1996)
15. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, Inc. (1983)
16. Shen, Y., Zhong, X., Shih, F.Y.: Deep morphological neural networks. arXiv preprint arXiv:1909.01532 (2019)
17. Shih, F.Y., Shen, Y., Zhong, X.: Development of deep learning framework for mathematical morphology. International Journal of Pattern Recognition and Artificial Intelligence 33 (2019)