Generalizing Complex/Hyper-complex Convolutions to Vector Map Convolutions

Abstract

We show that the core reasons that complex and hyper-complex valued neural networks offer improvements over their real-valued counterparts are the weight sharing mechanism and the treatment of multidimensional data as a single entity. Their algebra linearly combines the dimensions, making each dimension related to the others. However, both are constrained to a set number of dimensions, two for complex and four for quaternions. Here we introduce novel vector map convolutions which capture both of these properties provided by complex/hyper-complex convolutions, while dropping the unnatural dimensionality constraints they impose. This is achieved by introducing a system that mimics the unique linear combination of input dimensions, such as the Hamilton product for quaternions. We perform three experiments to show that these novel vector map convolutions seem to capture all the benefits of complex and hyper-complex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.


Chase J. Gaudet, Anthony S. Maida

September 10, 2020


While the large majority of work in machine learning (ML) has been done using real-valued models, there has recently been an increase in the use of complex and hyper-complex models [24, 17]. These models have been shown to handle multidimensional data more effectively and to require fewer parameters than their real-valued counterparts.

For tasks with two-dimensional input vectors, complex-valued neural networks (CVNNs) are a natural choice. For example, in audio signal processing the magnitude and phase of the signal can be encoded as a complex number. Since CVNNs treat the magnitude and phase as a single entity, a single activation captures their relationship, unlike in real-valued networks. CVNNs have been shown to outperform or match real-valued networks, sometimes at a lower parameter count [25, 2]. However, most real-world data has more than two dimensions, such as the color channels of images or anything in the realm of 3D space.

The quaternion number system extends the complex numbers. These hyper-complex numbers are composed of one real and three imaginary components, making them ideal for three- or four-dimensional data. Quaternion neural networks (QNNs) have enjoyed a surge in recent research and show promising results [22, 3, 5, 14, 15, 18, 19, 16]. Quaternion networks have been shown to be effective at capturing relations within multidimensional data of four or fewer dimensions. For example, image processing networks need to capture the cross-channel relationships among the red, green, and blue color channels, as these contain important information to support good generalization [12, 9]. Real-valued networks treat the color channels as independent entities, unlike quaternion networks. Parcollet et al. [16] showed that a real-valued encoder-decoder fails to reconstruct unseen color images because it fails to capture local (color) and global (edges and shapes) features independently, while the quaternion encoder-decoder can do so. Their conclusion is that the Hamilton product of the quaternion algebra allows the quaternion network to encode the color relation since it treats the colors as a single entity. Another example is 3D spatial coordinates for robotic and human-pose estimation. Pavllo et al. [20] showed improvement on short-term prediction on the Human3.6M dataset using a network that encoded rotations as quaternions rather than Euler angles.

The prevailing view is that the main reason these complex networks outperform real-valued networks is their underlying algebra, which treats the multidimensional data as a single entity. This allows the complex networks to capture the relationships between the dimensions without the trade-off of learning global features. However, using complex or hyper-complex numbers limits the dimensions to either two for complex or four for quaternions. There are higher-dimensional hyper-complex systems, such as octonions at eight dimensions, but they are a non-associative algebra.

This paper considers a novel hypothesis that may explain the effectiveness of complex/hyper-complex networks. Their convolutional operations use a form of weight sharing not found in real-valued networks. It may be that this weight sharing alone is sufficient to explain the learning advantages described above. If the weight sharing, rather than the algebra, is the most important factor for the enhanced learning abilities, then it may be possible to drop the dimensionality constraints imposed by the complex/hyper-complex algebras.
Therefore, the present paper proposes: 1) to create a system that mimics the concepts of complex and hyper-complex numbers for neural networks, which treats multidimensional input as a single entity and incorporates weight sharing, but is not constrained to certain dimensions; 2) to increase their local learning capacity by introducing a learnable parameter inside the multidimensional dot product. Our experiments herein show that these novel vector map convolutions seem to capture all the benefits of complex and hyper-complex networks, while improving their ability to capture internal latent relations and avoiding the dimensionality restriction. The full code is available at https://github.com/gaudetcj/VectorMapConv.

Nearly all data used in machine learning is multidimensional and, to achieve good performance, models must capture both the local relations within the input features [23, 13] and non-local features, for example edges or shapes composed by a group of pixels. Complex and hyper-complex models have been shown not only to capture these local relations better than real-valued models, but also to do so at a reduced parameter count due to their weight sharing property. However, as stated earlier, these models are constrained to two or four dimensions. Below we detail the work done showing how hyper-complex models capture these local features, as well as the motivation to generalize them to any number of dimensions.

Consider the most common method for representing an image, which is by using three 2D matrices where each matrix corresponds to a color channel. Traditional real-valued networks treat this input as a group of unidimensional elements that may be related to one another, but not only must the network try to learn that relation, it must also try to learn global features such as edges and shapes. By encoding the color channels into a quaternion, each pixel is treated as a whole entity whose color components are strongly related. It has been shown that the quaternion algebra is responsible for allowing QNNs to capture these local relations. For example, Parcollet et al. [16] showed that a real-valued encoder-decoder fails to reconstruct unseen color images because it fails to capture local (color) and global (edges and shapes) features independently, while the quaternion encoder-decoder can do so. Their conclusion is that the Hamilton product of the quaternion algebra allows the quaternion network to encode the color relation since it treats the colors as a single entity. The Hamilton product forces a different linear combination of the internal elements to create each output element. This is seen in Fig. 1 from [18], which shows how a real-valued model looks when converted to a quaternion model. Notice that the real-valued model treats local and global weights at the same level, while the quaternion model learns these local relations during the Hamilton product. The weight sharing property can also be seen: each element of the weight is used four times, reducing the parameter count by a factor of four from the real-valued model.

Figure 1: Illustration of the difference between a real-valued layer (left) and a quaternion-valued layer (right). The quaternion's Hamilton product shows the internal relation learning ability not present in the real-valued layer. Weight sharing occurs because the same set of weights is used in four outputs.

The advantages of hyper-complex networks on multidimensional data seem clear, but what about niche cases where there are more than four dimensions? Examples include applications where one needs to ingest extra channels of information in addition to RGB for image processing, like satellite images, which have several bands.
To overcome this limitation we introduce vector map convolutions, which attempt to generalize the benefits of hyper-complex networks to any number of dimensions. We also add a learnable set of parameters that modify the linear combination of internal elements, allowing the model to decide how important each dimension may be in calculating the others.
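To make the Hamilton product discussed above concrete, the following is a minimal sketch in plain Python. The function name `hamilton_product` and the `(r, x, y, z)` tuple layout are illustrative assumptions, not taken from the authors' code. It shows that each output component is a different signed combination of all four input components, and that the same four weight values are reused in every output.

```python
def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples.

    Illustrative helper (not from the paper's repository).
    """
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real component
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


if __name__ == "__main__":
    weight = (1, 2, 3, 4)  # one quaternion weight: four shared real values
    pixel = (5, 6, 7, 8)   # one quaternion input
    print(hamilton_product(weight, pixel))
```

A real-valued layer mapping four inputs to four outputs would need sixteen independent weights; here four values are reused across all four outputs, which is the factor-of-four reduction described in the text.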

This section covers the work done to obtain a working vector map network: the vector map convolution operation and the weight initialization used.

Vector map convolutions use a mechanism similar to that of complex [25] and quaternion [5] convolutions, but in a more general way that does not bind it to a hyper-complex algebra. We begin by observing the quaternion-valued layer from Fig. 1. Our goal is to capture the properties of weight sharing and of each output axis being composed of a linear combination of all the input axes, but for an arbitrary number of dimensions $D_{vm}$.

For the derivation we choose $D_{vm} = N$. Let $V^n_{in} = [v_1, v_2, \ldots, v_n]$ be an $N$-dimensional input vector and $W^n = [w_1, w_2, \ldots, w_n]$ be an $N$-dimensional weight vector. Note that for the complex and quaternion case the output vector is a set of different linear combinations in which each input vector element is multiplied by each weight vector element exactly once over the set. To achieve a similar result we define a permutation function:

$$\tau(v_i) = \begin{cases} v_n & i = 1 \\ v_{i-1} & i > 1. \end{cases}$$

By applying $\tau$ to each element in $V^n$, a new vector is created that is a circular right-shifted permutation of $V^n$:

$$\tau(V^n) = [v_n, v_1, v_2, \ldots, v_{n-1}].$$

Let the repeated composition of $\tau$ be denoted as $\tau^n$; then we can define the resultant output vector $V_{out}$ as:

$$V^n_{out} = \left[ W^n \cdot V^n_{in},\; \tau(W^n) \cdot V^n_{in},\; \ldots,\; \tau^{n-2}(W^n) \cdot V^n_{in},\; \tau^{n-1}(W^n) \cdot V^n_{in} \right] \tag{1}$$

where $\cdot$ is the dot product of the vectors. The above gives each element of $V^n_{out}$ a unique linear combination of the elements of $V^n_{in}$ and $W^n$, since we never need to compose $\tau$ more than $n - 1$ times ($W^n$ and any permutation of it appear only once).

The previous discussion applies to densely connected layers. The same idea is easily mapped to convolutional layers where the elements of $V^n_{in}$ and $W^n$ are matrices. To develop intuitions, the quaternion convolution operation is depicted in Fig. 2. The top of the figure shows four multichannel inputs and four multichannel kernels to be convolved. The resulting output is shown at the bottom of the figure.
Each row in the middle of the figure shows the calculation of one output feature map, which is a 'convolutional' linear combination of one feature map with the four kernels (and the kernel coefficients are distinct for each row). When looking at the pattern across the rows, the weight sharing can be seen. Across the rows, any given kernel is convolved with four different feature maps. The only thing constraining the dimension to four is the coefficient values at the bottom of the figure imposed by the quaternion algebra (for more detail, see [5]).

We hypothesize that the only thing important about the coefficient values is how they constrain the linear combinations to be independent. We also propose that the circularly shifted permutations just described generate admissible linear combinations. In this case, space permitting, Fig. 2 could be a 5 × 5 image, where five filters are convolved with five feature maps while the weight sharing properties are preserved. That is, there is no longer a dimensional constraint.

We also define a learnable constant as a matrix $L \in \mathbb{R}^{D_{vm} \times D_{vm}}$:

$$l_{i,j} = \begin{cases} 1 & i = 1 \\ 1 & i = j \\ 1 & j = (i + (i - 1)) \bmod D_{vm} \\ -1 & \text{else.} \end{cases}$$

Consider the case $D_{vm} = 4$ so we can then compare to the quaternion convolution. Here we convolve the weight filter matrix $W = [A, B, C, D]$ by an input vector $h = [w, x, y, z]$:

$$\begin{bmatrix} \mathcal{R}(W * h) \\ \mathcal{I}(W * h) \\ \mathcal{J}(W * h) \\ \mathcal{K}(W * h) \end{bmatrix} = L \odot \begin{bmatrix} A & B & C & D \\ D & A & B & C \\ C & D & A & B \\ B & C & D & A \end{bmatrix} * \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \tag{2}$$

where

$$L = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{bmatrix} \tag{3}$$

The operator $\odot$ denotes element-wise multiplication. The sixteen parameters within $L$ are the initial values. They are otherwise unconstrained scalars and intended to be learnable.
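The circular-shift construction can be sketched for the dense (non-convolutional) case in a few lines of plain Python. This is a hedged illustration, not the authors' implementation: the names `tau` and `vector_map_product` are hypothetical, scalars stand in for the kernels and feature maps, and the sign matrix used in the usage example is only one illustrative choice of ±1 entries for the learnable matrix $L$.

```python
def tau(vec):
    """Circular right shift: [v1, ..., vn] -> [vn, v1, ..., v(n-1)]."""
    return vec[-1:] + vec[:-1]


def vector_map_product(weights, inputs, signs=None):
    """Dense vector map product for any dimension D = len(weights).

    Output i is the dot product of the i-times shifted weight vector with
    the inputs, optionally scaled element-wise by row i of `signs`
    (a D x D matrix standing in for the learnable L; defaults to all ones).
    """
    d = len(weights)
    if len(inputs) != d:
        raise ValueError("weights and inputs must have the same length")
    if signs is None:
        signs = [[1] * d for _ in range(d)]
    outputs = []
    shifted = list(weights)
    for i in range(d):
        outputs.append(sum(signs[i][j] * shifted[j] * inputs[j] for j in range(d)))
        shifted = tau(shifted)  # next output uses the next circular shift
    return outputs


if __name__ == "__main__":
    # Illustrative +/-1 sign matrix (an assumption for this example).
    example_signs = [
        [1, 1, 1, 1],
        [-1, 1, 1, -1],
        [1, -1, 1, -1],
        [-1, -1, 1, 1],
    ]
    print(vector_map_product([1, 2, 3, 4], [5, 6, 7, 8]))
    print(vector_map_product([1, 2, 3, 4], [5, 6, 7, 8], example_signs))
    # Nothing restricts D to 2 or 4: a five-dimensional example.
    print(vector_map_product([1, 0, 0, 0, 0], [1, 2, 3, 4, 5]))
```

In this sketch only D weight values (plus the D × D sign matrix) produce D outputs, whereas an unconstrained real-valued map between the same vectors would use D² independent weights.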
Thus, the vector map convolution is a generalization of complex, quaternion, or octonion convolution as the case may be, but it also drops the constraints imposed by the associated hyper-complex algebra.

For comparison, the result of convolving a quaternion filter matrix $W = A + iB + jC + kD$ by a quaternion vector $h = w + ix + jy + kz$ is another quaternion,

$$\begin{bmatrix} \mathcal{R}(W * h) \\ \mathcal{I}(W * h) \\ \mathcal{J}(W * h) \\ \mathcal{K}(W * h) \end{bmatrix} = \begin{bmatrix} A & -B & -C & -D \\ B & A & -D & C \\ C & D & A & -B \\ D & -C & B & A \end{bmatrix} * \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}, \tag{4}$$

where $A$, $B$, $C$, and $D$ are real-valued matrices and $w$, $x$, $y$, and $z$ are real-valued vectors. See Fig. 2 for a visualization of the above operation. More explanation is given in [5].

Figure 2: An illustration of quaternion convolution.

The question arises whether the empirical improvements observed in the use of complex and quaternion deep networks are best explained by the full structure of the hyper/complex algebra, or whether the weight sharing underlying the generalized convolution is responsible for the improvement.

Proper initialization of the weights has been shown to be vital to convergence of deep networks. The weight initialization for vector map networks uses the same procedure seen in both deep complex networks [25] and deep quaternion networks [5]. In both cases, the expected value of $|W|^2$ is needed to calculate the variance:

$$E[|W|^2] = \int_{-\infty}^{\infty} x^2 f(x)\, dx \tag{5}$$

where $f(x)$ is a multidimensional independent normal distribution in which the number of degrees of freedom is two for complex and four for quaternions. Solving Eq. 5 gives $2\sigma^2$ for complex and $4\sigma^2$ for quaternions. Indeed, when solving Eq. 5 for a multidimensional independent normal distribution where the number of degrees of freedom is $D_{vm}$, the solution will equal $D_{vm}\sigma^2$. Therefore, in order to respect the Glorot and Bengio [6] criteria, $\sigma$ would be equal to:

$$\sigma = \sqrt{\frac{2}{D_{vm}(n_{in} + n_{out})}} \tag{6}$$

and in order to respect the He [8] criteria, $\sigma$ would be equal to:

$$\sigma = \sqrt{\frac{2}{D_{vm}\, n_{in}}}. \tag{7}$$

This is used alongside a vector of dimension $D_{vm}$ that is generated following a uniform distribution in the interval $[0, 1]$ and then normalized. The linear combination parameter $L$ in Eq. 2 is simply generated by randomly selecting from the set $\{-1, 1\}$.

We perform three sets of experiments designed to see baseline performance, compare against some known quaternion results, and test extreme cases of dimensionality in the data. The first is simple classification on CIFAR data using different size ResNet models for real, quaternion, and vector map networks. The second experiment replicates the results of colorizing images using a convolutional auto-encoder from [16], but using vector map convolution layers. Lastly, the DSTL Satellite segmentation challenge from Kaggle [1] is used to demonstrate the high parameter count reduction when vector map layers are used for high dimensional data.
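The initialization scale from the weight initialization discussion above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes the numerator 2 implied by the Glorot and He criteria (so that $D_{vm}\sigma^2$ equals $2/(n_{in}+n_{out})$ or $2/n_{in}$), and the helper names `vector_map_sigma` and `mean_squared_norm` are hypothetical. The Monte Carlo helper estimates $E[|W|^2]$ for $D_{vm}$ independent zero-mean normal components, which should be close to $D_{vm}\sigma^2$.

```python
import math
import random


def vector_map_sigma(d_vm, n_in, n_out=None, criterion="glorot"):
    """Scale sigma such that D_vm * sigma^2 matches the chosen criterion."""
    if criterion == "glorot":
        if n_out is None:
            raise ValueError("the Glorot criterion needs n_out")
        return math.sqrt(2.0 / (d_vm * (n_in + n_out)))
    if criterion == "he":
        return math.sqrt(2.0 / (d_vm * n_in))
    raise ValueError("criterion must be 'glorot' or 'he'")


def mean_squared_norm(d_vm, sigma, samples=20000, seed=0):
    """Monte Carlo estimate of E[|W|^2] for d_vm i.i.d. N(0, sigma^2) components."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += sum(rng.gauss(0.0, sigma) ** 2 for _ in range(d_vm))
    return total / samples


if __name__ == "__main__":
    sigma = vector_map_sigma(d_vm=5, n_in=64, criterion="he")
    print(sigma, mean_squared_norm(5, sigma), 5 * sigma ** 2)
```

With `d_vm=4` the Glorot scale reduces to $1/\sqrt{2(n_{in}+n_{out})}$, and with `d_vm=2` to $1/\sqrt{n_{in}+n_{out}}$, consistent with the $4\sigma^2$ and $2\sigma^2$ results quoted for quaternions and complex numbers.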

These experiments cover simple image classification using the CIFAR-10 and CIFAR-100 datasets [11]. The CIFAR datasets are 32 × 32 color images of 10 and 100 classes respectively. Each image contains only one class and labels are provided. Since the CIFAR images are RGB, we use $D_{vm} = 3$ for all the experiments.

For the architecture we use different Residual Networks taken directly from the original paper [7]. We ran direct comparisons between real-valued, quaternion, and vector map networks on three different sizes: ResNet18, ResNet34, and ResNet50. The only change from the real-valued to vector map networks is that the number of filters at each layer is changed such that the parameter count is roughly the same as the real-valued network.

The results are shown in Table 1, and the validation loss and accuracy plots are shown in Fig. 3. Also shown in Fig. 4 is the histogram of $L$ for the ResNet18 vector map convolution network. We note that the values appear to be normally distributed around the initial values of either −1 or 1. We also include, in Fig. 5, some visualizations of feature vector maps from the first convolution layer of the vector map model for a few randomly selected images.

The vector map network's final accuracy outperforms the other models in all cases except one, and its accuracy rises faster than that of both the real and quaternion-valued networks.
This may be due to the ability to control the relationships of each color channel in the convolution operation, while the quaternion is stuck to its set algebra, and the real is not combining the color channels in a similar fashion to either.

Architecture          Params       CIFAR-10   CIFAR-100
ResNet18 Real         11,173,962   5.92       27.81
ResNet18 Quaternion    8,569,242   5.92       28.77
ResNet18 Vector Map    7,376,320   6.05       27.18
ResNet34 Real         21,282,122   5.73       28.18
ResNet34 Quaternion   16,315,610   5.73       27.24
ResNet34 Vector Map   14,044,960   5.55       25.88
ResNet50 Real         23,520,842   6.10       27.40
ResNet50 Quaternion   18,080,282   6.10       27.32
ResNet50 Vector Map   15,559,120   5.72       25.16

Table 1: Percent error for classification on CIFAR-10 and CIFAR-100. Params is the total number of parameters.

Figure 3: Validation loss and accuracy plots for CIFAR-10 corresponding to the experimental runs that produced Table 1.

This experiment was originally designed to explore the power of quaternion networks over real-valued ones by investigating the impact the Hamilton product had on reconstructing color images from gray-scale only training [16]. A convolutional encoder-decoder (CAE) was used to test color image reconstruction. We performed the exact same experiment using quaternions, but also two experiments using vector map layers with $D_{vm} = 3$ and $D_{vm} = 4$. This way we can test whether we mimic the quaternion results with four dimensions, and whether we capture the important components of treating the input dimensions as a single entity with three dimensions. The identical architecture is used: two convolutional encoding layers followed by two transposed convolutional decoding layers.

Figure 4: Histogram of $L$ values after training of the ResNet18 vector map convolution network.

A single image is first chosen, then converted to gray-scale using the function $GS(p_{x,y})$, where $p_{x,y}$ is the color pixel at location $x, y$. The gray value is concatenated three times for each pixel to create the input to the vector map CNN. We used the exact same model architecture, but since the output feature maps are three times larger in the vector map model we reduce their size to 10 and 20. The kernel size and strides are 3 and 2 for all layers. The model is trained for 3000 epochs using the Adam optimizer [10] with a learning rate of 5e−. The weights are initialized following the above scheme and the hardtanh [4] activation function is used in both the convolutional and transposed convolutional layers.

Figure 5: Randomly selected feature vector maps from the first convolution layer after training. Each row is a different image, where the first column is the original input image.

The results can be seen in Fig. 6, where one can see the vector map CAE was able to correctly produce color images like the quaternion CAE.
Similar to the quaternion CAE, the vector map CAE appears to learn to preserve the internal relationship between the pixels, similar to the Hamilton product. The reconstructed images were also evaluated numerically using the peak signal to noise ratio (PSNR) [26] and the structural similarity (SSIM) [27]. These evaluations appear in Table 2.
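For readers unfamiliar with the PSNR metric, a minimal sketch of the standard full-reference definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, is given below. It is an assumption that this matches the exact variant used in the experiments; the function name `psnr`, the flat pixel lists, and the default peak value of 255 are illustrative choices.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 59, 79, 61, 76, 61]
    restored = [54, 55, 60, 58, 80, 63, 75, 60]
    print(f"{psnr(original, restored):.2f} dB")
```

Higher values indicate a reconstruction closer to the reference; SSIM, by contrast, compares local structure rather than per-pixel error, so the two metrics need not rank models identically.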

Image     D_vm = 3 (PSNR, SSIM)   D_vm = 4 (PSNR, SSIM)   Quaternion (PSNR, SSIM)
kodim23   28.94 dB, 0.97          29.14 dB, 0.96          31.68 dB, 0.96
kodim04   26.95 dB, 0.96          26.99 dB, 0.96          28.06 dB, 0.93

Table 2: PSNR and SSIM results for vector map CAE compared to quaternion CAE.

Figure 6: Grey-scale to color results on two KODAK images for a quaternion CAE and a vector map CAE. Panels: Train, Original, Quaternion, Vector Map.

The main goal of this experiment was to test whether there exists a property of the quaternion structure that may not have been captured with the attempted generalization of vector map convolutions. Since both vector map networks perform similarly to the quaternion network, it appears that the way the vector map rules are constructed enables them to capture the essence of the Hamilton product for any dimension size $D_{vm}$, and that the additional aspects of the algebraic structure are not important. Since the $D_{vm} = 3$ model matched the quaternion performance, we have shown that the same performance can be achieved with fewer parameters.

The Dstl Satellite Imagery Feature Detection challenge was run on Kaggle [1], where the goal was to segment 10 classes from 1 km × 1 km satellite images in both 3-band and 16-band formats. Since satellite images have many more bands of information than standard RGB, they make a good use case for vector map convolutions. We run experiments using the full bands on both real-valued and vector map networks.

Both models use a standard U-Net base as described in the original paper [21]. We use the entire 16-band format as input, simply concatenating the bands into one input to the models. For the vector map network we choose $D_{vm} = 16$ to treat the entire 16-band input as a single entity. The real-valued model has a starting filter count of 32, while the vector map model has a starting filter count of 96. The images are very large, so we sample from them in patches of 82 × 82 pixels, but only use the center 64 × 64 pixels for the prediction. For training, the batch size is set to 32, the learning rate is 1e-3, and we decay the learning rate by a factor of 10 every 25 epochs over a total of 75 epochs.

Some of the classes of the data set appear in only a couple of images. For this reason, we train on all available images, but hold out random 400 × 400 chunks of the original images. We use the same seed for both the real-valued and vector map runs.

The results are shown in Table 3, where one can see that for a lower parameter budget, the vector map achieved better segmentation performance. Some of the features, like the vegetation and water, stand out more distinctly in the non-RGB bands of information, and the vector map seems to have captured this more accurately. The main goal was to show that the vector map convolutions could handle a large number of input dimensions and potentially better capture how the channels relate to one another.
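The training schedule and patch handling described for this experiment can be sketched as follows. This is only an illustration of the stated numbers (learning rate 1e-3 divided by 10 every 25 epochs, and an 82 × 82 patch reduced to its central 64 × 64 window). The helper names `step_decay_lr` and `center_crop`, the 0-indexed epoch convention, and the nested-list patch layout are assumptions rather than details from the authors' code.

```python
def step_decay_lr(epoch, base_lr=1e-3, factor=10.0, step=25):
    """Learning rate at a 0-indexed epoch, divided by `factor` every `step` epochs."""
    return base_lr / (factor ** (epoch // step))


def center_crop(patch, size):
    """Return the central size x size window of a square 2D list of pixels."""
    margin = (len(patch) - size) // 2
    return [row[margin:margin + size] for row in patch[margin:margin + size]]


if __name__ == "__main__":
    for epoch in (0, 24, 25, 50, 74):
        print(epoch, step_decay_lr(epoch))
    # Each "pixel" records its own (row, column) so the crop offset is visible.
    patch = [[(r, c) for c in range(82)] for r in range(82)]
    crop = center_crop(patch, 64)
    print(len(crop), len(crop[0]), crop[0][0], crop[-1][-1])
```

Predicting only the central window leaves a 9-pixel border of context on every side of the scored region.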

Architecture      Params      Jaccard Score
UNet Real         7,855,434   0.427
UNet Vector Map   5,910,442   0.436

Table 3: Jaccard score on DSTL Satellite segmentation challenge for real-valued and vector map UNet models. Params is the total number of parameters.
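The Jaccard score reported in Table 3 is the intersection-over-union of predicted and ground-truth masks. The sketch below shows the basic per-class computation on flattened binary masks. How the score is averaged across classes and images in the challenge is not specified here, and the function name `jaccard_score` and the convention of returning 1.0 for two empty masks are assumptions made for the example.

```python
def jaccard_score(pred_mask, true_mask):
    """Intersection over union of two flattened binary masks (0/1 values)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 0, 1, 1]
    print(jaccard_score(predicted, truth))
```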

Figure 7: Validation loss using Jaccard score.

This paper proposes vector map convolutions to generalize the beneficial properties of convolutional complex and hyper-complex networks to any number of dimensions. We also introduce a new learnable parameter to modify the linear combination of internal features.

The first set of experiments compares the performance of vector map convolutions against real-valued networks in three different sized ResNet models on CIFAR datasets. They demonstrate that vector map convolution networks have similar accuracy at a reduced parameter count, effectively mimicking hyper-complex networks while consuming fewer resources. We also investigate the distribution of the final values of $L$, the linear combination terms, and see that they tend to stay around the values they were initialized to.

We further investigated, using image color reconstruction tests, whether vector map convolutions effectively mimic the ability of quaternion convolution to capture color features through the Hamilton product. The vector map convolution model can reconstruct color like the quaternion CAE, reaching equal or higher SSIM, although at lower PSNR (Table 2). This suggests that other aspects of the quaternion algebra are not relevant to this task and that vector map convolutions could effectively capture the internal relation of any dimension input for different data types.

The final experiment tested the ability of vector map convolutions to perform well on very high dimensional input. We compared a real-valued model against a vector map model on the Kaggle DSTL satellite segmentation challenge dataset, which has 16 channels of image information and contains 10 classes. The vector map model was built with $D_{vm} = 16$ and not only had fewer learnable parameters than the real-valued model, it also achieved a higher Jaccard score and learned at a faster rate.
This indicates an advantage of vector map convolutions in higher dimensions.

This set of experiments has shown that vector map convolutions appear not only to capture all the benefits of complex/hyper-complex convolutions, but can also outperform them using a smaller parameter budget while being free from their dimensional constraints.

References

[1] Kaggle Dstl satellite imagery feature detection. Accessed: 2020-05-30.
[2] Igor Aizenberg and Alexander Gonzalez. Image recognition using MLMVN and frequency domain features. Pages 1–8. IEEE, 2018.
[3] Eduardo Bayro-Corrochano, Luis Lechuga-Gutiérrez, and Marcela Garza-Burgos. Geometric techniques for robotics and HMI: Interpolation and haptics in conformal geometric algebra and control using quaternion spike neural networks. Robotics and Autonomous Systems, 104:72–84, 2018.
[4] Ronan Collobert. Large scale machine learning. Technical report, IDIAP, 2004.
[5] Chase J. Gaudet and Anthony S. Maida. Deep quaternion networks. 2018.
[6] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
[9] Teijiro Isokawa, Tomoaki Kusakabe, Nobuyuki Matsui, and Ferdinand Peper. Quaternion neural network and its application. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pages 318–324. Springer, 2003.
[10] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[11] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
[12] Hiromi Kusamichi, Teijiro Isokawa, Nobuyuki Matsui, Yuzo Ogawa, and Kazuaki Maeda. A new scheme for color night vision by quaternion neural network. In Proceedings of the 2nd International Conference on Autonomous Robots and Agents, volume 1315. Citeseer, 2004.
[13] Nobuyuki Matsui, Teijiro Isokawa, Hiromi Kusamichi, Ferdinand Peper, and Haruhiko Nishimura. Quaternion neural network with geometrical operators. Journal of Intelligent & Fuzzy Systems, 15(3,4):149–164, 2004.
[14] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. Deep quaternion neural networks for spoken language understanding. Pages 504–511. IEEE, 2017.
[15] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. Quaternion denoising encoder-decoder for theme identification of telephone conversations. 2017.
[16] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. Quaternion convolutional neural networks for heterogeneous image processing. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8514–8518. IEEE, 2019.
[17] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. A survey of quaternion neural networks. Artificial Intelligence Review, 53(4):2957–2982, 2020.
[18] Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, and Yoshua Bengio. Quaternion recurrent neural networks. arXiv preprint arXiv:1806.04418, 2018.
[19] Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, and Yoshua Bengio. Quaternion convolutional neural networks for end-to-end automatic speech recognition. arXiv preprint arXiv:1806.07789, 2018.
[20] Dario Pavllo, David Grangier, and Michael Auli. QuaterNet: A quaternion-based recurrent model for human motion. arXiv preprint arXiv:1805.06485, 2018.
[21] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[22] Kazuhiko Takahashi, Ayana Isaka, Tomoki Fudaba, and Masafumi Hashimoto. Remarks on quaternion neural network-based controller trained by feedback error learning. Pages 875–880. IEEE, 2017.
[23] Keiichi Tokuda, Heiga Zen, and Tadashi Kitamura. Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features. In Eighth European Conference on Speech Communication and Technology, 2003.
[24] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. Pal. Deep complex networks. arXiv preprint arXiv:1705.09792, 2018.
[25] Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher Pal. Deep complex networks. In International Conference on Learning Representations 2018 (Conference Track), 2018. arXiv:1705.09792.
[26] Deepak S. Turaga, Yingwei Chen, and Jorge Caviedes. No reference PSNR estimation for compressed pictures. Signal Processing: Image Communication, 19(2):173–184, 2004.
[27] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity.