A Quaternion-Valued Variational Autoencoder
Eleonora Grassucci, Danilo Comminiello, and Aurelio Uncini
Dept. of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy

Corresponding author's email: [email protected]. This work has been supported by "Progetti di Ricerca Grandi" of Sapienza University of Rome under grant number RG11916B88E1942F.
ABSTRACT
Deep probabilistic generative models have achieved incredible success in many fields of application. Among such models, variational autoencoders (VAEs) have proved their ability in modeling a generative process by learning a latent representation of the input. In this paper, we propose a novel VAE defined in the quaternion domain, which exploits the properties of quaternion algebra to improve performance while significantly reducing the number of parameters required by the network. The success of the proposed quaternion VAE with respect to traditional VAEs relies on the ability to leverage the internal relations between quaternion-valued input features and on the properties of second-order statistics, which allow to define the latent variables in the augmented quaternion domain. In order to show the advantages due to such properties, we define a plain convolutional VAE in the quaternion domain and we evaluate it in comparison with its real-valued counterpart on the CelebA face dataset.
Index Terms – Variational Autoencoder, Quaternion Neural Networks, Quaternion Properness, Generative Models, Quaternion Random Vectors
1. INTRODUCTION
Variational autoencoders (VAEs) gained their success due to their ability to model a generative process formalized in the framework of probabilistic graphical models with latent variables [1, 2]. VAEs are characterized by simple and fast sampling and by easily accessible networks, which have proliferated their use in various applications. Recently, advanced VAEs have been developed, relying on statistical challenges [3, 4] or on hierarchical architectures [5, 6].

Performing learning operations in the quaternion domain, rather than in the real-valued domain, has proved to provide higher efficiency when dealing with multidimensional data [7–9]. Quaternions are widely used in neural networks since they allow to process groups of features together, thus coding latent inter-dependencies between features with a lower number of parameters with respect to real-valued neural networks [10–12]. These advantages are due to the properties of quaternion algebra, including the Hamilton product, which is used in quaternion convolutions. This has recently paved the way to the development of novel deep quaternion neural networks [10, 12, 13], often tailored to specific applications, including theme identification in telephone conversations [14], 3D sound event detection and localization [15, 16], heterogeneous image processing [17], and speech recognition [18]. Other properties of quaternion algebra that may be exploited in learning processes are related to second-order statistics. In particular, the concept of properness for quaternion-valued random signals and its applications have been widely investigated in the signal processing literature [19–23].

In this paper, we extend the concept of properness to the latent space in order to derive a quaternion-valued VAE (QVAE). Designing VAEs in the quaternion domain may bring several advantages. Neural networks for VAEs should model long-range correlations in data [4, 24], and quaternion convolutional layers may provide additional information by leveraging internal latent relations between input features. Moreover, reducing the number of parameters may benefit the decoder and the marginal log-likelihood, which only depends on the generative network [4].

The contribution of this paper is twofold: i) we define how to perform variational inference in the quaternion domain and ii) we propose a plain convolutional QVAE. Moreover, to the best of our knowledge, this is the first time that augmented quaternion second-order statistics are exploited for the development of a deep learning model.

The paper is organized as follows. In Section 2, we review the main learning operations in the quaternion domain. The proposed QVAE is derived in Section 3 and then evaluated in Section 4. Finally, conclusions are drawn in Section 5.
2. LEARNING IN THE QUATERNION DOMAIN

2.1. Main Properties of Quaternion Algebra
The quaternion domain $\mathbb{H}$ is a four-dimensional associative normed division algebra over the real numbers, belonging to the class of Clifford algebras. A quaternion is composed of one real scalar component and three imaginary ones:

$$q = q_a + q_b \hat{\imath} + q_c \hat{\jmath} + q_d \hat{\kappa} = q_a + \mathbf{q}, \qquad (1)$$

with $q_a, q_b, q_c, q_d \in \mathbb{R}$. The imaginary units $\hat{\imath} = (1,0,0)$, $\hat{\jmath} = (0,1,0)$, $\hat{\kappa} = (0,0,1)$ are unit axis vectors and represent an orthonormal basis in $\mathbb{R}^3$. The vector $\mathbf{q} = q_b \hat{\imath} + q_c \hat{\jmath} + q_d \hat{\kappa}$ represents the imaginary part of the quaternion and is also known as a pure quaternion. The imaginary units comply with the following fundamental properties:

$$\hat{\imath}^2 = \hat{\jmath}^2 = \hat{\kappa}^2 = -1, \qquad (2)$$

$$\hat{\imath}\hat{\jmath} = \hat{\imath} \times \hat{\jmath} = \hat{\kappa}, \quad \hat{\jmath}\hat{\kappa} = \hat{\jmath} \times \hat{\kappa} = \hat{\imath}, \quad \hat{\kappa}\hat{\imath} = \hat{\kappa} \times \hat{\imath} = \hat{\jmath}, \qquad (3)$$

where "$\times$" denotes the vector product in $\mathbb{R}^3$. The above properties allow the quaternion algebra $\mathbb{H}$ to represent spatial rotations in $\mathbb{R}^3$. The algebra $\mathbb{H}$ is endowed with the operations of associative multiplication of elements and scalar multiplication. The scalar product of two quaternions $q$ and $p$ is defined as $q \cdot p = q_a p_a + q_b p_b + q_c p_c + q_d p_d$. However, quaternions are not commutative under multiplication, thus $\hat{\imath}\hat{\jmath} \neq \hat{\jmath}\hat{\imath}$ and, consequently, $\hat{\imath}\hat{\jmath} = -\hat{\jmath}\hat{\imath}$, $\hat{\jmath}\hat{\kappa} = -\hat{\kappa}\hat{\jmath}$, $\hat{\kappa}\hat{\imath} = -\hat{\imath}\hat{\kappa}$. Due to the noncommutative property, the quaternion product, also known as the Hamilton product, can be expressed as:

$$\begin{aligned} qp &= (q_a + q_b \hat{\imath} + q_c \hat{\jmath} + q_d \hat{\kappa})(p_a + p_b \hat{\imath} + p_c \hat{\jmath} + p_d \hat{\kappa}) \\ &= (q_a p_a - q_b p_b - q_c p_c - q_d p_d) \\ &\quad + (q_a p_b + q_b p_a + q_c p_d - q_d p_c)\, \hat{\imath} \\ &\quad + (q_a p_c - q_b p_d + q_c p_a + q_d p_b)\, \hat{\jmath} \\ &\quad + (q_a p_d + q_b p_c - q_c p_b + q_d p_a)\, \hat{\kappa}. \end{aligned} \qquad (4)$$

The conjugate and the modulus of a quaternion are defined, respectively, as $q^* = q_a - q_b \hat{\imath} - q_c \hat{\jmath} - q_d \hat{\kappa}$ and $|q| = \sqrt{q_a^2 + q_b^2 + q_c^2 + q_d^2} = |q^*|$. A quaternion can also be written in polar form as $q = |q| (\cos(\theta) + \nu \sin(\theta)) = |q| e^{\nu\theta}$, where $\theta \in \mathbb{R}$ is the argument of the quaternion, $\cos(\theta) = q_a / \|q\|$, $\sin(\theta) = \|\mathbf{q}\| / \|q\|$, and $\nu = \mathbf{q} / \|\mathbf{q}\|$ is a pure unit quaternion. The involution of a quaternion $q$ over a pure unit quaternion $\nu$ is $q^{\nu} = -\nu q \nu$, and it represents a rotation of $\pi$ in the imaginary plane orthogonal to $\{1, \nu\}$.

We denote by boldface letters quaternions whose components are vectors (lowercase letters) or matrices (capital letters) of the same dimensions.
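As a concrete illustration of (4), the following is a minimal NumPy sketch of the Hamilton product; the function name and the `[a, b, c, d]` component layout are our own conventions, not notation from the paper.

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product qp of two quaternions stored as arrays [a, b, c, d]."""
    qa, qb, qc, qd = q
    pa, pb, pc, pd = p
    return np.array([
        qa*pa - qb*pb - qc*pc - qd*pd,   # real part
        qa*pb + qb*pa + qc*pd - qd*pc,   # i component
        qa*pc - qb*pd + qc*pa + qd*pb,   # j component
        qa*pd + qb*pc - qc*pb + qd*pa,   # k component
    ])

# Noncommutativity from (3): ij = k, but ji = -k.
i = np.array([0., 1., 0., 0.])
j = np.array([0., 0., 1., 0.])
print(hamilton_product(i, j))  # [0. 0. 0. 1.]  ->  k
print(hamilton_product(j, i))  # [0. 0. 0. -1.] -> -k
```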
2.2. Second-Order Statistics of Quaternion Random Vectors

Very often, it is necessary to study the characteristics of a quaternion signal by analyzing its second-order statistics. However, the second-order information within a quaternion vector $\mathbf{q}$ cannot be estimated from its correlation matrix $\mathbf{C}_{qq} = E\{\mathbf{q}\mathbf{q}^{H}\}$ alone: we also need the complementary covariance matrices that augment the information within the covariance, $\mathbf{C}_{qq^{\hat{\imath}}} = E\{\mathbf{q}\mathbf{q}^{\hat{\imath} H}\}$, $\mathbf{C}_{qq^{\hat{\jmath}}} = E\{\mathbf{q}\mathbf{q}^{\hat{\jmath} H}\}$, and $\mathbf{C}_{qq^{\hat{\kappa}}} = E\{\mathbf{q}\mathbf{q}^{\hat{\kappa} H}\}$ (see also [20], among others, for further details).

Table 1. Properties of Q-proper random variables.
$E\{q_\delta^2\} = E\{q_\epsilon^2\} = \sigma^2$, $\forall\, \delta, \epsilon \in \{a, b, c, d\}$
$E\{q_\delta q_\epsilon\} = 0$, $\forall\, \delta, \epsilon \in \{a, b, c, d\}$, $\delta \neq \epsilon$
$E\{qq\} = -2\sigma^2$
$E\{|q|^2\} = 4\sigma^2$

Thus, we can introduce the augmented covariance matrix of an augmented quaternion vector $\tilde{\mathbf{q}} = [\mathbf{q}^T \; \mathbf{q}^{\hat{\imath} T} \; \mathbf{q}^{\hat{\jmath} T} \; \mathbf{q}^{\hat{\kappa} T}]^T$ as:

$$\tilde{\mathbf{C}}_{qq} = E\{\tilde{\mathbf{q}}\tilde{\mathbf{q}}^{H}\} = \begin{bmatrix} \mathbf{C}_{qq} & \mathbf{C}_{qq^{\hat{\imath}}} & \mathbf{C}_{qq^{\hat{\jmath}}} & \mathbf{C}_{qq^{\hat{\kappa}}} \\ \mathbf{C}_{qq^{\hat{\imath}}}^{H} & \mathbf{C}_{q^{\hat{\imath}}q^{\hat{\imath}}} & \mathbf{C}_{q^{\hat{\imath}}q^{\hat{\jmath}}} & \mathbf{C}_{q^{\hat{\imath}}q^{\hat{\kappa}}} \\ \mathbf{C}_{qq^{\hat{\jmath}}}^{H} & \mathbf{C}_{q^{\hat{\jmath}}q^{\hat{\imath}}} & \mathbf{C}_{q^{\hat{\jmath}}q^{\hat{\jmath}}} & \mathbf{C}_{q^{\hat{\jmath}}q^{\hat{\kappa}}} \\ \mathbf{C}_{qq^{\hat{\kappa}}}^{H} & \mathbf{C}_{q^{\hat{\kappa}}q^{\hat{\imath}}} & \mathbf{C}_{q^{\hat{\kappa}}q^{\hat{\jmath}}} & \mathbf{C}_{q^{\hat{\kappa}}q^{\hat{\kappa}}} \end{bmatrix}. \qquad (5)$$

It is now possible to introduce the quaternion-valued second-order circularity, or Q-properness.

Definition 1 (Q-Properness). A quaternion random vector $\mathbf{q}$ is Q-proper iff the three complementary covariance matrices $\mathbf{C}_{qq^{\hat{\imath}}}$, $\mathbf{C}_{qq^{\hat{\jmath}}}$ and $\mathbf{C}_{qq^{\hat{\kappa}}}$ vanish:

$$E\{\mathbf{q}\mathbf{q}^{\hat{\imath} H}\} = E\{\mathbf{q}\mathbf{q}^{\hat{\jmath} H}\} = E\{\mathbf{q}\mathbf{q}^{\hat{\kappa} H}\} = \mathbf{0}. \qquad (6)$$

Properness implies that $\mathbf{q}$ is not correlated with its vector involutions $\mathbf{q}^{\hat{\imath}}$, $\mathbf{q}^{\hat{\jmath}}$, $\mathbf{q}^{\hat{\kappa}}$. The main properties of a Q-proper random variable are collected in Table 1 [20], where $\sigma^2$ denotes the variance of $q$.

A quaternion-valued random variable is Gaussian if all its components are jointly normal (see [20], among others), i.e., $p(\tilde{q}) = p(q, q^{\hat{\imath}}, q^{\hat{\jmath}}, q^{\hat{\kappa}})$; thus the Gaussian probability density function for an augmented multivariate quaternion-valued random vector $\tilde{\mathbf{q}}$ is expressed as:

$$p(\tilde{\mathbf{q}}) = \frac{\exp\left\{ -(\tilde{\mathbf{q}} - \tilde{\boldsymbol{\mu}})^{H} \tilde{\mathbf{C}}_{qq}^{-1} (\tilde{\mathbf{q}} - \tilde{\boldsymbol{\mu}}) \right\}}{(\pi/2)^{2N} \det(\tilde{\mathbf{C}}_{qq})^{1/2}}, \qquad (7)$$

where $\tilde{\boldsymbol{\mu}}$ is the augmented quaternion-valued mean vector of $\tilde{\mathbf{q}}$. For a Q-proper random vector, using the properties of Table 1 and replacing (6) in (5), the augmented covariance matrix $\tilde{\mathbf{C}}_{qq}$ becomes equal to $4\sigma^2 \mathbf{I}$ (see also [20]), and the mean value refers directly to the quaternion vector, i.e., $\boldsymbol{\mu}$; thus we obtain a simplified expression of the Gaussian distribution:

$$p(\tilde{\mathbf{q}}) = \frac{1}{(2\pi\sigma^2)^{2N}} \exp\left\{ -\frac{1}{\sigma^2} (\mathbf{q} - \boldsymbol{\mu})^{H} (\mathbf{q} - \boldsymbol{\mu}) \right\}, \qquad (8)$$

where the argument of the exponential is a real function of only $|\mathbf{q} - \boldsymbol{\mu}|^2 = (\mathbf{q} - \boldsymbol{\mu})^{H} (\mathbf{q} - \boldsymbol{\mu})$.
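As a quick numerical illustration of the Q-proper moments in Table 1, the sketch below draws a Q-proper quaternion random variable by sampling four i.i.d. real Gaussian components and checks its second-order moments empirically; the sample size, seed, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 0.5, 200_000
# Rows are samples [q_a, q_b, q_c, q_d]; i.i.d. components yield a Q-proper variable.
q = rng.normal(0.0, sigma, size=(n, 4))
a, b, c, d = q.T

# The real part of qq is a^2 - b^2 - c^2 - d^2 (its imaginary parts average to zero):
print(np.mean(a**2 - b**2 - c**2 - d**2))  # ~ -2 sigma^2 = -0.5
print(np.mean((q**2).sum(axis=1)))         # E{|q|^2} ~ 4 sigma^2 = 1.0
print(np.mean(a * b), np.mean(c * d))      # cross moments ~ 0
```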
3. VARIATIONAL AUTOENCODER IN THE QUATERNION DOMAIN
Here, we introduce the novel quaternion variational autoencoder (QVAE) as a generative method based on the probabilistic relation between the quaternion-valued input space and the quaternion-valued latent space. The QVAE aims at controlling the distribution of the quaternion-valued latent vector, which has the characteristic of being a Q-proper quaternion-valued random vector.

3.1. Quaternion-Valued Variational Inference

Let us consider a quaternion-valued latent vector $\mathbf{z} \in \mathbb{H}^{N}$, characterized by a prior probability distribution $p_\theta(\mathbf{z})$, and a quaternion input $\mathbf{x}$, whose conditional probability density function with respect to $\mathbf{z}$ is expressed as $p_\theta(\mathbf{x}|\mathbf{z})$. Similarly to the real-valued VAE [1], the QVAE introduces an approximation $\hat{p}_\phi(\mathbf{z}|\mathbf{x})$ of the true posterior distribution, so the marginal likelihood can be expressed as:

$$\log(p_\theta(\mathbf{x})) = \lambda D_{\mathrm{KL}}(\hat{p}_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x})) + \mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\phi}; \mathbf{x}) = -\lambda D_{\mathrm{KL}}(\hat{p}_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z})) + E_{\hat{p}}\{\log(p_\theta(\mathbf{x}|\mathbf{z}))\}, \qquad (9)$$

where $D_{\mathrm{KL}}(\cdot)$ denotes the Kullback-Leibler (KL) divergence and $\mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\phi}; \mathbf{x})$ is the variational lower bound with respect to $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$ [1, 25]. The parameter $\lambda$ is used to better scale the KL values with respect to $\mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\phi}; \mathbf{x})$.

We assume that both the recognition model $\hat{p}_\phi(\mathbf{z}|\mathbf{x})$ and the generative model $p_\theta(\mathbf{z})$ are based on Q-proper Gaussian distributions, i.e., they follow (8). In particular, the distribution $\hat{p}_\phi(\mathbf{z}|\mathbf{x})$ can be characterized by a quaternion-valued mean vector $\boldsymbol{\mu}_z$ and by an augmented covariance matrix $\tilde{\mathbf{C}}_{zz}$. This matrix is block-diagonal and, extending the properties of the Q-proper random variables in Table 1, its submatrices on the main diagonal are equal and show the same variance; thus $\tilde{\mathbf{C}}_{zz} = \mathrm{diag}\{\boldsymbol{\Sigma}, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}\}$, where $\boldsymbol{\Sigma} = \mathrm{diag}\{\boldsymbol{\sigma}_z^2\}$ and $\boldsymbol{\sigma}_z^2$ is the quaternion-valued vector containing the variance values for each quaternion component. The quaternion-valued mean and variance vectors, $\boldsymbol{\mu}_z$ and $\boldsymbol{\sigma}_z^2$ respectively, are each computed by a quaternion-valued single-layer neural network.

On the other hand, the prior $p_\theta(\mathbf{z})$ is assumed to be a centered isotropic Q-proper Gaussian distribution, i.e., $p_\theta(\mathbf{z}) \sim \mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I})$. These considerations allow us to approximate the expectation in (9) as $E_{\hat{p}}\{\log(p_\theta(\mathbf{x}|\mathbf{z}))\} \simeq \frac{1}{L} \sum_{l=1}^{L} \log(p_\theta(\mathbf{x}|\mathbf{z}_l))$, where $\mathbf{z}_l$, with $l = 1, \ldots, L$, denotes the $l$-th of the $L$ samples drawn from $\hat{p}_\phi(\mathbf{z}|\mathbf{x})$.

Once the Q-proper Gaussian prior distribution has been defined, we perform a reparametrization trick similarly to the real-valued VAE [1], thus having $\mathbf{z} = \boldsymbol{\mu}_z + \boldsymbol{\sigma}_z \odot \boldsymbol{\epsilon}$, where $\odot$ denotes an element-wise product and $\boldsymbol{\epsilon} \in \mathbb{H}^{N} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Considering that both $\hat{p}_\phi(\mathbf{z}|\mathbf{x})$ and $p_\theta(\mathbf{z})$ are Q-proper distributions and that $\tilde{\mathbf{C}}_{\epsilon\epsilon} = 4\mathbf{I}$ and $\boldsymbol{\mu}_\epsilon = \mathbf{0}$, being $p_\theta(\mathbf{z}) \sim \mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I})$, the quaternion-valued KL loss in (9) can be expressed as [19, 26]:

$$\begin{aligned} D_{\mathrm{KL}}(\hat{p}_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z})) &= \frac{1}{2} \left( \mathrm{Tr}\left\{ \tilde{\mathbf{C}}_{\epsilon\epsilon}^{-\frac{1}{2}} \tilde{\mathbf{C}}_{zz} \tilde{\mathbf{C}}_{\epsilon\epsilon}^{-\frac{1}{2}} \right\} + (\boldsymbol{\mu}_\epsilon - \boldsymbol{\mu}_z)^{H} \tilde{\mathbf{C}}_{\epsilon\epsilon}^{-1} (\boldsymbol{\mu}_\epsilon - \boldsymbol{\mu}_z) - 4N + \log \frac{\det(\tilde{\mathbf{C}}_{\epsilon\epsilon})}{\det(\tilde{\mathbf{C}}_{zz})} \right) \\ &= \frac{1}{2} \left( \mathrm{Tr}\{\boldsymbol{\Sigma}_z\} + \boldsymbol{\mu}_z^{H} \boldsymbol{\mu}_z - N \right) - \sum_{i=1}^{N} \log(\sigma_i^2). \end{aligned} \qquad (10)$$

It is worth noting that the minimization of the KL divergence (10) provides a measure of Q-improperness [19].
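For concreteness, below is a PyTorch-style sketch, under our own assumptions, of the quaternion reparametrization and of the closed-form KL term as reconstructed in (10). The tensor layout (four stacked real slices per quaternion) and the shared per-quaternion variance implied by $\tilde{\mathbf{C}}_{zz} = \mathrm{diag}\{\boldsymbol{\Sigma}, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}\}$ are illustrative choices, not the authors' released code (see the repository referenced in Section 4 for the official implementation).

```python
import torch

def quaternion_reparametrize(mu, log_var):
    """z = mu + sigma ⊙ eps, with eps drawn from a Q-proper standard Gaussian.

    mu: (batch, 4, N) real slices of the quaternion mean mu_z;
    log_var: (batch, N), one variance per latent quaternion, shared by the
    four components as required by the block-diagonal covariance C_zz.
    """
    sigma = torch.exp(0.5 * log_var).unsqueeze(1)  # (batch, 1, N), broadcasts
    eps = torch.randn_like(mu)                     # four i.i.d. slices: Q-proper
    return mu + sigma * eps                        # element-wise product

def quaternion_kl(mu, log_var):
    """Closed form of (10) against the Q-proper prior N(0, I), per sample:
    0.5 * (Tr{Sigma_z} + mu^H mu - N) - sum_i log(sigma_i^2)."""
    n = log_var.shape[-1]
    var = torch.exp(log_var)
    return 0.5 * (var.sum(-1) + (mu ** 2).sum(dim=(1, 2)) - n) - log_var.sum(-1)
```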
3.2. Architecture

For the scope of this paper, we use a rather simple architecture in order to prove the benefits of variational inference in the quaternion domain. Thus, we consider an encoder network composed of quaternion convolutional layers.

The quaternion convolution is one of the main operations of deep neural networks in the quaternion domain [10, 12]. Considering a generic quaternion input vector $\mathbf{q}$, defined similarly to (1), and a generic quaternion filter matrix defined as $\mathbf{W} = \mathbf{W}_a + \mathbf{W}_b \hat{\imath} + \mathbf{W}_c \hat{\jmath} + \mathbf{W}_d \hat{\kappa}$, the quaternion convolution can be expressed as the following Hamilton product:

$$\begin{aligned} \mathbf{W} \otimes \mathbf{q} &= (\mathbf{W}_a q_a - \mathbf{W}_b q_b - \mathbf{W}_c q_c - \mathbf{W}_d q_d) \\ &\quad + (\mathbf{W}_a q_b + \mathbf{W}_b q_a + \mathbf{W}_c q_d - \mathbf{W}_d q_c)\, \hat{\imath} \\ &\quad + (\mathbf{W}_a q_c - \mathbf{W}_b q_d + \mathbf{W}_c q_a + \mathbf{W}_d q_b)\, \hat{\jmath} \\ &\quad + (\mathbf{W}_a q_d + \mathbf{W}_b q_c - \mathbf{W}_c q_b + \mathbf{W}_d q_a)\, \hat{\kappa}. \end{aligned} \qquad (11)$$

The Hamilton product in (11) allows quaternion neural networks to capture the internal latent relations among the features of a quaternion (a minimal implementation sketch is given at the end of this section). Each quaternion convolutional layer is followed by a split quaternion Leaky-ReLU activation function [12]. Quaternion batch normalization is not taken into account, as it may be considered a source of randomness that could cause instability [5, 6]. Then, as said in the previous subsection, a quaternion-valued fully-connected output layer is added to the encoder to obtain the quaternion-valued augmented covariance matrix, which is used to sample the quaternion latent variable $\mathbf{z}$ and to compute the KL divergence loss.

Similarly to the encoder, for the decoder network we use quaternion transposed convolutional layers, each followed by a quaternion Leaky-ReLU activation function.
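As announced above, here is a hedged PyTorch sketch of a quaternion convolutional layer implementing (11) through four real `nn.Conv2d` operators arranged in the Hamilton pattern. The class name and the channel layout (the four quaternion components stacked in blocks along the channel axis) are our own conventions, not the paper's released code.

```python
import torch
import torch.nn as nn

class QuaternionConv2d(nn.Module):
    """Quaternion convolution as in (11); channel counts are multiples of 4,
    stored as four stacked blocks (a, b, c, d)."""
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        conv = lambda: nn.Conv2d(in_channels // 4, out_channels // 4,
                                 kernel_size, bias=False, **kwargs)
        self.wa, self.wb, self.wc, self.wd = conv(), conv(), conv(), conv()

    def forward(self, x):
        xa, xb, xc, xd = torch.chunk(x, 4, dim=1)
        # Hamilton product W ⊗ q, component by component as in (11):
        ya = self.wa(xa) - self.wb(xb) - self.wc(xc) - self.wd(xd)
        yb = self.wa(xb) + self.wb(xa) + self.wc(xd) - self.wd(xc)
        yc = self.wa(xc) - self.wb(xd) + self.wc(xa) + self.wd(xb)
        yd = self.wa(xd) + self.wb(xc) - self.wc(xb) + self.wd(xa)
        return torch.cat([ya, yb, yc, yd], dim=1)

# Example: map a 4-channel (one-quaternion) image to 32 channels (8 quaternions).
# layer = QuaternionConv2d(4, 32, kernel_size=3, stride=2, padding=1)
```

Because the four sub-filters are reused across all four output components, such a layer needs roughly a quarter of the weights of a real convolution with the same channel counts, which is the source of the parameter savings discussed in Section 4.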
4. EXPERIMENTAL RESULTS
In this section, we evaluate the proposed method on the CelebFaces Attributes Dataset (CelebA) [27], a large-scale face attribute dataset with 202,599 images that we crop and scale to 64 × 64 pixels, as in [28]. We investigate the behavior of the proposed QVAE in comparison with a plain VAE on both reconstruction and generation tasks. We report generated samples from both models for a visual evaluation, as well as results in terms of the structural similarity index measure (SSIM), the mean-square error (MSE), and the Fréchet Inception Distance (FID), for a more objective appraisal.

Table 2. Averaged results from objective metrics on reconstruction (SSIM, MSE) and generation (FID) tasks. Scores can be read as follows: the higher the better for SSIM, the lower the better for MSE and FID.

We consider an encoder with convolutional blocks for the VAE and quaternion convolutional layers for the proposed QVAE, both with channel dimensions increasing from 32, and a transposed convolutional decoder with channel dimensions decreasing from 512 (equivalently, quaternion transposed convolutions in the proposed method). Both networks use the Adam optimizer, with an initial learning rate that is decreased by a fixed factor during training, similarly to [28]. The latent space is defined in the quaternion domain. We consider a loss composed of a binary cross-entropy (BCE) for reconstruction and a KL divergence weighted by a factor $\lambda < 1$. The implementation of the QVAE is available online at https://github.com/eleGAN23/QVAE.

Due to the quaternion operations, the QVAE network has a significantly lower number of parameters with respect to the real-valued VAE (less than half of the parameters, as shown in Table 2), thus gaining considerable memory advantages. However, despite the lighter architecture in terms of parameters, the proposed QVAE clearly outperforms the real-valued method in terms of objective metrics on the reconstruction task. As shown in Table 2, the QVAE scores significantly better values both for SSIM and for MSE.

Fig. 1 reports the reconstructed images compared with the original ones. On one hand, the VAE generates samples focusing just on the face and leaving hair, neck, and background very blurred, almost indistinguishable. On the other hand, the proposed QVAE is able to generate samples considering both face and background. Indeed, even hair and background are reconstructed more similarly to the ground-truth samples, thus producing more realistic images overall.

Fig. 1. Original test set and reconstructed sample sets from the plain VAE and the proposed QVAE.

Fig. 2. Generated fake image samples from the plain VAE and the proposed QVAE.

Concerning generation, which is usually the most challenging task, we report both the generated sample sets and the results in terms of the objective metric. Fig. 2 shows the images generated by the plain VAE and the ones sampled from the proposed QVAE network. While the first set seems to have a more varied background, the generated faces are less detailed and sometimes confused with the environment. On the contrary, the set sampled from the QVAE shows a less heterogeneous background, but significantly more accurate face contours and, overall, a more stable generation.
The superior generation ability of the proposed QVAE is underlined also by the results in Table 2 in terms of the FID score. The FID computes the distance between the statistics of generated samples and those of real ones, thus lower FID values correspond to more realistic generated samples.

Both reconstruction and generation results prove the effectiveness of defining a latent space and operating in the quaternion domain, while achieving a significant reduction of the overall number of network parameters.
5. CONCLUSION
In this paper, we proposed a novel approach for variational autoencoders, characterized by the definition of the generative model in the quaternion domain. In particular, the proposed QVAE is able to learn latent representations in the quaternion domain by leveraging the augmented second-order statistics of the quaternion-valued input. Moreover, the QVAE involves quaternion convolutional layers in both the encoder and the decoder networks, which lead to a significant reduction of the overall number of network parameters. We considered a plain QVAE to clearly show the benefits of the new quaternion-based approach. Results have shown the effectiveness of the proposed approach in both reconstruction and generation tasks with respect to the real-valued counterpart. Future works will extend the proposed QVAE approach to more complex and advanced deep generative models.

6. REFERENCES

[1] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv Preprint: arXiv:1312.6114v10, pp. 1–14, May 2014.

[2] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," in Int. Conf. on Machine Learning (ICML), Beijing, China, June 2014, pp. 1278–1286.

[3] A. Razavi, A. van den Oord, B. Poole, and O. Vinyals, "Preventing posterior collapse with δ-VAEs," in Int. Conf. on Learning Representations (ICLR), New Orleans, LA, May 2019, pp. 1–24.

[4] A. Vahdat, E. Andriyash, and W. G. Macready, "Undirected graphical models as approximate posteriors," in Int. Conf. on Machine Learning (ICML), Vienna, Austria, July 2020, pp. 2266–2275.

[5] L. Maaløe, M. Fraccaro, V. Liévin, and O. Winther, "BIVA: A very deep hierarchy of latent variables for generative modeling," in Advances in Neural Information Process. Systems (NIPS), Vancouver, Canada, Dec. 2019, pp. 6548–6558.

[6] A. Vahdat and J. Kautz, "NVAE: A deep hierarchical variational autoencoder," arXiv Preprint: arXiv:2007.03898v1, pp. 1–20, July 2020.

[7] T. Bülow and G. Sommer, "Hypercomplex signals – A novel extension of the analytic signal to the multidimensional case," IEEE Trans. Signal Process., vol. 49, no. 11, pp. 2844–2852, Nov. 2001.

[8] D. P. Mandic, C. Jahanchahi, and C. Cheong Took, "A quaternion gradient operator and its applications," IEEE Signal Process. Lett., vol. 18, no. 1, pp. 47–50, Jan. 2011.

[9] D. Comminiello, M. Scarpiniti, R. Parisi, and A. Uncini, "Frequency-domain adaptive filtering: From real to hypercomplex signal processing," in IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), Brighton, UK, May 2019, pp. 7745–7749.

[10] C. Gaudet and A. Maida, "Deep quaternion networks," in IEEE Int. Joint Conf. on Neural Netw. (IJCNN), Rio de Janeiro, Brazil, July 2018.

[11] T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, and Y. Bengio, "Quaternion recurrent neural networks," in Int. Conf. on Learning Representations (ICLR), New Orleans, LA, May 2019, pp. 1–19.

[12] T. Parcollet, M. Morchid, and G. Linarès, "A survey of quaternion neural networks," Artif. Intell. Rev., Aug. 2019.

[13] R. Vecchi, S. Scardapane, D. Comminiello, and A. Uncini, "Compressing deep-quaternion neural networks with targeted regularisation," CAAI Trans. Intell. Technol., vol. 5, no. 3, pp. 172–176, Sept. 2020.

[14] T. Parcollet, M. Morchid, X. Bost, G. Linarès, and R. De Mori, "Real to H-space autoencoders for theme identification in telephone conversations," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 28, pp. 198–210, 2020.

[15] D. Comminiello, M. Lella, S. Scardapane, and A. Uncini, "Quaternion convolutional neural networks for detection and localization of 3D sound events," in IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), Brighton, UK, May 2019, pp. 8533–8537.

[16] M. Ricciardi Celsi, S. Scardapane, and D. Comminiello, "Quaternion neural networks for 3D sound source localization in reverberant environments," in IEEE Int. Workshop on Machine Learning for Signal Process. (MLSP), Espoo, Finland, Sept. 2020, pp. 1–6.

[17] T. Parcollet, M. Morchid, and G. Linarès, "Quaternion convolutional neural networks for heterogeneous image processing," in IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), Brighton, UK, May 2019, pp. 8514–8518.

[18] T. Parcollet, M. Morchid, G. Linarès, and R. De Mori, "Bidirectional quaternion long short-term memory recurrent neural networks for speech recognition," in IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), Brighton, UK, May 2019, pp. 8519–8523.

[19] J. Vía, D. Ramírez, and I. Santamaría, "Properness and widely linear processing of quaternion random vectors," IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3502–3515, July 2010.

[20] C. Cheong Took and D. P. Mandic, "Augmented second-order statistics of quaternion random signals," Signal Process., vol. 91, no. 2, pp. 214–224, Feb. 2011.

[21] J. Vía, D. P. Palomar, L. Vielva, and I. Santamaría, "Quaternion ICA from second-order statistics," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1586–1600, Apr. 2011.

[22] F. Ortolani, D. Comminiello, and A. Uncini, "The widely linear block quaternion least mean square algorithm for fast computation in 3D audio systems," in IEEE Int. Workshop on Machine Learning for Signal Process. (MLSP), Vietri sul Mare, Italy, Sept. 2016.

[23] N. Le Bihan, "The geometry of proper quaternion random variables," Signal Process., vol. 138, pp. 106–116, Sept. 2017.

[24] H. Sadeghi, E. Andriyash, W. Vinci, L. Buffoni, and M. H. Amin, "PixelVAE++: Improved PixelVAE with discrete prior," arXiv Preprint: arXiv:1908.09948v1, pp. 1–12, Aug. 2019.

[25] D. M. Blei, M. I. Jordan, and J. W. Paisley, "Variational Bayesian inference with stochastic search," in Int. Conf. on Machine Learning (ICML), Edinburgh, UK, June 2012, pp. 1367–1374.

[26] C. Doersch, "Tutorial on variational autoencoders," arXiv Preprint: arXiv:1606.05908v2, pp. 1–23, Aug. 2016.

[27] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in IEEE Int. Conf. on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 3730–3738.

[28] X. Hou, L. Shen, K. Sun, and G. Qiu, "Deep feature consistent variational autoencoder," in IEEE Winter Conf. on Applications of Computer Vision (WACV), Santa Rosa, CA, Mar. 2017.