A general framework for constrained convex quaternion optimization
Julien Flamant, Sebastian Miron, and David Brie
Abstract—This paper introduces a general framework for solving constrained convex quaternion optimization problems in the quaternion domain. To soundly derive these new results, the proposed approach leverages the recently developed generalized HR-calculus together with the equivalence between the original quaternion optimization problem and its augmented real-domain counterpart. This new framework simultaneously provides rigorous theoretical foundations as well as elegant, compact quaternion-domain formulations for optimization problems in quaternion variables. Our contributions are threefold: (i) we introduce the general form for convex constrained optimization problems in quaternion variables, (ii) we extend fundamental notions of convex optimization to the quaternion case, namely Lagrangian duality and optimality conditions, (iii) we develop the quaternion alternating direction method of multipliers (Q-ADMM) as a general purpose quaternion optimization algorithm. The relevance of the proposed methodology is demonstrated by solving two typical examples of constrained convex quaternion optimization problems arising in signal processing. Our results open new avenues in the design, analysis and efficient implementation of quaternion-domain optimization procedures.

Index Terms—quaternion convex optimization; widely affine constraints; optimality conditions; alternating direction method of multipliers.
I. INTRODUCTION

THE USE of quaternion representations is becoming prevalent in many fields, including color imaging [1]–[3], robotics [4], attitude control and estimation [5], [6], polarized signal processing [7]–[9], rolling bearing fault diagnosis [10], and computer graphics [11], among others. Compared to conventional real and complex models, quaternion algebra permits unique insights into the physics and the geometry of the problem at hand, while preserving a mathematically sound framework. In most applications, quaternions, besides providing elegant and compact algebraic representations, enable a reduction of the number of parameters. For instance, it has been recently shown [12], [13] that quaternion convolutional neural networks (QCNNs) achieve better performance than conventional real CNNs while using fewer parameters.

Most problems involving quaternion models can be cast as the minimization of a real-valued function of quaternion variables. Unfortunately, quaternion-domain optimization immediately faces a major obstacle: quaternion cost functions being real-valued, they are not differentiable according to quaternion analysis [14], just like real-valued functions of complex variables are not differentiable according to complex analysis [15]. Thus, for a long time, the intrinsic quaternion nature
of quaternion optimization problems has been disregarded by reformulating them as optimization problems over the real field. This procedure, however, has two major drawbacks: (i) it leads to the loss of the quaternion structure that naturally encodes the physics of the problem at hand, and (ii) it generates cumbersome expressions in the real domain.

Recently, a crucial step towards quaternion-domain optimization has been made with the development of the theory of HR-calculus [16]–[19]. This new framework establishes a complete set of differentiation rules, encouraging the systematic development of quaternion-domain algorithms. The HR-calculus can be seen as the generalization to quaternions of the CR-calculus [20], which has enabled the formulation of several important complex-domain algorithms [21], [22]. The theory of HR-calculus has led to the development of multiple quaternion-domain algorithms, notably in adaptive filtering [23]–[25], low-rank quaternion matrix and tensor completion [26]–[28], and quaternion neural networks [29], [30]. The increasing number of applications of quaternion algorithms calls for a general methodology dedicated to quaternion optimization, in order to design, analyze and implement new efficient quaternion-domain algorithms.

To this aim, the present paper proposes a general framework for solving constrained convex quaternion optimization problems in the quaternion domain. We restrict ourselves to the convex case as it allows us to provide strong mathematical guarantees. The proposed approach relies on two key ingredients: the generalized HR-calculus to compute derivatives of cost functions defined in terms of quaternion variables, and the systematic use of equivalences between the original quaternion optimization problem and its augmented real-domain counterpart. This provides solid theoretical foundations for our work; it also reveals specificities of quaternion-domain optimization, such as the widely affine equality constraints that naturally arise in constrained quaternion problems. The contributions of this paper are threefold: (i) we introduce the general form for convex constrained optimization problems in quaternion variables; (ii) we extend fundamental notions of convex optimization to the quaternion case, namely Lagrangian duality and optimality conditions; (iii) we develop the quaternion alternating direction method of multipliers (Q-ADMM) as a versatile quaternion optimization algorithm.

The paper is organized as follows. In Section II we review the necessary material regarding quaternion variables and their different representations, and discuss the general affine constraint in the quaternion domain. Section III describes the generalized HR-calculus and its particular properties in the case of quaternion cost functions. Section IV introduces the main theoretical tools for quaternion convex optimization problems, including definitions and optimality conditions. Section V develops Q-ADMM in its general form to solve convex quaternion optimization problems. Finally, we illustrate in Section VI the relevance of the proposed methodology by a detailed study of two examples of constrained quaternion optimization problems inspired by the existing signal processing literature. Section VII presents concluding remarks and perspectives.

J. Flamant, S. Miron and D. Brie are with the Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France. Corresponding author: [email protected]. The authors acknowledge funding support of CNRS and GDR ISIS through the project OPENING.

II. PRELIMINARIES
A. Quaternion algebra
The set of quaternions $\mathbb{H}$ defines a 4-dimensional normed division algebra over the real numbers $\mathbb{R}$. It has canonical basis $\{1, i, j, k\}$, where $i, j, k$ are imaginary units such that
$$i^2 = j^2 = k^2 = ijk = -1, \quad ij = -ji = k. \tag{1}$$
These relations imply in particular that quaternion multiplication is noncommutative, meaning that for $q, p \in \mathbb{H}$ one has $qp \neq pq$ in general. Any quaternion $q \in \mathbb{H}$ can be written as
$$q = q_a + i q_b + j q_c + k q_d, \tag{2}$$
where $q_a, q_b, q_c, q_d \in \mathbb{R}$ are the components of $q$. The real part of $q$ is $\operatorname{Re} q = q_a$, whereas its imaginary part is $\operatorname{Im} q = i q_b + j q_c + k q_d$. A quaternion $q$ is said to be purely imaginary (or simply, pure) if $\operatorname{Re} q = 0$. The quaternion conjugate of $q$ is $q^* = \operatorname{Re} q - \operatorname{Im} q$ and acts on the product of two quaternions as $(pq)^* = q^* p^*$. The modulus of $q$ is $|q| = \sqrt{qq^*} = \sqrt{q^* q} = \sqrt{q_a^2 + q_b^2 + q_c^2 + q_d^2}$. Any non-zero quaternion $q$ has an inverse $q^{-1} = q^*/|q|^2$. The inverse of the product of two quaternions is $(pq)^{-1} = q^{-1} p^{-1}$. Given a nonzero quaternion $\mu \in \mathbb{H}$, the transformation
$$q^{\mu} \triangleq \mu q \mu^{-1} = \frac{1}{|\mu|^2} \mu q \mu^* \tag{3}$$
describes a three-dimensional rotation of the quaternion $q$. In particular it satisfies the following properties:
$$q^{\mu *} \triangleq (q^*)^{\mu} = (q^{\mu})^*, \quad (pq)^{\mu} = p^{\mu} q^{\mu}. \tag{4}$$
Pure unit quaternions such as $i, j, k$ (and more generally, any $\mu$ such that $\mu^2 = -1$) play a special role and will be denoted in bold italic letters. In this case the transformation (3) becomes an involution $q^{\mu} = -\mu q \mu$. In particular,
$$q^{i} = -i q i = q_a + i q_b - j q_c - k q_d, \tag{5}$$
$$q^{j} = -j q j = q_a - i q_b + j q_c - k q_d, \tag{6}$$
$$q^{k} = -k q k = q_a - i q_b - j q_c + k q_d. \tag{7}$$
It follows that the components of $q$ can be directly expressed as a function of $q$ and its canonical involutions $q^{i}, q^{j}, q^{k}$ as
$$q_a = \tfrac{1}{4}\left(q + q^{i} + q^{j} + q^{k}\right), \quad q_b = -\tfrac{1}{4} i \left(q + q^{i} - q^{j} - q^{k}\right), \quad q_c = -\tfrac{1}{4} j \left(q - q^{i} + q^{j} - q^{k}\right), \quad q_d = -\tfrac{1}{4} k \left(q - q^{i} - q^{j} + q^{k}\right). \tag{8}$$
Any quaternion vector $\mathbf{q} \in \mathbb{H}^n$ can be written as $\mathbf{q} = \mathbf{q}_a + i \mathbf{q}_b + j \mathbf{q}_c + k \mathbf{q}_d$, where $\mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d \in \mathbb{R}^n$ are its components. Similarly, any quaternion matrix $A \in \mathbb{H}^{m \times n}$ can be expressed as $A = A_a + i A_b + j A_c + k A_d$ with $A_a, A_b, A_c, A_d \in \mathbb{R}^{m \times n}$. The transpose of a quaternion matrix $A$ is denoted by $A^{\top}$; its conjugate transpose (or Hermitian transpose) is $A^{H} \triangleq (A^*)^{\top} = (A^{\top})^*$. Unless otherwise stated, quaternion rotations or involutions are always applied entry-wise. Note that the quaternion matrix product requires special attention due to quaternion noncommutativity: for $A \in \mathbb{H}^{m \times n}$, $B \in \mathbb{H}^{n \times p}$, the $(i,j)$-th entry of $AB$ reads
$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \neq \sum_{k=1}^{n} B_{kj} A_{ik}. \tag{9}$$
This implies notably that $(AB)^{\top} \neq B^{\top} A^{\top}$ and $(AB)^* \neq A^* B^*$ in general, whereas $(AB)^{H} = B^{H} A^{H}$ always holds. For more details on quaternions and quaternion linear algebra we refer the reader to [31], [32] and references therein.
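The algebraic rules above translate directly into code. The following minimal sketch (not part of the original paper; plain NumPy, with quaternions stored as 4-component real arrays) implements the Hamilton product and the canonical involutions (5)–(7), and checks noncommutativity ($ij = -ji$):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions p, q given as (a, b, c, d) arrays."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return np.array([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ])

def conj(q):
    """Quaternion conjugate q* = Re q - Im q."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def involution(q, mu):
    """Canonical involution q^mu = -mu q mu for a pure unit quaternion mu."""
    return -qmul(qmul(mu, q), mu)

i = np.array([0., 1., 0., 0.])
j = np.array([0., 0., 1., 0.])
k = np.array([0., 0., 0., 1.])

q = np.array([1., 2., 3., 4.])             # q = 1 + 2i + 3j + 4k
assert np.allclose(qmul(i, j), k)          # ij = k
assert np.allclose(qmul(j, i), -k)         # ji = -k (noncommutativity)
assert np.allclose(involution(q, i), [1., 2., -3., -4.])   # eq. (5)
```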
B. Representation of quaternion vectors

Quaternion vectors can be represented in three equivalent ways. Let $\mathbf{q} \in \mathbb{H}^n$ and introduce
$$\mathbb{R}^{4n} = \left\{ (\mathbf{q}_a^{\top}, \mathbf{q}_b^{\top}, \mathbf{q}_c^{\top}, \mathbf{q}_d^{\top})^{\top} \mid \mathbf{q} \in \mathbb{H}^n \right\}, \tag{10}$$
$$\mathcal{H}^{4n} = \left\{ (\mathbf{q}^{\top}, \mathbf{q}^{i \top}, \mathbf{q}^{j \top}, \mathbf{q}^{k \top})^{\top} \in \mathbb{H}^{4n} \mid \mathbf{q} \in \mathbb{H}^n \right\}. \tag{11}$$
By definition, there exists a one-to-one mapping between each set $\mathbb{H}^n$, $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$. The set $\mathbb{R}^{4n}$ defines the augmented real representation of $\mathbf{q} \in \mathbb{H}^n$ and can be identified with $\mathbb{R}^{4n}$. We denote the augmented real vector by $\mathbf{q}_{\mathbb{R}^{4n}}$. The set $\mathcal{H}^{4n}$ defines the augmented quaternion representation of $\mathbf{q} \in \mathbb{H}^n$ by making use of the three canonical involutions. Importantly, $\mathcal{H}^{4n}$ is only a subset of $\mathbb{H}^{4n}$, i.e. $\mathcal{H}^{4n} \subset \mathbb{H}^{4n}$. We denote the augmented quaternion vector by $\mathbf{q}_{\mathbb{H}^{4n}}$.

Expressions (8) show that $\mathbf{q}_{\mathbb{R}^{4n}}$ and $\mathbf{q}_{\mathbb{H}^{4n}}$ are linked by the linear relationship
$$\mathbf{q}_{\mathbb{H}^{4n}} = J_n \mathbf{q}_{\mathbb{R}^{4n}}, \quad J_n \triangleq \begin{bmatrix} I_n & i I_n & j I_n & k I_n \\ I_n & i I_n & -j I_n & -k I_n \\ I_n & -i I_n & j I_n & -k I_n \\ I_n & -i I_n & -j I_n & k I_n \end{bmatrix}. \tag{12}$$
It can be shown that $J_n \in \mathbb{H}^{4n \times 4n}$ is invertible, with inverse $J_n^{-1} = \frac{1}{4} J_n^{H}$, so that conversely
$$\mathbf{q}_{\mathbb{R}^{4n}} = \frac{1}{4} J_n^{H} \mathbf{q}_{\mathbb{H}^{4n}}. \tag{13}$$
Now, equip each representation $\mathbb{H}^n$, $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$ with the following real-valued inner products:
$$\langle \mathbf{q}, \mathbf{p} \rangle_{\mathbb{H}^n} \triangleq \operatorname{Re}\left(\mathbf{q}^{H} \mathbf{p}\right), \tag{14}$$
$$\langle \mathbf{q}_{\mathbb{R}^{4n}}, \mathbf{p}_{\mathbb{R}^{4n}} \rangle_{\mathbb{R}^{4n}} \triangleq \mathbf{q}_{\mathbb{R}^{4n}}^{\top} \mathbf{p}_{\mathbb{R}^{4n}}, \tag{15}$$
$$\langle \mathbf{q}_{\mathbb{H}^{4n}}, \mathbf{p}_{\mathbb{H}^{4n}} \rangle_{\mathcal{H}^{4n}} \triangleq \frac{1}{4} \mathbf{q}_{\mathbb{H}^{4n}}^{H} \mathbf{p}_{\mathbb{H}^{4n}}, \tag{16}$$
where the factor $\frac{1}{4}$ in (16) ensures consistency across representations.
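Before stating Proposition 1, a quick numerical sanity check can be run with the helpers of the previous sketch (again an illustration we add here, not from the paper): for $n = 1$, the three inner products (14)–(16) coincide, with (16) computed from the involutions as in (11).

```python
# Continues the previous sketch (qmul, conj, involution, i, j, k defined there).
q = np.array([1., -2., 0.5, 3.])
p = np.array([-1., 0.4, 2., 1.5])

inner_H = qmul(conj(q), p)[0]      # <q, p> = Re(q* p), eq. (14)
inner_R = q @ p                    # augmented real inner product, eq. (15)

qH = [q] + [involution(q, mu) for mu in (i, j, k)]
pH = [p] + [involution(p, mu) for mu in (i, j, k)]
inner_cH = 0.25 * sum(qmul(conj(a), b) for a, b in zip(qH, pH))  # eq. (16)

assert np.isclose(inner_H, inner_R)
assert np.allclose(inner_cH, [inner_H, 0., 0., 0.])   # real, and equal to (14)
```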
Proposition 1 below shows that inner products are preserved from one representation to another.

Proposition 1. Given $\mathbf{q}, \mathbf{p} \in \mathbb{H}^n$, the following equalities hold:
$$\langle \mathbf{q}, \mathbf{p} \rangle_{\mathbb{H}^n} = \langle \mathbf{q}_{\mathbb{R}^{4n}}, \mathbf{p}_{\mathbb{R}^{4n}} \rangle_{\mathbb{R}^{4n}} = \langle \mathbf{q}_{\mathbb{H}^{4n}}, \mathbf{p}_{\mathbb{H}^{4n}} \rangle_{\mathcal{H}^{4n}}. \tag{17}$$

Proof.
Let $\mathbf{q} = \mathbf{q}_a + i\mathbf{q}_b + j\mathbf{q}_c + k\mathbf{q}_d$ and $\mathbf{p} = \mathbf{p}_a + i\mathbf{p}_b + j\mathbf{p}_c + k\mathbf{p}_d$ be vectors of $\mathbb{H}^n$. Using the rules of quaternion calculus, a direct calculation shows that $\operatorname{Re}(\mathbf{q}^H\mathbf{p}) = \mathbf{q}_a^{\top}\mathbf{p}_a + \mathbf{q}_b^{\top}\mathbf{p}_b + \mathbf{q}_c^{\top}\mathbf{p}_c + \mathbf{q}_d^{\top}\mathbf{p}_d = \mathbf{q}_{\mathbb{R}^{4n}}^{\top}\mathbf{p}_{\mathbb{R}^{4n}}$. Then, by developing $\mathbf{q}_{\mathbb{H}^{4n}}^H \mathbf{p}_{\mathbb{H}^{4n}}$ and using (8), one gets
$$\mathbf{q}_{\mathbb{H}^{4n}}^H \mathbf{p}_{\mathbb{H}^{4n}} = \mathbf{q}^H\mathbf{p} + \mathbf{q}^{i H}\mathbf{p}^{i} + \mathbf{q}^{j H}\mathbf{p}^{j} + \mathbf{q}^{k H}\mathbf{p}^{k} = \mathbf{q}^H\mathbf{p} + \left(\mathbf{q}^H\mathbf{p}\right)^{i} + \left(\mathbf{q}^H\mathbf{p}\right)^{j} + \left(\mathbf{q}^H\mathbf{p}\right)^{k} = 4\operatorname{Re}\left(\mathbf{q}^H\mathbf{p}\right),$$
which concludes the proof. This result can also be obtained using $\mathbf{q}_{\mathbb{H}^{4n}}^H \mathbf{p}_{\mathbb{H}^{4n}} = \mathbf{q}_{\mathbb{R}^{4n}}^{H} J_n^H J_n \mathbf{p}_{\mathbb{R}^{4n}} = 4\,\mathbf{q}_{\mathbb{R}^{4n}}^{\top}\mathbf{p}_{\mathbb{R}^{4n}}$.

It directly follows that the induced norms are equal, such that $\|\mathbf{q}\|_{\mathbb{H}^n} = \|\mathbf{q}_{\mathbb{R}^{4n}}\|_{\mathbb{R}^{4n}} = \|\mathbf{q}_{\mathbb{H}^{4n}}\|_{\mathcal{H}^{4n}}$. Moreover, the norm induced by the real-valued inner product (14) is identical to the standard quaternion $\ell_2$-norm, since
$$\|\mathbf{q}\|_{\mathbb{H}^n}^2 = \operatorname{Re}\left(\mathbf{q}^H\mathbf{q}\right) = \|\mathbf{q}\|_2^2 \triangleq \sum_{i=1}^n |q_i|^2. \tag{18}$$
For brevity, the $\ell_2$-norm notation will thus be used hereafter independently of the representation $\mathbb{H}^n$, $\mathbb{R}^{4n}$ or $\mathcal{H}^{4n}$. In addition, the subscript indicating the dimension of the $\mathbb{R}$ and $\mathbb{H}$ representations will be dropped whenever there is no risk of ambiguity.

C. Affine and linear constraints
Affine equality constraints are ubiquitous in standard, real-domain constrained convex optimization. To perform optimization directly in the quaternion domain, a key step is to investigate how such constraints translate to quaternions. Let $\mathbf{q} \in \mathbb{H}^n$ be the variable of interest. Using its augmented real representation, the most general affine constraint reads
$$A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} = \mathbf{b}_{\mathbb{R}}, \quad A_{\mathbb{R}} \in \mathbb{R}^{4p \times 4n}, \ \mathbf{b}_{\mathbb{R}} \in \mathbb{R}^{4p}. \tag{$A_{\mathbb{R}}$}$$
The constraint is said to be linear if $\mathbf{b}_{\mathbb{R}} = \mathbf{0}$. Our goal is to find equivalent formulations of ($A_{\mathbb{R}}$) in the augmented quaternion domain $\mathcal{H}^{4n}$ and in $\mathbb{H}^n$, respectively. Using the linear mapping between $\mathcal{H}^{4n}$ and $\mathbb{R}^{4n}$, one gets
$$(A_{\mathbb{R}}) \iff \tfrac{1}{4} J_p A_{\mathbb{R}} J_n^H \mathbf{q}_{\mathbb{H}} = J_p \mathbf{b}_{\mathbb{R}} \tag{19}$$
$$\iff A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} = \mathbf{b}_{\mathbb{H}}. \tag{$A_{\mathbb{H}}$}$$
The constraint remains affine in the augmented quaternion domain, yet the matrix $A_{\mathbb{H}} \in \mathbb{H}^{4p \times 4n}$ exhibits a specific band structure (see Appendix A for details):
$$A_{\mathbb{H}} = \begin{bmatrix} A_1 & A_2 & A_3 & A_4 \\ A_2^{i} & A_1^{i} & A_4^{i} & A_3^{i} \\ A_3^{j} & A_4^{j} & A_1^{j} & A_2^{j} \\ A_4^{k} & A_3^{k} & A_2^{k} & A_1^{k} \end{bmatrix}, \tag{20}$$
where $A_1, A_2, A_3, A_4 \in \mathbb{H}^{p \times n}$ are quaternion matrices. To obtain the corresponding constraint in the quaternion domain, remark that each block-row of the system of equations ($A_{\mathbb{H}}$) corresponds to a block of $p$ equations. Writing ($A_{\mathbb{H}}$) explicitly, we see that the second, third and fourth $p$-blocks are simply canonical involutions of the first block of equations. As a result, ($A_{\mathbb{H}}$) is equivalent to the following widely affine quaternion constraint
$$A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} = \mathbf{b}. \tag{$A$}$$
The term widely refers to the fact that $\mathbf{q}$ and its three canonical involutions $\mathbf{q}^{i}$, $\mathbf{q}^{j}$ and $\mathbf{q}^{k}$ are necessary to describe the most general affine constraint in the corresponding augmented real space. In particular, strictly affine quaternion constraints like $A\mathbf{q} = \mathbf{b}$ – which are commonly found in the existing literature – are only a special case of the widely affine constraints described by ($A$). Focusing on this special case, strictly affine quaternion constraints $A\mathbf{q} = \mathbf{b}$ impose a peculiar structure on the associated real matrix $A_{\mathbb{R}}$. The matrix $A_{\mathbb{H}}$ becomes block-diagonal, and letting $A = A_a + A_b i + A_c j + A_d k$, with $A_a, A_b, A_c, A_d \in \mathbb{R}^{p \times n}$, one obtains
$$A_{\mathbb{R}} = \frac{1}{4} J_p^H \operatorname{diag}(A, A^{i}, A^{j}, A^{k}) J_n \tag{21}$$
$$= \begin{bmatrix} A_a & -A_b & -A_c & -A_d \\ A_b & A_a & -A_d & A_c \\ A_c & A_d & A_a & -A_b \\ A_d & -A_c & A_b & A_a \end{bmatrix}, \tag{22}$$
hence $A_{\mathbb{R}}$ is a real structured matrix of size $4p \times 4n$.
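For the strictly affine special case, the structured matrix (22) is easy to build and test numerically. The sketch below (our illustration; quaternion matrices are stored as four real arrays) constructs $A_{\mathbb{R}}$ from (22) and checks its first block row against the real part of the quaternion product $A\mathbf{q}$:

```python
import numpy as np

def real_adjoint(Aa, Ab, Ac, Ad):
    """Structured real matrix (22) of A = Aa + i Ab + j Ac + k Ad."""
    return np.block([
        [Aa, -Ab, -Ac, -Ad],
        [Ab,  Aa, -Ad,  Ac],
        [Ac,  Ad,  Aa, -Ab],
        [Ad, -Ac,  Ab,  Aa],
    ])

rng = np.random.default_rng(0)
p, n = 2, 3
Aa, Ab, Ac, Ad = (rng.standard_normal((p, n)) for _ in range(4))
qa, qb, qc, qd = (rng.standard_normal(n) for _ in range(4))

A_R = real_adjoint(Aa, Ab, Ac, Ad)            # 4p x 4n
q_R = np.concatenate([qa, qb, qc, qd])        # augmented real vector

# real part of the quaternion product A q, computed componentwise
re_Aq = Aa @ qa - Ab @ qb - Ac @ qc - Ad @ qd
assert np.allclose((A_R @ q_R)[:p], re_Aq)
```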
III. OPTIMIZATION OF REAL-VALUED FUNCTIONS OF QUATERNION VARIABLES

Given a real-valued function of quaternion variables (e.g. a cost function), is it possible to define quaternion derivatives, and if so, how can we compute them? One key obstacle lies in the non-analytic nature of real-valued functions: this means, in particular, that such functions are not quaternion differentiable [14], [33] and that other strategies need to be developed. First, a pseudo-derivative approach can be used by treating a function $f$ of the variable $\mathbf{q} \in \mathbb{H}^n$ as a function of its four real components $\mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d$ – however, as the compactness of quaternion expressions is lost, such an approach may require tedious and cumbersome computations. Alternatively, the recent advent of the (generalized) HR-calculus [16], [18] paved the way to efficient computation of quaternion-domain derivatives. It provides a complete framework generalizing the CR-calculus [20] of complex-valued optimization to the case of quaternion functions. The generalized HR-calculus is one of the key ingredients of the proposed framework for constrained quaternion optimization. This section covers the fundamental definitions and properties, focusing on practical aspects. For detailed proofs and discussions, we refer the reader to the pioneering papers [17]–[19].

A. Generalized HR-derivatives for cost functions

We first consider the simpler case of a univariate function $f : \mathbb{H} \to \mathbb{R}$. The function $f$ is said to be real-differentiable [34] if $f(q) = f(q_a, q_b, q_c, q_d) = f(q_{\mathbb{R}})$ is differentiable with respect to the real variables $q_a, q_b, q_c, q_d$. Generalized HR (GHR) derivatives are defined in terms of standard real-domain derivatives as follows.

Definition 1 (Generalized HR-derivatives [18]). Let $\mu$ be a nonzero quaternion. The GHR derivatives of a real-differentiable $f : \mathbb{H} \to \mathbb{R}$ with respect to $q^{\mu}$ and $q^{\mu *}$ are
$$\frac{\partial f}{\partial q^{\mu}} = \frac{1}{4}\left( \frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b} i^{\mu} - \frac{\partial f}{\partial q_c} j^{\mu} - \frac{\partial f}{\partial q_d} k^{\mu} \right), \tag{23}$$
$$\frac{\partial f}{\partial q^{\mu *}} = \frac{1}{4}\left( \frac{\partial f}{\partial q_a} + \frac{\partial f}{\partial q_b} i^{\mu} + \frac{\partial f}{\partial q_c} j^{\mu} + \frac{\partial f}{\partial q_d} k^{\mu} \right). \tag{24}$$

The term generalized refers to the use of an arbitrary quaternion rotation encoded by $\mu \neq 0$ in expressions (23)–(24). This is necessary to ensure that the GHR calculus can be equipped with product and chain rules (see [18] for details and further properties). Since $f$ is real-valued, its GHR derivatives enjoy several nice properties, such as the conjugate rule
$$\left(\frac{\partial f}{\partial q^{\mu}}\right)^* = \frac{\partial f}{\partial q^{\mu *}}, \quad \left(\frac{\partial f}{\partial q^{\mu *}}\right)^* = \frac{\partial f}{\partial q^{\mu}}, \tag{25}$$
together with a special instance of the rotation rule [18]
$$\frac{\partial f}{\partial q^{\nu}} = \left(\frac{\partial f}{\partial q}\right)^{\nu}, \quad \frac{\partial f}{\partial q^{\nu *}} = \left(\frac{\partial f}{\partial q^{*}}\right)^{\nu}, \tag{26}$$
for $\nu \in \{1, i, j, k\}$.
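Definition 1 can be checked numerically. The short sketch below (our illustration, for $\mu = 1$) assembles the GHR derivative (24) of $f(q) = |q|^2$ from finite-difference partials and compares it with the closed form $q/2$ that (24) yields for this function:

```python
import numpy as np

def f(qa, qb, qc, qd):
    return qa**2 + qb**2 + qc**2 + qd**2          # f(q) = |q|^2

q = np.array([0.7, -1.2, 0.3, 2.0])
eps = 1e-6
# numerical partials with respect to the four real components q_a, ..., q_d
partials = np.array([(f(*(q + eps * e)) - f(*(q - eps * e))) / (2 * eps)
                     for e in np.eye(4)])

# eq. (24) with mu = 1: df/dq* = (f_a + f_b i + f_c j + f_d k) / 4,
# stored here as its four real components
df_dqstar = 0.25 * partials
assert np.allclose(df_dqstar, 0.5 * q, atol=1e-5)  # equals q/2 for this f
```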
B. Quaternion gradient and stationary points

Consider now a real-valued function $f : \mathbb{H}^n \to \mathbb{R}$ of the quaternion vector variable $\mathbf{q} = (q_1, q_2, \ldots, q_n)^{\top} \in \mathbb{H}^n$. We assume that $f$ is real-differentiable, that is, real-differentiable with respect to each vector component $q_i$, $i = 1, 2, \ldots, n$. The $\mu$-gradient and $\mu$-conjugate gradient operators are defined in terms of GHR derivatives as follows [19]:
$$\nabla_{\mathbf{q}^{\mu}} f \triangleq \left( \frac{\partial f}{\partial q_1^{\mu}}, \frac{\partial f}{\partial q_2^{\mu}}, \ldots, \frac{\partial f}{\partial q_n^{\mu}} \right)^{\top} \in \mathbb{H}^n, \tag{27}$$
$$\nabla_{\mathbf{q}^{\mu *}} f \triangleq \left( \frac{\partial f}{\partial q_1^{\mu *}}, \frac{\partial f}{\partial q_2^{\mu *}}, \ldots, \frac{\partial f}{\partial q_n^{\mu *}} \right)^{\top} \in \mathbb{H}^n. \tag{28}$$
Remark immediately that since $f$ is real-valued, the conjugate rule (25) implies that $\nabla_{\mathbf{q}^{\mu}} f = (\nabla_{\mathbf{q}^{\mu *}} f)^*$. When $\mu = 1$, we simply call $\nabla_{\mathbf{q}} f$ (resp. $\nabla_{\mathbf{q}^*} f$) the quaternion gradient of $f$ (resp. conjugate quaternion gradient of $f$). Choosing the canonical involutions $\mu \in \{1, i, j, k\}$, we define the augmented quaternion gradient and conjugate augmented quaternion gradient as
$$\nabla_{\mathbb{H}} f \triangleq \begin{bmatrix} \nabla_{\mathbf{q}} f \\ \nabla_{\mathbf{q}^{i}} f \\ \nabla_{\mathbf{q}^{j}} f \\ \nabla_{\mathbf{q}^{k}} f \end{bmatrix}, \quad \nabla_{\mathbb{H}^*} f \triangleq \begin{bmatrix} \nabla_{\mathbf{q}^*} f \\ \nabla_{\mathbf{q}^{i *}} f \\ \nabla_{\mathbf{q}^{j *}} f \\ \nabla_{\mathbf{q}^{k *}} f \end{bmatrix} \in \mathbb{H}^{4n}. \tag{29}$$
Introducing the (standard) augmented real gradient operator $\nabla_{\mathbb{R}} \triangleq (\nabla_{\mathbf{q}_a}^{\top}, \nabla_{\mathbf{q}_b}^{\top}, \nabla_{\mathbf{q}_c}^{\top}, \nabla_{\mathbf{q}_d}^{\top})^{\top}$ and exploiting the definition of the generalized HR derivatives (23)–(24), one obtains
$$\nabla_{\mathbb{H}} f = \frac{1}{4}\begin{bmatrix} I_n & -i I_n & -j I_n & -k I_n \\ I_n & -i I_n & j I_n & k I_n \\ I_n & i I_n & -j I_n & k I_n \\ I_n & i I_n & j I_n & -k I_n \end{bmatrix} \begin{bmatrix} \nabla_{\mathbf{q}_a} f \\ \nabla_{\mathbf{q}_b} f \\ \nabla_{\mathbf{q}_c} f \\ \nabla_{\mathbf{q}_d} f \end{bmatrix} = \frac{1}{4} J_n^* \nabla_{\mathbb{R}} f \tag{30}$$
and
$$\nabla_{\mathbb{H}^*} f = \frac{1}{4}\begin{bmatrix} I_n & i I_n & j I_n & k I_n \\ I_n & i I_n & -j I_n & -k I_n \\ I_n & -i I_n & j I_n & -k I_n \\ I_n & -i I_n & -j I_n & k I_n \end{bmatrix} \begin{bmatrix} \nabla_{\mathbf{q}_a} f \\ \nabla_{\mathbf{q}_b} f \\ \nabla_{\mathbf{q}_c} f \\ \nabla_{\mathbf{q}_d} f \end{bmatrix} = \frac{1}{4} J_n \nabla_{\mathbb{R}} f. \tag{31}$$
In particular, the augmented real and conjugate augmented quaternion gradients are related by a simple linear transform:
$$\nabla_{\mathbb{R}} f = J_n^{H} \nabla_{\mathbb{H}^*} f. \tag{32}$$
This result is fundamental for the proposed quaternion optimization framework since it permits switching from one representation of quaternion vectors to another while preserving gradient-related properties. Notably, it allows the derivation of necessary and sufficient conditions for stationary points of real-valued functions of quaternion variables.

Proposition 2 (Stationary points). Let $f : \mathbb{H}^n \to \mathbb{R}$ be real-differentiable and let $\mu \in \{1, i, j, k\}$. The vector $\mathbf{q}^{\star} \in \mathbb{H}^n$ is a stationary point of $f$ iff
$$\nabla_{\mathbf{q}^{\mu}} f(\mathbf{q}^{\star}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbf{q}^{\mu *}} f(\mathbf{q}^{\star}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{R}} f(\mathbf{q}^{\star}_{\mathbb{R}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}^*} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0}. \tag{33}$$

Proof.
Let $\mathbf{q}^{\star} \in \mathbb{H}^n$ and define $\mathbf{q}^{\star}_{\mathbb{R}}$ and $\mathbf{q}^{\star}_{\mathbb{H}}$ its augmented real and quaternion vectors. Suppose that $\nabla_{\mathbb{R}} f(\mathbf{q}^{\star}_{\mathbb{R}}) = \mathbf{0}$. By Eqs. (30)–(31) one has $\nabla_{\mathbb{R}} f(\mathbf{q}^{\star}_{\mathbb{R}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}^*} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0}$. Clearly, by definition of $\nabla_{\mathbb{H}}$ one has $\nabla_{\mathbb{H}} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0} \Rightarrow \nabla_{\mathbf{q}} f(\mathbf{q}^{\star}) = \mathbf{0}$. Conversely, suppose that $\nabla_{\mathbf{q}} f(\mathbf{q}^{\star}) = \mathbf{0}$. Since $f$ is real-valued, one has $\nabla_{\mathbf{q}^{\mu}} f = (\nabla_{\mathbf{q}} f)^{\mu}$ for $\mu \in \{1, i, j, k\}$, so that $\nabla_{\mathbf{q}} f(\mathbf{q}^{\star}) = \mathbf{0} \Rightarrow \nabla_{\mathbb{H}} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0}$. Similarly one shows that $\nabla_{\mathbf{q}^*} f(\mathbf{q}^{\star}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}^*} f(\mathbf{q}^{\star}_{\mathbb{H}}) = \mathbf{0}$, which concludes the proof.

Proposition 2 has a very important consequence: it states that optimization problems involving quaternion variables can be equivalently tackled in any representation: $\mathbb{H}^n$, $\mathbb{R}^{4n}$ or $\mathcal{H}^{4n}$. This equivalence allows us to move back and forth between the three representations and to benefit from the advantages of each. This result is a cornerstone of the proposed framework for quaternion convex optimization detailed in the remainder of this paper.

IV. CONVEX OPTIMIZATION WITH QUATERNION VARIABLES
This section starts by introducing the notions of convex sets and convex functions in the quaternion domain. Then we introduce the most general form of a constrained quaternion convex problem by leveraging the equivalent augmented real optimization problem. The notions of Lagrangian and duality are introduced next, which enables the formulation of two fundamental optimality conditions. Some of these definitions may appear trivial to the reader familiar with the convex optimization field; yet, in our opinion, explicit and rigorous definitions are necessary to ensure the soundness of the proposed framework for quaternion convex optimization.
A. Convex sets and convex functions
Definitions of convexity for quaternion sets of $\mathbb{H}^n$ or for a real-valued function of quaternion variables $f : \mathbb{H}^n \to \mathbb{R}$ appear very close to the standard real case. This is essentially due to the fact that convexity is intrinsically a "real" property, so that convexity in the quaternion domain is inherited from convexity of the equivalent, real-augmented representation.

Convex sets. Let $\mathcal{C} \subset \mathbb{H}^n$ and define $\mathcal{C}_{\mathbb{R}} = \{(\mathbf{q}_a^{\top}, \mathbf{q}_b^{\top}, \mathbf{q}_c^{\top}, \mathbf{q}_d^{\top})^{\top} \in \mathbb{R}^{4n} \mid \mathbf{q} \in \mathcal{C}\}$ its augmented real representation. We say that $\mathcal{C}$ is a convex subset (resp. cone, convex cone) of $\mathbb{H}^n$ if $\mathcal{C}_{\mathbb{R}}$ is a convex subset (resp. cone, convex cone) of $\mathbb{R}^{4n}$. This leads to the following explicit definitions.

Definition 2 (Convex set). A set $\mathcal{C} \subset \mathbb{H}^n$ is convex if $\forall \mathbf{p}, \mathbf{q} \in \mathcal{C}$ and any $\theta \in [0, 1]$, one has $\theta \mathbf{p} + (1 - \theta)\mathbf{q} \in \mathcal{C}$.

A similar definition is possible for cones and convex cones of $\mathbb{H}^n$.

Definition 3 (Cone and convex cone). A set $\mathcal{C} \subset \mathbb{H}^n$ is a cone if $\forall \mathbf{q} \in \mathcal{C}$ and $\theta \geq 0$, $\theta \mathbf{q} \in \mathcal{C}$. A set $\mathcal{C}$ is a convex cone if it is convex and a cone, which means that $\forall \mathbf{p}, \mathbf{q} \in \mathcal{C}$ and any $\theta_1, \theta_2 \geq 0$ we have $\theta_1 \mathbf{p} + \theta_2 \mathbf{q} \in \mathcal{C}$.

Remark 1. Given a convex set $\mathcal{C} \subset \mathbb{H}^n$ (resp. convex cone), the set
$$\mathcal{C}_{\mathbb{H}} \triangleq \left\{ (\mathbf{q}^{\top}, \mathbf{q}^{i \top}, \mathbf{q}^{j \top}, \mathbf{q}^{k \top})^{\top} \in \mathbb{H}^{4n} \mid \mathbf{q} \in \mathcal{C} \right\}$$
is a convex subset (resp. convex cone) of $\mathbb{H}^{4n}$. The converse is also true.

Remaining definitions such as convex hull, dual cone, etc. for the quaternion domain are omitted for brevity. They can be obtained, if desired, by proceeding analogously and exploiting the equivalence with the augmented real representation.
Convex functions. Similarly to the definition of convex sets of quaternions, the definition of convexity of real-valued functions of quaternion variables relies on convexity of the associated function in terms of augmented real variables.

Definition 4 (Convex function). A function $f : \mathbb{H}^n \to \mathbb{R}$ is convex if its domain $\operatorname{dom} f$ is convex and if for all $\mathbf{p}, \mathbf{q} \in \operatorname{dom} f$ and for $\theta \in [0, 1]$ one has
$$f(\theta \mathbf{p} + (1 - \theta)\mathbf{q}) \leq \theta f(\mathbf{p}) + (1 - \theta) f(\mathbf{q}). \tag{34}$$
A function $f$ is strictly convex if the above inequality is strict whenever $\mathbf{p} \neq \mathbf{q}$ and $\theta \in (0, 1)$.

Remark that if $f(\mathbf{q})$ is convex, the function $f(\mathbf{q}_{\mathbb{H}})$ is also convex, and conversely. In practice, supposing that $f$ is real-differentiable, it is possible to characterize convexity in terms of the quaternion gradients introduced in Section III-B.

Proposition 3 (First-order characterization). Consider $f : \mathbb{H}^n \to \mathbb{R}$ a real-differentiable function such that $\operatorname{dom} f$ is convex. Then $f$ is convex if and only if, $\forall \mathbf{p}, \mathbf{q} \in \operatorname{dom} f$,
$$f(\mathbf{p}) \geq f(\mathbf{q}) + 4\operatorname{Re}\left( \nabla_{\mathbf{q}^*} f(\mathbf{q})^H (\mathbf{p} - \mathbf{q}) \right) \tag{35}$$
$$\iff f(\mathbf{p}_{\mathbb{H}}) \geq f(\mathbf{q}_{\mathbb{H}}) + \nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}})^H (\mathbf{p}_{\mathbb{H}} - \mathbf{q}_{\mathbb{H}}) \tag{36}$$
$$\iff f(\mathbf{p}_{\mathbb{R}}) \geq f(\mathbf{q}_{\mathbb{R}}) + \nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}})^{\top} (\mathbf{p}_{\mathbb{R}} - \mathbf{q}_{\mathbb{R}}). \tag{37}$$

Proof.
Suppose $\operatorname{dom} f$ convex and $f$ real-differentiable. Then the usual convexity condition [35, Chapter 3] on $f$ reads in terms of real augmented variables:
$$f \text{ is convex} \iff \forall \mathbf{p}, \mathbf{q} \in \operatorname{dom} f, \quad f(\mathbf{p}_{\mathbb{R}}) \geq f(\mathbf{q}_{\mathbb{R}}) + \nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}})^{\top} (\mathbf{p}_{\mathbb{R}} - \mathbf{q}_{\mathbb{R}}).$$
Using (32), the inner product term can be rewritten as
$$\nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}})^{\top} (\mathbf{p}_{\mathbb{R}} - \mathbf{q}_{\mathbb{R}}) = \nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}})^{H} (\mathbf{p}_{\mathbb{R}} - \mathbf{q}_{\mathbb{R}}) = \left( J^H \nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}}) \right)^H J^{-1} (\mathbf{p}_{\mathbb{H}} - \mathbf{q}_{\mathbb{H}}) = \nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}})^H (\mathbf{p}_{\mathbb{H}} - \mathbf{q}_{\mathbb{H}}),$$
which yields the second equivalence. To obtain the result in $\mathbf{q}, \mathbf{p}$ coordinates, remark that $\nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}})$ is the quaternion augmented vector of $\nabla_{\mathbf{q}^*} f(\mathbf{q})$, so that by Proposition 1
$$\nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}})^H (\mathbf{p}_{\mathbb{H}} - \mathbf{q}_{\mathbb{H}}) = 4\operatorname{Re}\left( \nabla_{\mathbf{q}^*} f(\mathbf{q})^H (\mathbf{p} - \mathbf{q}) \right),$$
which concludes the proof.

Example.
Consider the function $f : \mathbb{H}^n \to \mathbb{R}$ defined by
$$f(\mathbf{q}) = \left\| P_1 \mathbf{q} + P_2 \mathbf{q}^{i} + P_3 \mathbf{q}^{j} + P_4 \mathbf{q}^{k} - \mathbf{b} \right\|_2^2, \tag{38}$$
where $P_1, P_2, P_3, P_4 \in \mathbb{H}^{p \times n}$ and $\mathbf{b} \in \mathbb{H}^p$ are arbitrary. First, note that $\operatorname{dom} f = \mathbb{H}^n$ is convex. From Proposition 1, $f(\mathbf{q}) = f(\mathbf{q}_{\mathbb{R}}) = \| P_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{b}_{\mathbb{R}} \|_2^2$; $f$ is a convex function of the augmented real variable $\mathbf{q}_{\mathbb{R}}$, so $f$ is convex in $\mathbf{q}$.

For brevity, properties of quaternion convex functions such as closedness, properness, etc. are omitted. They can be defined without difficulty just like above, by exploiting the augmented real representation.
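To make the example concrete, the sketch below (our illustration, reusing `real_adjoint` from the sketch of Section II-C) builds the augmented real matrix $P_{\mathbb{R}}$ of the widely linear map $\mathbf{q} \mapsto P_1\mathbf{q} + P_2\mathbf{q}^{i} + P_3\mathbf{q}^{j} + P_4\mathbf{q}^{k}$, so that $f(\mathbf{q}) = \|P_{\mathbb{R}}\mathbf{q}_{\mathbb{R}} - \mathbf{b}_{\mathbb{R}}\|_2^2$ is evaluated as an ordinary real quadratic, hence convex:

```python
import numpy as np

# sign patterns of the entrywise involutions q, q^i, q^j, q^k on (a, b, c, d)
INVOLUTION_SIGNS = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]

def widely_linear_real(Ps, n):
    """Real 4p x 4n matrix of q -> P1 q + P2 q^i + P3 q^j + P4 q^k.
    Ps is a list of four tuples (Pa, Pb, Pc, Pd) of real p x n blocks."""
    P_R = np.zeros((4 * Ps[0][0].shape[0], 4 * n))
    for P, s in zip(Ps, INVOLUTION_SIGNS):
        D = np.kron(np.diag(s), np.eye(n))      # involution acting in R^{4n}
        P_R += real_adjoint(*P) @ D             # real_adjoint: Sec. II-C sketch
    return P_R

rng = np.random.default_rng(1)
p, n = 3, 2
Ps = [tuple(rng.standard_normal((p, n)) for _ in range(4)) for _ in range(4)]
b_R = rng.standard_normal(4 * p)
q_R = rng.standard_normal(4 * n)

P_R = widely_linear_real(Ps, n)
f_val = np.sum((P_R @ q_R - b_R) ** 2)          # f(q) as a real quadratic
```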
B. Convex problems

The most general form of real constrained convex optimization problems consists in the minimization of a convex function subject to inequality constraints defined by convex functions and to affine equality constraints. To obtain its quaternion-domain counterpart, consider the following general constrained convex problem in the real augmented variable:
$$\begin{aligned} \text{minimize} \quad & f_0(\mathbf{q}_{\mathbb{R}}) \\ \text{subject to} \quad & f_i(\mathbf{q}_{\mathbb{R}}) \leq 0, \quad i = 1, \ldots, m \\ & A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} = \mathbf{b}_{\mathbb{R}}, \end{aligned} \tag{$P_{\mathbb{R}}$}$$
where $f_0, \ldots, f_m : \mathbb{R}^{4n} \to \mathbb{R}$ are real-valued convex functions, and the $4p$ affine equality constraints are encoded by an arbitrary augmented real matrix $A_{\mathbb{R}} \in \mathbb{R}^{4p \times 4n}$ and vector $\mathbf{b}_{\mathbb{R}} \in \mathbb{R}^{4p}$. Using the results from Section II-C, the problem ($P_{\mathbb{R}}$) can be equivalently rewritten in terms of the augmented quaternion variable $\mathbf{q}_{\mathbb{H}} \in \mathcal{H}^{4n} \subset \mathbb{H}^{4n}$ as follows:
$$\begin{aligned} \text{minimize} \quad & f_0(\mathbf{q}_{\mathbb{H}}) \\ \text{subject to} \quad & f_i(\mathbf{q}_{\mathbb{H}}) \leq 0, \quad i = 1, \ldots, m \\ & A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} = \mathbf{b}_{\mathbb{H}}, \end{aligned} \tag{$P_{\mathbb{H}}$}$$
where $A_{\mathbb{H}} \in \mathbb{H}^{4p \times 4n}$ is the structured quaternion matrix given by (20). Going back to the original quaternion variable domain, problems ($P_{\mathbb{R}}$) and ($P_{\mathbb{H}}$) are rewritten as
$$\begin{aligned} \text{minimize} \quad & f_0(\mathbf{q}) \\ \text{subject to} \quad & f_i(\mathbf{q}) \leq 0, \quad i = 1, \ldots, m \\ & A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} = \mathbf{b}, \end{aligned} \tag{$P$}$$
where $A_1, A_2, A_3, A_4 \in \mathbb{H}^{p \times n}$ and $\mathbf{b} \in \mathbb{H}^p$ encode the $p$ quaternion widely affine equality constraints. This particular type of equality constraint – specific to quaternion algebra – ensures that ($P$) defines the most general form of constrained convex quaternion optimization problems. In the sequel, the equivalence between the three optimization problems ($P$), ($P_{\mathbb{H}}$) and ($P_{\mathbb{R}}$) is thoroughly exploited to construct a general constrained convex optimization framework directly in the quaternion domain.

C. Lagrangian and duality
The Lagrangian associated with the real equivalent problem ($P_{\mathbb{R}}$) is the function $L : \mathbb{R}^{4n} \times \mathbb{R}^m \times \mathbb{R}^{4p} \to \mathbb{R}$ defined by
$$L(\mathbf{q}_{\mathbb{R}}, \boldsymbol{\nu}, \boldsymbol{\lambda}_{\mathbb{R}}) \triangleq f_0(\mathbf{q}_{\mathbb{R}}) + \sum_{i=1}^m \nu_i f_i(\mathbf{q}_{\mathbb{R}}) + \boldsymbol{\lambda}_{\mathbb{R}}^{\top} (A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{b}_{\mathbb{R}}), \tag{39}$$
where $\boldsymbol{\nu} = (\nu_1, \nu_2, \ldots, \nu_m)^{\top} \in \mathbb{R}^m$ is the dual variable associated with the $m$ inequality constraints and $\boldsymbol{\lambda}_{\mathbb{R}} \in \mathbb{R}^{4p}$ is the dual variable corresponding to the $4p$ real-augmented equality constraints $A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} = \mathbf{b}_{\mathbb{R}}$. To get the expression of the Lagrangian in terms of quaternion variables, we exploit the linear relation between the real and quaternion augmented representations $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$ described in Section II-B, together with the equivalence between affine constraints described in Section II-C. Using Proposition 1, we get from (39) the Lagrangian in augmented quaternion variables as
$$L(\mathbf{q}_{\mathbb{H}}, \boldsymbol{\nu}, \boldsymbol{\lambda}_{\mathbb{H}}) = f_0(\mathbf{q}_{\mathbb{H}}) + \sum_{i=1}^m \nu_i f_i(\mathbf{q}_{\mathbb{H}}) + \frac{1}{4} \boldsymbol{\lambda}_{\mathbb{H}}^{H} (A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} - \mathbf{b}_{\mathbb{H}}). \tag{40}$$
Applying the same approach, one obtains the expression of the Lagrangian of the quaternion optimization problem ($P$) as the function $L : \mathbb{H}^n \times \mathbb{R}^m \times \mathbb{H}^p \to \mathbb{R}$ defined by
$$L(\mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\lambda}) \triangleq f_0(\mathbf{q}) + \sum_{i=1}^m \nu_i f_i(\mathbf{q}) + \operatorname{Re}\left[ \boldsymbol{\lambda}^{H} \left( A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} - \mathbf{b} \right) \right]. \tag{41}$$

Remark 2.
It can be easily checked by direct calculation that $L(\mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\lambda}) = L(\mathbf{q}_{\mathbb{H}}, \boldsymbol{\nu}, \boldsymbol{\lambda}_{\mathbb{H}}) = L(\mathbf{q}_{\mathbb{R}}, \boldsymbol{\nu}, \boldsymbol{\lambda}_{\mathbb{R}})$, meaning that they represent the same quantity.

Remark 3.
The Lagrange dual variable $\boldsymbol{\nu} \in \mathbb{R}^m$ associated with the inequality constraints is always a real vector variable, no matter which representation ($\mathbb{H}^n$, $\mathbb{R}^{4n}$, $\mathcal{H}^{4n}$) is chosen, since the inequality constraints are defined by the real-valued functions $f_i$, $i = 1, \ldots, m$. The main difference between (39), (40) and (41) lies in the way the affine equality constraints are handled, since $p$ quaternion equality constraints are equivalent to $4p$ real equality constraints. This explains why, in the expression of the quaternion Lagrangian (41), the Lagrange dual variable $\boldsymbol{\lambda}$ associated with the equality constraints is a $p$-dimensional quaternion vector.

The definition of the quaternion Lagrangian (41) is a key step. It allows the definition of quaternion-domain counterparts of fundamental tools from Lagrangian duality in a straightforward way. For instance, let $\mathcal{D}$ denote the domain of the problem ($P$), such that $\mathcal{D} = \cap_{i=0}^m \operatorname{dom} f_i \cap \mathcal{A}$, where $\mathcal{A}$ denotes the domain of the widely affine constraint ($A$). The quaternion dual function is defined as
$$g(\boldsymbol{\nu}, \boldsymbol{\lambda}) \triangleq \inf_{\mathbf{q} \in \mathcal{D}} L(\mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\lambda}). \tag{42}$$
The dual quaternion Lagrange problem then reads
$$\begin{aligned} \text{maximize} \quad & g(\boldsymbol{\nu}, \boldsymbol{\lambda}) \\ \text{subject to} \quad & \boldsymbol{\nu} \succeq \mathbf{0}. \end{aligned} \tag{43}$$
Just like in standard real-domain optimization, the dual Lagrange function yields lower bounds on the optimal value of the original problem ($P$), meaning that weak duality holds. Conditions for strong duality are not given here. They may be derived as well, by simple adaptation of the real case; see e.g. [35, Section 5.2.3].

D. Optimality conditions
Exploiting the equivalence between the quaternion optimization problem ($P$) and the real, augmented optimization problem ($P_{\mathbb{R}}$), we derive two fundamental optimality conditions for ($P$). To simplify the presentation, assume that the functions $f_i$, $i = 0, 1, \ldots, m$, are real-differentiable.

Simple optimality condition.
Applying the usual optimality conditions for the equivalent real convex optimization problem (see [35, Section 4.2.3] for details) allows us to derive a simple optimality condition for real-differentiable $f_0$. Let $\mathcal{F}$ denote the feasibility set
$$\mathcal{F} \triangleq \left\{ \mathbf{q} \ \middle| \ \begin{array}{l} f_i(\mathbf{q}) \leq 0, \quad i = 1, \ldots, m \\ A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} = \mathbf{b} \end{array} \right\}. \tag{44}$$
Then, by using the first-order characterization of the convexity of $f_0$ given in Proposition 3, one obtains the following necessary and sufficient condition: the vector $\tilde{\mathbf{q}}$ is optimal for the problem ($P$) if and only if $\tilde{\mathbf{q}} \in \mathcal{F}$ and
$$\operatorname{Re}\left( \nabla_{\mathbf{q}^*} f_0(\tilde{\mathbf{q}})^H (\mathbf{r} - \tilde{\mathbf{q}}) \right) \geq 0 \quad \text{for all } \mathbf{r} \in \mathcal{F}. \tag{45}$$

Karush-Kuhn-Tucker (KKT) conditions.
Considering the convex quaternion optimization problem ($P$), sufficient optimality conditions known as KKT conditions can be derived from its real equivalent convex optimization problem.

Proposition 4 (KKT conditions). Consider the constrained quaternion convex problem ($P$) with quaternion Lagrangian $L(\mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\lambda})$ given in (41). Let $\tilde{\mathbf{q}} \in \mathbb{H}^n$, $\tilde{\boldsymbol{\nu}} \in \mathbb{R}^m$, $\tilde{\boldsymbol{\lambda}} \in \mathbb{H}^p$ be any points such that
$$f_i(\tilde{\mathbf{q}}) \leq 0, \quad i = 1, \ldots, m \tag{46}$$
$$A_1 \tilde{\mathbf{q}} + A_2 \tilde{\mathbf{q}}^{i} + A_3 \tilde{\mathbf{q}}^{j} + A_4 \tilde{\mathbf{q}}^{k} - \mathbf{b} = \mathbf{0} \tag{47}$$
$$\tilde{\nu}_i \geq 0, \quad i = 1, \ldots, m \tag{48}$$
$$\tilde{\nu}_i f_i(\tilde{\mathbf{q}}) = 0, \quad i = 1, \ldots, m \tag{49}$$
$$\nabla_{\mathbf{q}^*} L(\tilde{\mathbf{q}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}) = \mathbf{0}; \tag{50}$$
then $\tilde{\mathbf{q}}$ and $(\tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}})$ are primal and dual optimal, with zero duality gap.

These KKT conditions look almost the same as the standard KKT conditions for real problems, except for the primal feasibility of the equality constraints (47) and the Lagrangian stationarity condition (50). Proposition 5 below provides the explicit form of the stationarity condition for the Lagrangian in quaternion variables.
Proposition 5. Let $\tilde{\mathbf{q}} \in \mathbb{H}^n$, $\tilde{\boldsymbol{\nu}} \in \mathbb{R}^m$, $\tilde{\boldsymbol{\lambda}} \in \mathbb{H}^p$ be such that they satisfy the KKT conditions (46)–(50). The stationarity condition (50) is explicitly given by
$$\nabla_{\mathbf{q}^*} f_0(\tilde{\mathbf{q}}) + \sum_{i=1}^m \tilde{\nu}_i \nabla_{\mathbf{q}^*} f_i(\tilde{\mathbf{q}}) + \frac{1}{4}\left[ A_1^H \tilde{\boldsymbol{\lambda}} + \left( A_2^H \tilde{\boldsymbol{\lambda}} \right)^{i} + \left( A_3^H \tilde{\boldsymbol{\lambda}} \right)^{j} + \left( A_4^H \tilde{\boldsymbol{\lambda}} \right)^{k} \right] = \mathbf{0}. \tag{51}$$

Proof.
Using Proposition 2, the stationarity condition can be equivalently expressed as
$$\nabla_{\mathbf{q}^*} L(\tilde{\mathbf{q}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{R}} L(\tilde{\mathbf{q}}_{\mathbb{R}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{R}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}^*} L(\tilde{\mathbf{q}}_{\mathbb{H}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{H}}) = \mathbf{0}, \tag{52}$$
where $L(\tilde{\mathbf{q}}_{\mathbb{R}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{R}})$ and $L(\tilde{\mathbf{q}}_{\mathbb{H}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{H}})$ are given by (39) and (40), respectively. A straightforward calculation gives the stationarity condition explicitly in the $\mathbb{R}^{4n}$ representation:
$$\nabla_{\mathbb{R}} L(\tilde{\mathbf{q}}_{\mathbb{R}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{R}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{R}} f_0(\tilde{\mathbf{q}}_{\mathbb{R}}) + \sum_{i=1}^m \tilde{\nu}_i \nabla_{\mathbb{R}} f_i(\tilde{\mathbf{q}}_{\mathbb{R}}) + A_{\mathbb{R}}^{\top} \tilde{\boldsymbol{\lambda}}_{\mathbb{R}} = \mathbf{0}. \tag{53}$$
Turning to the $\mathcal{H}^{4n}$-domain condition, we exploit the linear relationship between vectors of $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$, as well as the relation (31) between $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$ gradients, $\nabla_{\mathbb{H}^*} f = \frac{1}{4} J_n \nabla_{\mathbb{R}} f$. After simplification, one obtains the stationarity condition in $\mathcal{H}^{4n}$:
$$\nabla_{\mathbb{H}^*} L(\tilde{\mathbf{q}}_{\mathbb{H}}, \tilde{\boldsymbol{\nu}}, \tilde{\boldsymbol{\lambda}}_{\mathbb{H}}) = \mathbf{0} \Leftrightarrow \nabla_{\mathbb{H}^*} f_0(\tilde{\mathbf{q}}_{\mathbb{H}}) + \sum_{i=1}^m \tilde{\nu}_i \nabla_{\mathbb{H}^*} f_i(\tilde{\mathbf{q}}_{\mathbb{H}}) + \frac{1}{4} A_{\mathbb{H}}^{H} \tilde{\boldsymbol{\lambda}}_{\mathbb{H}} = \mathbf{0}. \tag{54}$$
To obtain the desired stationarity condition (51) in $\mathbb{H}^n$, we keep the first $n$ rows of (54) and compute explicitly the first $n$-block of the quaternion matrix product $A_{\mathbb{H}}^{H} \tilde{\boldsymbol{\lambda}}_{\mathbb{H}}$.

V. QUATERNION ALTERNATING DIRECTION METHOD OF MULTIPLIERS
The quaternion-domain optimization framework introduced in the previous sections permits an efficient and natural derivation of quaternion-domain algorithms from their existing real-domain counterparts. The methodology is as follows: given a quaternion-domain optimization problem in the variable $\mathbf{q} \in \mathbb{H}^n$, we find its real augmented domain equivalent in the variable $\mathbf{q}_{\mathbb{R}^{4n}} \in \mathbb{R}^{4n}$. Then, one can pick any real-domain algorithm to solve the real-augmented problem; once the iterates for $\mathbf{q}_{\mathbb{R}^{4n}}$ are found, they are converted into the quaternion augmented domain $\mathcal{H}^{4n}$. Finally, the quaternion-domain algorithm is obtained by considering only the first $n$ entries of $\mathbf{q}_{\mathbb{H}^{4n}}$. This strategy is very general and can be applied to virtually any real-domain algorithm. Importantly, it also ensures that the convergence properties of the quaternion-domain algorithms are directly inherited from their augmented real counterparts.

To illustrate the proposed methodology, we derive in the sequel the quaternion version of the popular alternating direction method of multipliers (ADMM), which we simply call quaternion ADMM (Q-ADMM). This focus is motivated by the fact that its real-domain counterpart [36] can accommodate a large variety of constraints while maintaining simple and efficient updates. As such, Q-ADMM appears as a versatile algorithm for quaternion-domain optimization. Note that there have been several attempts to formulate ADMM for quaternion-domain optimization problems: they either rely on a real augmented formulation [37], [38] or are specifically designed for particular applications [27], [28]. In contrast, this paper introduces a general Q-ADMM framework by leveraging the proposed quaternion convex optimization framework.

Now, consider the general quaternion optimization problem:
$$\begin{aligned} \text{minimize} \quad & f(\mathbf{q}) + g(\mathbf{p}) \\ \text{subject to} \quad & A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} + B_1 \mathbf{p} + B_2 \mathbf{p}^{i} + B_3 \mathbf{p}^{j} + B_4 \mathbf{p}^{k} = \mathbf{c}, \end{aligned} \tag{55}$$
where $f$ and $g$ are real-valued convex functions of the quaternion variables $\mathbf{q} \in \mathbb{H}^n$ and $\mathbf{p} \in \mathbb{H}^m$, respectively. The two variables are linked through a widely affine constraint, defined by $A_i \in \mathbb{H}^{p \times n}$, $B_i \in \mathbb{H}^{p \times m}$ for $i = 1, 2, 3, 4$, and $\mathbf{c} \in \mathbb{H}^p$. Note that since $f$ and $g$ are supposed convex, problem (55) is a widely affine equality constrained convex quaternion optimization problem. Once again, widely affine relations in the $\mathbf{q}$ and $\mathbf{p}$ variables are necessary to ensure that (55) encodes the full generality of convex quaternion equality constraints. Quaternion ADMM aims at solving (55) in its quaternion variables $\mathbf{q}$ and $\mathbf{p}$. To develop this algorithm, described in Section V-C below, we start by deriving the standard ADMM updates for the real-augmented problem associated with (55). Then, by considering equivalences, one obtains the corresponding algorithms in the quaternion augmented representation $\mathcal{H}$ and eventually in $\mathbb{H}^n$.

A. ADMM in real augmented domain
The original real-domain ADMM algorithm [36] can be directly applied to the real-augmented optimization problem equivalent to (55), which reads
$$\begin{aligned} \text{minimize} \quad & f(\mathbf{q}_{\mathbb{R}}) + g(\mathbf{p}_{\mathbb{R}}) \\ \text{subject to} \quad & A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}} = \mathbf{c}_{\mathbb{R}}, \end{aligned} \tag{56}$$
where $\mathbf{q}_{\mathbb{R}} \in \mathbb{R}^{4n}$ and $\mathbf{p}_{\mathbb{R}} \in \mathbb{R}^{4m}$ are augmented real variables, and where $A_{\mathbb{R}} \in \mathbb{R}^{4p \times 4n}$, $B_{\mathbb{R}} \in \mathbb{R}^{4p \times 4m}$ and $\mathbf{c}_{\mathbb{R}} \in \mathbb{R}^{4p}$ encode a general affine relation between the augmented real variables $\mathbf{q}_{\mathbb{R}}$ and $\mathbf{p}_{\mathbb{R}}$. First, define the augmented Lagrangian for $\rho \geq 0$ and Lagrange multiplier $\boldsymbol{\lambda}_{\mathbb{R}} \in \mathbb{R}^{4p}$:
$$L_{\rho}(\mathbf{q}_{\mathbb{R}}, \mathbf{p}_{\mathbb{R}}, \boldsymbol{\lambda}_{\mathbb{R}}) \triangleq f(\mathbf{q}_{\mathbb{R}}) + g(\mathbf{p}_{\mathbb{R}}) + \boldsymbol{\lambda}_{\mathbb{R}}^{\top}(A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}} - \mathbf{c}_{\mathbb{R}}) + \frac{\rho}{2} \left\| A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}} - \mathbf{c}_{\mathbb{R}} \right\|_2^2. \tag{57}$$
The ADMM updates then consist of the iterations
$$\mathbf{q}_{\mathbb{R}}^{(k+1)} = \arg\min_{\mathbf{q}_{\mathbb{R}}} L_{\rho}(\mathbf{q}_{\mathbb{R}}, \mathbf{p}_{\mathbb{R}}^{(k)}, \boldsymbol{\lambda}_{\mathbb{R}}^{(k)}), \tag{58}$$
$$\mathbf{p}_{\mathbb{R}}^{(k+1)} = \arg\min_{\mathbf{p}_{\mathbb{R}}} L_{\rho}(\mathbf{q}_{\mathbb{R}}^{(k+1)}, \mathbf{p}_{\mathbb{R}}, \boldsymbol{\lambda}_{\mathbb{R}}^{(k)}), \tag{59}$$
$$\boldsymbol{\lambda}_{\mathbb{R}}^{(k+1)} = \boldsymbol{\lambda}_{\mathbb{R}}^{(k)} + \rho \left( A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}}^{(k+1)} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}}^{(k+1)} - \mathbf{c}_{\mathbb{R}} \right). \tag{60}$$
Defining the scaled dual variable as $\mathbf{u}_{\mathbb{R}} = (1/\rho) \boldsymbol{\lambda}_{\mathbb{R}}$, one obtains the scaled form of ADMM:
$$\mathbf{q}_{\mathbb{R}}^{(k+1)} = \arg\min_{\mathbf{q}_{\mathbb{R}}} \left\{ f(\mathbf{q}_{\mathbb{R}}) + \frac{\rho}{2} \left\| A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}}^{(k)} - \mathbf{c}_{\mathbb{R}} + \mathbf{u}_{\mathbb{R}}^{(k)} \right\|_2^2 \right\}, \tag{61}$$
$$\mathbf{p}_{\mathbb{R}}^{(k+1)} = \arg\min_{\mathbf{p}_{\mathbb{R}}} \left\{ g(\mathbf{p}_{\mathbb{R}}) + \frac{\rho}{2} \left\| A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}}^{(k+1)} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}} - \mathbf{c}_{\mathbb{R}} + \mathbf{u}_{\mathbb{R}}^{(k)} \right\|_2^2 \right\}, \tag{62}$$
$$\mathbf{u}_{\mathbb{R}}^{(k+1)} = \mathbf{u}_{\mathbb{R}}^{(k)} + A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}}^{(k+1)} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}}^{(k+1)} - \mathbf{c}_{\mathbb{R}}. \tag{63}$$
Equipped with these expressions, the goal is now to find augmented quaternion domain and quaternion domain counterparts for the expressions above.
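The scaled iterations (61)–(63) can be written as a compact generic loop. The sketch below is a schematic illustration (the solver callables `q_step` and `p_step` are hypothetical placeholders that the user must supply to solve (61) and (62) for a fixed $\rho$):

```python
import numpy as np

def admm_scaled(q_step, p_step, A_R, B_R, c_R, p0, u0, n_iter=100):
    """Scaled ADMM (61)-(63) in the augmented real domain.
    q_step(w) must return argmin_q f(q) + (rho/2)||A_R q + w||^2,
    and p_step(w) likewise for g and B_R."""
    p, u = p0, u0
    for _ in range(n_iter):
        q = q_step(B_R @ p - c_R + u)                 # (61)
        p = p_step(A_R @ q - c_R + u)                 # (62)
        u = u + A_R @ q + B_R @ p - c_R               # (63)
    return q, p, u
```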
B. ADMM in quaternion augmented domain

The linear relationship between $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$ permits a simple derivation of ADMM in quaternion augmented variables. First, note that by Proposition 1 one has
$$\left\| A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} + B_{\mathbb{R}} \mathbf{p}_{\mathbb{R}} - \mathbf{c}_{\mathbb{R}} \right\|_2^2 = \left\| A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} + B_{\mathbb{H}} \mathbf{p}_{\mathbb{H}} - \mathbf{c}_{\mathbb{H}} \right\|_2^2, \tag{64}$$
so that, together with the expression of the Lagrangian in augmented quaternion variables (40), the augmented Lagrangian in augmented quaternion variables reads
$$L_{\rho}(\mathbf{q}_{\mathbb{H}}, \mathbf{p}_{\mathbb{H}}, \boldsymbol{\lambda}_{\mathbb{H}}) = f(\mathbf{q}_{\mathbb{H}}) + g(\mathbf{p}_{\mathbb{H}}) + \frac{1}{4} \boldsymbol{\lambda}_{\mathbb{H}}^{H}(A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} + B_{\mathbb{H}} \mathbf{p}_{\mathbb{H}} - \mathbf{c}_{\mathbb{H}}) + \frac{\rho}{2} \left\| A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} + B_{\mathbb{H}} \mathbf{p}_{\mathbb{H}} - \mathbf{c}_{\mathbb{H}} \right\|_2^2. \tag{65}$$
Then, since optimal points are the same regardless of the representation (see Proposition 2), the first two iterates of ADMM are identical. The dual ascent step is obtained once again by exploiting the linear relationship between $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$. The augmented Q-ADMM iterates thus read
$$\mathbf{q}_{\mathbb{H}}^{(k+1)} = \arg\min_{\mathbf{q}_{\mathbb{H}}} L_{\rho}(\mathbf{q}_{\mathbb{H}}, \mathbf{p}_{\mathbb{H}}^{(k)}, \boldsymbol{\lambda}_{\mathbb{H}}^{(k)}), \tag{66}$$
$$\mathbf{p}_{\mathbb{H}}^{(k+1)} = \arg\min_{\mathbf{p}_{\mathbb{H}}} L_{\rho}(\mathbf{q}_{\mathbb{H}}^{(k+1)}, \mathbf{p}_{\mathbb{H}}, \boldsymbol{\lambda}_{\mathbb{H}}^{(k)}), \tag{67}$$
$$\boldsymbol{\lambda}_{\mathbb{H}}^{(k+1)} = \boldsymbol{\lambda}_{\mathbb{H}}^{(k)} + \rho \left( A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}}^{(k+1)} + B_{\mathbb{H}} \mathbf{p}_{\mathbb{H}}^{(k+1)} - \mathbf{c}_{\mathbb{H}} \right). \tag{68}$$
Similar arguments and computations give the scaled-form iterates, which are omitted for space considerations.

C. ADMM in quaternion domain
The ADMM iterates in the quaternion domain can now be obtained directly from the expressions above. For the sake of notational brevity, let us introduce the quaternion residual $\mathbf{r}(\mathbf{q}, \mathbf{p})$ as
$$\mathbf{r}(\mathbf{q}, \mathbf{p}) = A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} + B_1 \mathbf{p} + B_2 \mathbf{p}^{i} + B_3 \mathbf{p}^{j} + B_4 \mathbf{p}^{k} - \mathbf{c}, \tag{69}$$
and recall that by Proposition 1 one has $\| \mathbf{r}(\mathbf{q}, \mathbf{p}) \|_2^2 = \| \mathbf{r}_{\mathbb{H}}(\mathbf{q}_{\mathbb{H}}, \mathbf{p}_{\mathbb{H}}) \|_2^2$. The augmented Lagrangian in quaternion variables then reads
$$L_{\rho}(\mathbf{q}, \mathbf{p}, \boldsymbol{\lambda}) = f(\mathbf{q}) + g(\mathbf{p}) + \operatorname{Re}\left( \boldsymbol{\lambda}^{H} \mathbf{r}(\mathbf{q}, \mathbf{p}) \right) + \frac{\rho}{2} \| \mathbf{r}(\mathbf{q}, \mathbf{p}) \|_2^2. \tag{70}$$
The ADMM iterates in the quaternion domain are very similar to their corresponding real augmented counterparts:
$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}} L_{\rho}(\mathbf{q}, \mathbf{p}^{(k)}, \boldsymbol{\lambda}^{(k)}), \tag{71}$$
$$\mathbf{p}^{(k+1)} = \arg\min_{\mathbf{p}} L_{\rho}(\mathbf{q}^{(k+1)}, \mathbf{p}, \boldsymbol{\lambda}^{(k)}), \tag{72}$$
$$\boldsymbol{\lambda}^{(k+1)} = \boldsymbol{\lambda}^{(k)} + \rho \, \mathbf{r}(\mathbf{q}^{(k+1)}, \mathbf{p}^{(k+1)}), \tag{73}$$
and can be expressed in scaled form as
$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}} \left\{ f(\mathbf{q}) + \frac{\rho}{2} \left\| \mathbf{r}(\mathbf{q}, \mathbf{p}^{(k)}) + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{74}$$
$$\mathbf{p}^{(k+1)} = \arg\min_{\mathbf{p}} \left\{ g(\mathbf{p}) + \frac{\rho}{2} \left\| \mathbf{r}(\mathbf{q}^{(k+1)}, \mathbf{p}) + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{75}$$
$$\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \mathbf{r}(\mathbf{q}^{(k+1)}, \mathbf{p}^{(k+1)}). \tag{76}$$

D. Convergence of Q-ADMM
Q-ADMM inherits its convergence properties from the convergence results of the associated augmented real ADMM algorithm [36], which are adapted to the quaternion case below for completeness. More precisely, let us make the standard assumptions that the extended-real-valued functions $f : \mathbb{H}^n \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{H}^m \to \mathbb{R} \cup \{+\infty\}$ are closed, proper and convex. We also assume that there exists at least one saddle point $(\tilde{\mathbf{q}}, \tilde{\mathbf{p}}, \tilde{\boldsymbol{\lambda}})$ such that $L_0(\tilde{\mathbf{q}}, \tilde{\mathbf{p}}, \boldsymbol{\lambda}) \leq L_0(\tilde{\mathbf{q}}, \tilde{\mathbf{p}}, \tilde{\boldsymbol{\lambda}}) \leq L_0(\mathbf{q}, \mathbf{p}, \tilde{\boldsymbol{\lambda}})$ for all $\mathbf{q}, \mathbf{p}, \boldsymbol{\lambda}$, where $L_0$ denotes the Lagrangian (70) with $\rho = 0$. Under these two assumptions, the Q-ADMM iterates satisfy:
- Residual convergence: $\mathbf{r}(\mathbf{q}^{(k)}, \mathbf{p}^{(k)}) \to \mathbf{0}$ as $k \to \infty$;
- Objective convergence: $f(\mathbf{q}^{(k)}) + g(\mathbf{p}^{(k)}) \to \tilde{v}$, where $\tilde{v}$ is the optimal value of (55);
- Dual variable convergence: $\boldsymbol{\lambda}^{(k)} \to \tilde{\boldsymbol{\lambda}}$ as $k \to \infty$.

Necessary and sufficient conditions for primal-dual optimality of the triplet $(\tilde{\mathbf{q}}, \tilde{\mathbf{p}}, \tilde{\boldsymbol{\lambda}})$ can be directly obtained for the Q-ADMM problem (55) using the KKT conditions (46)–(50); they are omitted here for space considerations.

E. Special case: proximal operator form
Consider the special case where the affine constraint in (55) is simply given by $\mathbf{q} - \mathbf{p} = \mathbf{0}$, that is, $m = n$, $A_1 = B_1 = I_n$ and $A_i = B_i = \mathbf{0}_n$ for $i = 2, 3, 4$. The Q-ADMM iterations then become
$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}} \left\{ f(\mathbf{q}) + \frac{\rho}{2} \left\| \mathbf{q} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{77}$$
$$\mathbf{p}^{(k+1)} = \arg\min_{\mathbf{p}} \left\{ g(\mathbf{p}) + \frac{\rho}{2} \left\| \mathbf{q}^{(k+1)} - \mathbf{p} + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{78}$$
$$\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \mathbf{q}^{(k+1)} - \mathbf{p}^{(k+1)}. \tag{79}$$
Focusing on the $\mathbf{q}$-update for simplicity, (77) can be rewritten as
$$\mathbf{q}^{(k+1)} \triangleq \operatorname{prox}_{f/\rho}\left( \mathbf{p}^{(k)} - \mathbf{u}^{(k)} \right), \tag{80}$$
where $\operatorname{prox}_{f/\rho}$ denotes the quaternion proximal operator of $f$ with penalty $\rho$, first introduced in [39]. Importantly, when $f$ is the indicator function of a closed convex set $\mathcal{C} \subset \mathbb{H}^n$, the $\mathbf{q}$-update becomes
$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q} \in \mathcal{C}} \left\| \mathbf{q} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right\|_2^2 \triangleq \Pi_{\mathcal{C}}\left( \mathbf{p}^{(k)} - \mathbf{u}^{(k)} \right), \tag{81}$$
where $\Pi_{\mathcal{C}}$ denotes the projection onto $\mathcal{C}$ in the quaternion Euclidean norm.
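For instance, when $g = \iota_{\mathcal{C}}$ with $\mathcal{C}$ the set of quaternion vectors having entrywise non-negative components (a closed convex cone in the augmented real coordinates, in the spirit of [40]), the iterations (77)–(79) reduce to a prox step, a projection (81), and a dual ascent. A minimal sketch under these assumptions (the `prox_f` placeholder is hypothetical):

```python
import numpy as np

def prox_f(v, rho):
    """Placeholder prox of f with penalty rho; identity when f = 0 (assumption)."""
    return v

def project_nonneg(v_R):
    """Projection (81) onto componentwise non-negative augmented real vectors."""
    return np.maximum(v_R, 0.0)

n = 2
p = u = np.zeros(4 * n)                  # augmented real coordinates
for _ in range(50):
    q = prox_f(p - u, rho=1.0)           # (77), i.e. (80)
    p = project_nonneg(q + u)            # (78), i.e. (81)
    u = u + q - p                        # (79)
```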
VI. THE FRAMEWORK IN PRACTICE

This last section illustrates the relevance of the proposed framework by considering two general examples of constrained convex optimization problems in quaternion variables. Both problems can be solved efficiently using the Q-ADMM algorithm introduced in Section V. Since Q-ADMM shares the same convergence and numerical properties with its real-augmented domain counterpart, we focus hereafter on the many insights enabled by the quaternion framework.
A. Constrained widely linear least squares
We consider the following convex optimization problem:
$$\begin{aligned} \text{minimize} \quad & \left\| \mathbf{y} - P_1 \mathbf{q} - P_2 \mathbf{q}^{i} - P_3 \mathbf{q}^{j} - P_4 \mathbf{q}^{k} \right\|_2^2 \\ \text{subject to} \quad & \mathbf{q} \in \mathcal{C}, \end{aligned} \tag{82}$$
where $\mathcal{C}$ is a closed convex subset of $\mathbb{H}^n$. An important special case is when $\mathcal{C}$ is a convex cone, in particular to encode quaternion non-negativity constraints. For instance, in [40] the authors enforce each component of the vector to have non-negative real and imaginary parts; in [41], each component of the vector must obey $q_a \geq 0$ and $q_a^2 \geq q_b^2 + q_c^2 + q_d^2$, which can be interpreted as the non-negative definiteness of a specific 2-by-2 complex matrix.

The general constrained widely linear least squares problem (82) can be solved efficiently using Q-ADMM. Following a standard procedure [36], the problem (82) can be cast as
$$\begin{aligned} \text{minimize} \quad & \left\| \mathbf{y} - P_1 \mathbf{q} - P_2 \mathbf{q}^{i} - P_3 \mathbf{q}^{j} - P_4 \mathbf{q}^{k} \right\|_2^2 + \iota_{\mathcal{C}}(\mathbf{p}) \\ \text{subject to} \quad & \mathbf{q} - \mathbf{p} = \mathbf{0}, \end{aligned} \tag{83}$$
where $\iota_{\mathcal{C}}$ denotes the indicator function of $\mathcal{C}$, such that $\iota_{\mathcal{C}}(\mathbf{p}) = 0$ for $\mathbf{p} \in \mathcal{C}$ and $\iota_{\mathcal{C}}(\mathbf{p}) = +\infty$ otherwise. The Q-ADMM iterations can be directly applied:
$$\mathbf{q}^{(k+1)} = \arg\min_{\mathbf{q}} \left\{ \left\| \mathbf{y} - P_1 \mathbf{q} - P_2 \mathbf{q}^{i} - P_3 \mathbf{q}^{j} - P_4 \mathbf{q}^{k} \right\|_2^2 + \frac{\rho}{2} \left\| \mathbf{q} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{84}$$
$$\mathbf{p}^{(k+1)} = \Pi_{\mathcal{C}}\left( \mathbf{q}^{(k+1)} + \mathbf{u}^{(k)} \right), \tag{85}$$
$$\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \mathbf{q}^{(k+1)} - \mathbf{p}^{(k+1)}. \tag{86}$$
In a nutshell, Q-ADMM consists in iteratively solving a general unconstrained quaternion least squares problem, followed by projection onto the constraint set $\mathcal{C}$ and dual ascent. Such a formulation is thus particularly useful when the projection onto $\mathcal{C}$ can be carried out explicitly, as in the case of the non-negativity constraints mentioned above [40], [41].

Solving the $\mathbf{q}$-variable subproblem (84). Since the function to be minimized involves $\mathbf{q}$ and its three canonical involutions $\mathbf{q}^{i}, \mathbf{q}^{j}, \mathbf{q}^{k}$, finding a solution is not as straightforward as in the linear case ($P_2 = P_3 = P_4 = \mathbf{0}$). Two strategies are essentially possible: (i) obtain an explicit expression for $\mathbf{q}^{(k+1)}$ using the augmented real representation $\mathbb{R}^{4n}$ or the augmented quaternion representation $\mathcal{H}^{4n}$; (ii) solve (84) iteratively, e.g. using quaternion gradient descent in $\mathbb{H}^n$. We describe below these two approaches in detail.

a) Explicit solution via augmented representations: The $\mathbf{q}$-update (84) can be rewritten in $\mathbb{R}^{4n}$ as the $\mathbf{q}_{\mathbb{R}}$-update
$$\mathbf{q}_{\mathbb{R}}^{(k+1)} = \arg\min_{\mathbf{q}_{\mathbb{R}}} \left\{ \left\| P_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{y}_{\mathbb{R}} \right\|_2^2 + \frac{\rho}{2} \left\| \mathbf{q}_{\mathbb{R}} - \mathbf{v}_{\mathbb{R}} \right\|_2^2 \right\}, \tag{87}$$
where $P_{\mathbb{R}} \in \mathbb{R}^{4m \times 4n}$, $\mathbf{y}_{\mathbb{R}} \in \mathbb{R}^{4m}$, and $\mathbf{v}_{\mathbb{R}} = \mathbf{p}_{\mathbb{R}}^{(k)} - \mathbf{u}_{\mathbb{R}}^{(k)} \in \mathbb{R}^{4n}$ is constant. Being a standard real optimization problem, the optimality condition reads
$$2 P_{\mathbb{R}}^{\top}\left( P_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{y}_{\mathbb{R}} \right) + \rho\left( \mathbf{q}_{\mathbb{R}} - \mathbf{v}_{\mathbb{R}} \right) = \mathbf{0}, \tag{88}$$
so that one easily gets the standard explicit solution
$$\mathbf{q}_{\mathbb{R}}^{(k+1)} = \left( 2 P_{\mathbb{R}}^{\top} P_{\mathbb{R}} + \rho I_{4n} \right)^{-1} \left( 2 P_{\mathbb{R}}^{\top} \mathbf{y}_{\mathbb{R}} + \rho \mathbf{v}_{\mathbb{R}} \right). \tag{89}$$
Exploiting the relation between the $\mathbb{R}^{4n}$ and $\mathcal{H}^{4n}$ augmented representations, we also get
$$\mathbf{q}_{\mathbb{H}}^{(k+1)} = \left( 2 P_{\mathbb{H}}^{H} P_{\mathbb{H}} + \rho I_{4n} \right)^{-1} \left( 2 P_{\mathbb{H}}^{H} \mathbf{y}_{\mathbb{H}} + \rho \mathbf{v}_{\mathbb{H}} \right), \tag{90}$$
so that the explicit $\mathbf{q}$-update reads
$$\mathbf{q}^{(k+1)} = S \left( 2 P_{\mathbb{H}}^{H} P_{\mathbb{H}} + \rho I_{4n} \right)^{-1} \left( 2 P_{\mathbb{H}}^{H} \mathbf{y}_{\mathbb{H}} + \rho \mathbf{v}_{\mathbb{H}} \right), \tag{91}$$
where $S = [I_n \ \mathbf{0}_n \ \mathbf{0}_n \ \mathbf{0}_n] \in \mathbb{R}^{n \times 4n}$ selects the first $n$ entries of $\mathbf{q}_{\mathbb{H}}^{(k+1)}$. Notably, when the cost function is linear quadratic, i.e. when $P_i = \mathbf{0}$ for $i = 2, 3, 4$, Eq. (91) becomes
$$\mathbf{q}^{(k+1)} = \left( 2 P_1^{H} P_1 + \rho I_n \right)^{-1} \left( 2 P_1^{H} \mathbf{y} + \rho \mathbf{v} \right). \tag{92}$$
Unfortunately, for the general (widely linear) case, such an expression of $\mathbf{q}^{(k+1)}$ in terms of the quaternion-domain matrices $P_i$, $i = 1, 2, 3, 4$, is not possible. One has to turn back to (91), which requires the inversion of augmented matrices of size $4n \times 4n$, thus limiting its applicability to large scale problems when $n$ is large.
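In code, the explicit update (89) is a single regularized normal-equations solve in the augmented real domain. A sketch under our earlier conventions (`widely_linear_real` from the example sketch of Section IV-A builds $P_{\mathbb{R}}$):

```python
import numpy as np

def q_update_exact(P_R, y_R, v_R, rho):
    """Explicit q-update (89): minimize ||P_R q - y_R||^2 + (rho/2)||q - v_R||^2."""
    n4 = P_R.shape[1]
    lhs = 2.0 * P_R.T @ P_R + rho * np.eye(n4)
    rhs = 2.0 * P_R.T @ y_R + rho * v_R
    return np.linalg.solve(lhs, rhs)     # one dense 4n x 4n solve per iteration
```

Since the left-hand side matrix is fixed across ADMM iterations, a factorization computed once can be reused at every iteration; this is the usual practical optimization.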
b) Iterative scheme via quaternion gradient descent: Another possibility is to use an iterative scheme to approximately solve (84). We choose here quaternion gradient descent [19], [42], which takes the form
$$\mathbf{q}^{(\ell+1)} = \mathbf{q}^{(\ell)} - \eta_{\ell} \nabla_{\mathbf{q}^*} h(\mathbf{q}^{(\ell)}), \tag{93}$$
where $\eta_{\ell}$ is an iteration-dependent step size and where $h(\mathbf{q})$ is defined by
$$h(\mathbf{q}) = \left\| P_1 \mathbf{q} + P_2 \mathbf{q}^{i} + P_3 \mathbf{q}^{j} + P_4 \mathbf{q}^{k} - \mathbf{y} \right\|_2^2 + \frac{\rho}{2} \left\| \mathbf{q} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right\|_2^2. \tag{94}$$
These iterations provide an approximate solution to the $\mathbf{q}$-optimization problem (84), such that $\mathbf{q}^{(k+1)} = \mathbf{q}^{(\ell_{\max})}$, where $\ell_{\max}$ is defined by an appropriate stopping criterion controlling the accuracy of the solution. Explicit iterations can be obtained by using the results from Appendix B:
$$\mathbf{q}^{(\ell+1)} = \mathbf{q}^{(\ell)} - \frac{\eta_{\ell}}{2} \left\{ P_1^{H} \mathbf{r}_P^{(\ell)} + \left( P_2^{H} \mathbf{r}_P^{(\ell)} \right)^{i} + \left( P_3^{H} \mathbf{r}_P^{(\ell)} \right)^{j} + \left( P_4^{H} \mathbf{r}_P^{(\ell)} \right)^{k} \right\} - \frac{\rho \eta_{\ell}}{4} \left[ \mathbf{q}^{(\ell)} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right], \tag{95}$$
where $\mathbf{r}_P^{(\ell)} := \mathbf{r}_P(\mathbf{q}^{(\ell)})$ with $\mathbf{r}_P(\mathbf{q}) = P_1 \mathbf{q} + P_2 \mathbf{q}^{i} + P_3 \mathbf{q}^{j} + P_4 \mathbf{q}^{k} - \mathbf{y}$. Compared with the explicit solution (91) of the subproblem (84), the approximate iterative solution does not require quaternion matrix inversion. This is particularly interesting within ADMM, since the overall convergence of the algorithm can still be guaranteed even when the minimization of the subproblems is carried out approximately [36]. As a consequence, performing a few (cheap) quaternion gradient steps (93) at each iteration $k$ will ensure convergence of Q-ADMM to a stationary point of the cost function.
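The gradient iteration (95) is equally simple in the augmented real domain, where by Proposition 2 it coincides with plain gradient descent on (87). A sketch with a fixed, untuned step size (an assumption, not a recommendation):

```python
import numpy as np

def q_update_gd(P_R, y_R, v_R, rho, q0, eta=1e-3, n_steps=50):
    """Approximate q-update: gradient descent on ||P_R q - y_R||^2
    + (rho/2)||q - v_R||^2, the augmented real counterpart of (84)."""
    q = q0.copy()
    for _ in range(n_steps):
        grad = 2.0 * P_R.T @ (P_R @ q - y_R) + rho * (q - v_R)   # cf. (88)
        q -= eta * grad                  # no matrix inversion required
    return q
```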
B. 3D basis pursuit denoising

A major application of quaternion algebra lies in its ability to represent 3D data, including color images [1], [43], wind data [44], and seismics [7], among others. A dataset comprising $N$ 3D vectors is coded as a pure quaternion vector of dimension $N$: for instance, a color image patch is described as the pure quaternion vector $\mathbf{q} = i\mathbf{r} + j\mathbf{g} + k\mathbf{b}$, where $\mathbf{r}, \mathbf{g}, \mathbf{b}$ are real vectors denoting the red, green and blue components of the image. With this representation, we consider the general pure quaternion (or 3D) basis pursuit denoising problem, first formulated in the color image processing literature [3], [37], [45]. The 3D data measurements $\mathbf{y}$ are supposed to follow the linear quaternion model $\mathbf{y} = D\mathbf{q} + \mathbf{n}$, where $D \in \mathbb{H}^{m \times n}$ is the dictionary ($m < n$), $\mathbf{q} \in \mathbb{H}^n$ is the vector of sparse coefficients and $\mathbf{n}$ is the noise. While the relevance of the quaternion model has been established by many authors, its interpretability is conditioned on the reconstructed 3D signal $D\mathbf{q}$ being a pure quaternion vector, that is, $\operatorname{Re}(D\mathbf{q}) = \mathbf{0}$. Unfortunately, this is not the case in general when $D$ and $\mathbf{q}$ are quaternion-valued, and the property must be enforced within the algorithm in order to obtain interpretable solutions. Currently [45], the constraint $\operatorname{Re}(D\mathbf{q}) = \mathbf{0}$ is generally imposed by simply nulling the real part of the product $D\mathbf{q}$ – which does not preserve convergence properties. The algorithm proposed hereafter solves this issue by leveraging the Q-ADMM framework.

The resulting 3D basis pursuit denoising problem can be formulated as follows:
$$\begin{aligned} \text{minimize} \quad & \| \mathbf{y} - D\mathbf{q} \|_2^2 + \beta \| \mathbf{q} \|_1 \\ \text{subject to} \quad & \operatorname{Re}(D\mathbf{q}) = \mathbf{0}, \end{aligned} \tag{96}$$
where $\beta > 0$ is a parameter that controls the amount of sparsity. The quaternion $\ell_1$-norm promotes sparsity and is defined by
$$\| \mathbf{q} \|_1 \triangleq \sum_{i=1}^n |q_i| = \sum_{i=1}^n \sqrt{q_{ai}^2 + q_{bi}^2 + q_{ci}^2 + q_{di}^2}. \tag{97}$$
As noted in [3], the quaternion $\ell_1$-norm is equivalent to the real $\ell_{1,2}$-norm in $\mathbb{R}^{n \times 4}$, meaning that the quaternion Lasso can be seen as an instance of the real-valued group Lasso, where groups are composed of the real and three imaginary parts of a quaternion. The constraint $\operatorname{Re}(D\mathbf{q}) = \mathbf{0}$ is widely affine since
$$\operatorname{Re}(D\mathbf{q}) = \mathbf{0} \iff D\mathbf{q} + (D\mathbf{q})^{i} + (D\mathbf{q})^{j} + (D\mathbf{q})^{k} = \mathbf{0}. \tag{98}$$
This ensures that (96) defines a quaternion convex optimization problem, which can be rewritten in Q-ADMM form as
$$\begin{aligned} \text{minimize} \quad & \| \mathbf{y} - D\mathbf{q} \|_2^2 + \beta \| \mathbf{p} \|_1 \\ \text{subject to} \quad & \mathbf{q} - \mathbf{p} = \mathbf{0} \ \text{ and } \ \operatorname{Re}(D\mathbf{q}) = \mathbf{0}. \end{aligned} \tag{99}$$
The Q-ADMM iterations then read
$$\mathbf{q}^{(k+1)} = \arg\min_{\substack{\mathbf{q} \\ \operatorname{Re}(D\mathbf{q}) = \mathbf{0}}} \left\{ \| \mathbf{y} - D\mathbf{q} \|_2^2 + \frac{\rho}{2} \left\| \mathbf{q} - \mathbf{p}^{(k)} + \mathbf{u}^{(k)} \right\|_2^2 \right\}, \tag{100}$$
$$\mathbf{p}^{(k+1)} = \operatorname{prox}_{\frac{\beta}{\rho} \| \cdot \|_1}\left( \mathbf{q}^{(k+1)} + \mathbf{u}^{(k)} \right), \tag{101}$$
$$\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \mathbf{q}^{(k+1)} - \mathbf{p}^{(k+1)}. \tag{102}$$
The $\mathbf{q}$-update is a widely affine constrained least squares problem, which can be tackled by solving the KKT conditions (46)–(50); see the details below. The $\mathbf{p}$-update involves the computation of the proximal operator of the quaternion $\ell_1$-norm, whose expression can be found in the existing literature [39]:
$$\operatorname{prox}_{\lambda \| \cdot \|_1}(\mathbf{q}) = \left[ \max\left( 0, 1 - \frac{\lambda}{|q_i|} \right) q_i \right]_{i = 1 \ldots n} \triangleq \mathcal{S}_{\lambda}(\mathbf{q}), \tag{103}$$
where $\mathcal{S}_{\lambda}(\cdot)$ is the quaternion soft-thresholding operator. As a result, the $\mathbf{p}$-update becomes
$$\mathbf{p}^{(k+1)} = \mathcal{S}_{\frac{\beta}{\rho}}\left( \mathbf{q}^{(k+1)} + \mathbf{u}^{(k)} \right). \tag{104}$$
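The quaternion soft-thresholding operator (103) acts entrywise on the quaternion modulus, shrinking all four components of each entry jointly. A minimal sketch (our illustration; quaternion vectors stored as (n, 4) real arrays):

```python
import numpy as np

def soft_threshold(Q, lam):
    """Quaternion soft-thresholding (103): S_lam(q)_i = max(0, 1 - lam/|q_i|) q_i."""
    mod = np.linalg.norm(Q, axis=1, keepdims=True)         # quaternion moduli |q_i|
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(mod, 1e-30))
    return scale * Q

Q = np.array([[1.0, 2.0, -2.0, 0.0],     # |q_1| = 3   -> shrunk by 1 - 0.5/3
              [0.1, 0.0, 0.0, 0.0]])     # |q_2| = 0.1 -> thresholded to zero
print(soft_threshold(Q, lam=0.5))
```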
Solving the $\mathbf{q}$-variable subproblem (100). The KKT conditions (46)–(50) for the widely affine constrained least squares problem (100) give necessary and sufficient conditions for $(\tilde{\mathbf{q}}, \tilde{\boldsymbol{\lambda}})$ to be primal and dual optimal:
$$D\tilde{\mathbf{q}} + (D\tilde{\mathbf{q}})^{i} + (D\tilde{\mathbf{q}})^{j} + (D\tilde{\mathbf{q}})^{k} = \mathbf{0}, \tag{105}$$
$$D^{H}(D\tilde{\mathbf{q}} - \mathbf{y}) + \frac{\rho}{2}(\tilde{\mathbf{q}} - \mathbf{v}) + \frac{1}{4}\left[ D^{H}\tilde{\boldsymbol{\lambda}} + D^{H}\tilde{\boldsymbol{\lambda}}^{i} + D^{H}\tilde{\boldsymbol{\lambda}}^{j} + D^{H}\tilde{\boldsymbol{\lambda}}^{k} \right] = \mathbf{0}, \tag{106}$$
where $\mathbf{v} = \mathbf{p}^{(k)} - \mathbf{u}^{(k)}$. Noting that the bracketed term in (106) equals $4 D^{H} \operatorname{Re}\tilde{\boldsymbol{\lambda}}$ by (8), solving for $\tilde{\mathbf{q}}$ gives
$$\tilde{\mathbf{q}} = \left( D^{H} D + \tfrac{\rho}{2} I_n \right)^{-1} \left( \tfrac{\rho}{2} \mathbf{v} + D^{H} \mathbf{y} - D^{H} \operatorname{Re}\tilde{\boldsymbol{\lambda}} \right). \tag{107}$$
Let $\tilde{\mathbf{q}}_{\text{un}} = \left( D^{H} D + \tfrac{\rho}{2} I_n \right)^{-1} \left( \tfrac{\rho}{2} \mathbf{v} + D^{H} \mathbf{y} \right)$ denote the unconstrained solution of the least squares problem (100). Plugging (107) into the constraint $\operatorname{Re}(D\tilde{\mathbf{q}}) = \mathbf{0}$ gives the value of the real part of the Lagrange multiplier $\tilde{\boldsymbol{\lambda}}$:
$$\operatorname{Re}\tilde{\boldsymbol{\lambda}} = \left[ \operatorname{Re}\left( D \left( D^{H} D + \tfrac{\rho}{2} I_n \right)^{-1} D^{H} \right) \right]^{-1} \operatorname{Re}(D \tilde{\mathbf{q}}_{\text{un}}). \tag{108}$$
(Note that the imaginary part of $\tilde{\boldsymbol{\lambda}}$ can be arbitrary, since the constraint $\operatorname{Re}(D\tilde{\mathbf{q}}) = \mathbf{0}$ is here between real-valued expressions.) To summarize, the subproblem (100) is solved using the KKT conditions as follows:
- compute the unconstrained least squares solution $\tilde{\mathbf{q}}_{\text{un}}$;
- compute $\operatorname{Re}\tilde{\boldsymbol{\lambda}}$ using (108);
- obtain the explicit solution $\tilde{\mathbf{q}}$ of (100) using (107).

It is easily verified that if the unconstrained least squares solution satisfies the constraint, then $\operatorname{Re}\tilde{\boldsymbol{\lambda}} = \mathbf{0}$ and the solution of (100) is $\tilde{\mathbf{q}} = \tilde{\mathbf{q}}_{\text{un}}$.

VII. CONCLUSION
In this paper, we introduced a mathematical framework for solving constrained convex quaternion optimization problems directly in the quaternion domain. By leveraging the generalized HR-calculus and the equivalence between quaternion variables and their augmented real representations, the proposed framework enables the formulation of fundamental quaternion-domain convex optimization tools, such as the quaternion Lagrangian function or the KKT optimality conditions. For practical purposes, we derived the quaternion ADMM (Q-ADMM) as a versatile framework for quaternion-domain optimization. These results establish a systematic and general methodology to develop quaternion-domain algorithms for convex and nonconvex quaternion optimization problems. Together with the recent interest in high-performance hardware implementations of quaternion operations [46], [47], the proposed framework paves the way for the generalization of quaternion-domain optimization procedures in many signal processing applications.

APPENDIX A
EXPLICIT COMPUTATION OF $A_{\mathbb{H}}$

We adopt the notations of Section II-C. Let us compute the matrix $A_{\mathbb{H}} = \frac{1}{4} J_p A_{\mathbb{R}} J_n^{H}$ explicitly, by describing $A_{\mathbb{R}}$ in terms of $p \times n$ real-valued blocks:
$$A_{\mathbb{R}} \triangleq \begin{bmatrix} A_{\mathbb{R}11} & A_{\mathbb{R}12} & A_{\mathbb{R}13} & A_{\mathbb{R}14} \\ A_{\mathbb{R}21} & A_{\mathbb{R}22} & A_{\mathbb{R}23} & A_{\mathbb{R}24} \\ A_{\mathbb{R}31} & A_{\mathbb{R}32} & A_{\mathbb{R}33} & A_{\mathbb{R}34} \\ A_{\mathbb{R}41} & A_{\mathbb{R}42} & A_{\mathbb{R}43} & A_{\mathbb{R}44} \end{bmatrix}. \tag{109}$$
One gets
$$J_p A_{\mathbb{R}} = \begin{bmatrix} \tilde{A}_{\cdot 1} & \tilde{A}_{\cdot 2} & \tilde{A}_{\cdot 3} & \tilde{A}_{\cdot 4} \\ \tilde{A}_{\cdot 1}^{i} & \tilde{A}_{\cdot 2}^{i} & \tilde{A}_{\cdot 3}^{i} & \tilde{A}_{\cdot 4}^{i} \\ \tilde{A}_{\cdot 1}^{j} & \tilde{A}_{\cdot 2}^{j} & \tilde{A}_{\cdot 3}^{j} & \tilde{A}_{\cdot 4}^{j} \\ \tilde{A}_{\cdot 1}^{k} & \tilde{A}_{\cdot 2}^{k} & \tilde{A}_{\cdot 3}^{k} & \tilde{A}_{\cdot 4}^{k} \end{bmatrix}, \tag{110}$$
where $\tilde{A}_{\cdot j} = A_{\mathbb{R}1j} + i A_{\mathbb{R}2j} + j A_{\mathbb{R}3j} + k A_{\mathbb{R}4j} \in \mathbb{H}^{p \times n}$ for $j = 1, 2, 3, 4$. It follows that
$$A_{\mathbb{H}} \triangleq \frac{1}{4} J_p A_{\mathbb{R}} J_n^{H} = \begin{bmatrix} A_1 & A_2 & A_3 & A_4 \\ A_2^{i} & A_1^{i} & A_4^{i} & A_3^{i} \\ A_3^{j} & A_4^{j} & A_1^{j} & A_2^{j} \\ A_4^{k} & A_3^{k} & A_2^{k} & A_1^{k} \end{bmatrix}, \tag{111}$$
where for convenience we have introduced
$$A_1 = \tfrac{1}{4}\left( \tilde{A}_{\cdot 1} - \tilde{A}_{\cdot 2}\, i - \tilde{A}_{\cdot 3}\, j - \tilde{A}_{\cdot 4}\, k \right), \tag{112}$$
$$A_2 = \tfrac{1}{4}\left( \tilde{A}_{\cdot 1} - \tilde{A}_{\cdot 2}\, i + \tilde{A}_{\cdot 3}\, j + \tilde{A}_{\cdot 4}\, k \right), \tag{113}$$
$$A_3 = \tfrac{1}{4}\left( \tilde{A}_{\cdot 1} + \tilde{A}_{\cdot 2}\, i - \tilde{A}_{\cdot 3}\, j + \tilde{A}_{\cdot 4}\, k \right), \tag{114}$$
$$A_4 = \tfrac{1}{4}\left( \tilde{A}_{\cdot 1} + \tilde{A}_{\cdot 2}\, i + \tilde{A}_{\cdot 3}\, j - \tilde{A}_{\cdot 4}\, k \right). \tag{115}$$
A careful inspection of the expressions above shows that there exists a one-to-one mapping between the quaternion matrices $A_1, A_2, A_3, A_4$ and the real matrix blocks $A_{\mathbb{R}ij}$ that define $A_{\mathbb{R}}$.
APPENDIX B
QUATERNION GRADIENT COMPUTATIONS

Consider the function $f : \mathbb{H}^n \to \mathbb{R}$ defined by $f(\mathbf{q}) = \| A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} - \mathbf{b} \|_2^2$, where $A_i \in \mathbb{H}^{p \times n}$ and $\mathbf{b} \in \mathbb{H}^p$ are arbitrary. Using manipulations similar to those of Section II-C, we get $f(\mathbf{q}) = f(\mathbf{q}_{\mathbb{R}}) = \| A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{b}_{\mathbb{R}} \|_2^2$, with real augmented gradient $\nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}}) = 2 A_{\mathbb{R}}^{\top} (A_{\mathbb{R}} \mathbf{q}_{\mathbb{R}} - \mathbf{b}_{\mathbb{R}})$. Knowing that $A_{\mathbb{H}} = \frac{1}{4} J_p A_{\mathbb{R}} J_n^{H}$ and $\mathbf{q}_{\mathbb{H}} = J_n \mathbf{q}_{\mathbb{R}}$, we get the augmented conjugate quaternion gradient from (31) as
$$\nabla_{\mathbb{H}^*} f(\mathbf{q}_{\mathbb{H}}) = \frac{1}{4} J_n \nabla_{\mathbb{R}} f(\mathbf{q}_{\mathbb{R}}) = \frac{1}{2} A_{\mathbb{H}}^{H} \left( A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} - \mathbf{b}_{\mathbb{H}} \right).$$
Let the residual be $\mathbf{r}_{\mathbb{H}} = A_{\mathbb{H}} \mathbf{q}_{\mathbb{H}} - \mathbf{b}_{\mathbb{H}}$. By developing the first $n$ rows of the quaternion matrix product $A_{\mathbb{H}}^{H} \mathbf{r}_{\mathbb{H}}$, we get the conjugate quaternion gradient
$$\nabla_{\mathbf{q}^*} f(\mathbf{q}) = \frac{1}{2} \left( A_1^{H} \mathbf{r} + \left( A_2^{H} \mathbf{r} \right)^{i} + \left( A_3^{H} \mathbf{r} \right)^{j} + \left( A_4^{H} \mathbf{r} \right)^{k} \right), \quad \mathbf{r} = A_1 \mathbf{q} + A_2 \mathbf{q}^{i} + A_3 \mathbf{q}^{j} + A_4 \mathbf{q}^{k} - \mathbf{b}.$$

REFERENCES
[1] B. Chen, H. Shu, G. Coatrieux, G. Chen, X. Sun, and J. L. Coatrieux, "Color image analysis by quaternion-type moments," Journal of Mathematical Imaging and Vision, vol. 51, no. 1, pp. 124–144, 2015.
[2] Ö. N. Subakan and B. C. Vemuri, "A quaternion framework for color image smoothing and segmentation," International Journal of Computer Vision, vol. 91, no. 3, pp. 233–250, 2011.
[3] Y. Xu, L. Yu, H. Xu, H. Zhang, and T. Nguyen, "Vector sparse representation of color image using quaternion matrix analysis," IEEE Transactions on Image Processing, vol. 24, no. 4, pp. 1315–1329, Apr. 2015.
[4] J. C. K. Chou, "Quaternion kinematic and dynamic differential equations," IEEE Transactions on Robotics and Automation, vol. 8, no. 1, pp. 53–64, 1992.
[5] F. L. Markley and D. Mortari, "Quaternion attitude estimation using vector observations," The Journal of the Astronautical Sciences, vol. 48, no. 2, pp. 359–380, 2000.
[6] R. Kristiansen and P. J. Nicklasson, "Satellite attitude control by quaternion-based backstepping," in Proceedings of the 2005 American Control Conference, IEEE, 2005, pp. 907–912.
[7] S. Miron, N. Le Bihan, and J. I. Mars, "Quaternion-MUSIC for vector-sensor array processing," IEEE Transactions on Signal Processing, vol. 54, no. 4, pp. 1218–1229, 2006.
[8] J. Flamant, N. Le Bihan, and P. Chainais, "Time–frequency analysis of bivariate signals," Applied and Computational Harmonic Analysis, vol. 46, no. 2, pp. 351–383, Mar. 2019.
[9] J. Flamant, P. Chainais, and N. Le Bihan, "A complete framework for linear filtering of bivariate signals," IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4541–4552, Sep. 2018.
[10] C. Yi, Y. Lv, Z. Dang, H. Xiao, and X. Yu, "Quaternion singular spectrum analysis using convex optimization and its application to fault diagnosis of rolling bearing," Measurement, vol. 103, pp. 321–332, Jun. 2017.
[11] K. Shoemake, "Animating rotation with quaternion curves," in Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, 1985, pp. 245–254.
[12] X. Zhu, Y. Xu, H. Xu, and C. Chen, "Quaternion convolutional neural networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 631–647.
[13] T. Parcollet, M. Morchid, and G. Linarès, "Quaternion convolutional neural networks for heterogeneous image processing," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 8514–8518.
[14] A. Sudbery, "Quaternionic analysis," in Mathematical Proceedings of the Cambridge Philosophical Society, Cambridge University Press, vol. 85, 1979, pp. 199–225.
[15] S. Lang, Complex Analysis. Springer Science & Business Media, 2013, vol. 103.
[16] D. P. Mandic, C. Jahanchahi, and C. C. Took, "A quaternion gradient operator and its applications," IEEE Signal Processing Letters, vol. 18, no. 1, pp. 47–50, 2011.
[17] D. Xu and D. P. Mandic, "The theory of quaternion matrix derivatives," IEEE Transactions on Signal Processing, vol. 63, no. 6, pp. 1543–1556, 2015.
[18] D. Xu, C. Jahanchahi, C. C. Took, and D. P. Mandic, "Enabling quaternion derivatives: The generalized HR calculus," Royal Society Open Science, vol. 2, no. 8, p. 150255, Aug. 2015.
[19] D. Xu, Y. Xia, and D. P. Mandic, "Optimization in quaternion dynamic systems: Gradient, Hessian, and learning algorithms," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 2, pp. 249–261, 2016.
[20] K. Kreutz-Delgado, "The complex gradient operator and the CR-calculus," arXiv:0906.4835 [math], Jun. 2009.
[21] L. Sorber, M. Van Barel, and L. De Lathauwer, "Unconstrained optimization of real functions in complex variables," SIAM Journal on Optimization, vol. 22, no. 3, pp. 879–898, Jan. 2012.
[22] L. Li, X. Wang, and G. Wang, "Alternating direction method of multipliers for separable convex optimization of real functions in complex variables," Mathematical Problems in Engineering, vol. 2015, pp. 1–14, 2015.
[23] C. Jahanchahi, C. Cheong Took, and D. P. Mandic, "A class of quaternion valued affine projection algorithms," Signal Processing, vol. 93, no. 7, pp. 1712–1723, Jul. 2013.
[24] B. Che Ujang, C. C. Took, and D. P. Mandic, "Quaternion-valued nonlinear adaptive filtering," IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1193–1206, Aug. 2011.
[25] C. Jahanchahi and D. P. Mandic, "A class of quaternion Kalman filters," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 3, 2014.
[26] Y. Chen, X. Xiao, and Y. Zhou, "Low-rank quaternion approximation for color image processing," IEEE Transactions on Image Processing, vol. 29, pp. 1426–1439, 2020.
[27] J. Miao and K. I. Kou, "Quaternion-based bilinear factor matrix norm minimization for color image inpainting," IEEE Transactions on Signal Processing, vol. 68, pp. 5617–5631, 2020.
[28] J. Miao, K. I. Kou, and W. Liu, "Low-rank quaternion tensor completion for recovering color videos and images," Pattern Recognition, vol. 107, p. 107505, Nov. 2020.
[29] Y. Liu, Y. Zheng, J. Lu, J. Cao, and L. Rutkowski, "Constrained quaternion-variable convex optimization: A quaternion-valued recurrent neural network approach," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 3, pp. 1022–1035, Mar. 2020.
[30] Z. Xia, Y. Liu, J. Lu, and L. Rutkowski, "Penalty method for constrained distributed quaternion-variable optimization," IEEE Transactions on Cybernetics, pp. 1–6, 2020, in press.
[31] J. P. Ward, Quaternions and Cayley Numbers: Algebra and Applications. Springer Science & Business Media, 2012, vol. 403.
[32] F. Zhang, "Quaternions and matrices of quaternions," Linear Algebra and its Applications, vol. 251, pp. 21–57, Jan. 1997.
[33] R. Watson, "The generalized Cauchy-Riemann-Fueter equation and handedness," Complex Variables, vol. 48, no. 7, pp. 555–568, 2003.
[34] A. Sudbery, "Quaternionic analysis," Mathematical Proceedings of the Cambridge Philosophical Society, vol. 85, no. 2, pp. 199–225, Mar. 1979.
[35] S. Boyd and L. Vandenberghe, Convex Optimization, 1st ed. Cambridge University Press, Mar. 2004.
[36] S. Boyd, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010.
[37] C. Zou, K. I. Kou, and Y. Wang, "Quaternion collaborative and sparse representation with application to color face recognition," IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3287–3302, Jul. 2016.
[38] C. Zou, K. I. Kou, Y. Wang, and Y. Y. Tang, "Quaternion block sparse representation for signal recovery and classification," Signal Processing, vol. 179, p. 107849, Feb. 2021.
[39] T.-S. T. Chan and Y.-H. Yang, "Complex and quaternionic principal component pursuit and its application to audio separation," IEEE Signal Processing Letters, vol. 23, no. 2, pp. 287–291, Feb. 2016.
[40] T. Mizoguchi and I. Yamada, "Hypercomplex low rank matrix completion with non-negative constraints via convex optimization," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom: IEEE, May 2019, pp. 8538–8542.
[41] J. Flamant, S. Miron, and D. Brie, "Quaternion non-negative matrix factorization: Definition, uniqueness, and algorithm," IEEE Transactions on Signal Processing, vol. 68, pp. 1870–1883, 2020.
[42] D. P. Mandic, C. Jahanchahi, and C. C. Took, "A quaternion gradient operator and its applications," IEEE Signal Processing Letters, vol. 18, no. 1, pp. 47–50, Jan. 2011.
[43] N. Le Bihan and S. J. Sangwine, "Quaternion principal component analysis of color images," in Proceedings 2003 International Conference on Image Processing (Cat. No. 03CH37429), IEEE, vol. 1, 2003, pp. I-809.
[44] C. C. Took, G. Strbac, K. Aihara, and D. Mandic, "Quaternion-valued short-term joint forecasting of three-dimensional wind and atmospheric parameters," Renewable Energy, vol. 36, no. 6, pp. 1754–1760, 2011.
[45] Q. Barthélemy, A. Larue, and J. I. Mars, "Color sparse representations for image processing: Review, models, and prospects," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3978–3989, Nov. 2015.
[46] D. Williams-Young and X. Li, "On the efficacy and high-performance implementation of quaternion matrix multiplication," arXiv:1903.05575 [cs], Mar. 2019.
[47] M. Joldeş and J.-M. Muller, "Algorithms for manipulating quaternions in floating-point arithmetic," in 2020 IEEE 27th Symposium on Computer Arithmetic (ARITH), IEEE, 2020.