[PDF] Sparsity-based Algorithm for Detecting Faults in Rotating Machines

Abstract

This paper addresses the detection of periodic transients in vibration signals for detecting faults in rotating machines. For this purpose, we present a method to estimate periodic-group-sparse signals in noise. The method is based on the formulation of a convex optimization problem. A fast iterative algorithm is given for its solution. A simulated signal is formulated to verify the performance of the proposed approach for periodic feature extraction. The detection performance of comparative methods is compared with that of the proposed approach via RMSE values and receiver operating characteristic (ROC) curves. Finally, the proposed approach is applied to compound faults diagnosis of motor bearings. The non-stationary vibration data were acquired from a SpectraQuest's machinery fault simulator. The processed results show the proposed approach can effectively detect and extract the useful features of bearing outer race and inner race defect.

Full PDF

SSparsity-based Algorithm for Detecting Faults in Rotating Machines ∗ Wangpeng He † , Yin Ding ‡ , Yanyang Zi , and Ivan W. Selesnick Tandon School of Engineering, New York University, 6 Metrotech Center, Brooklyn, NY 11201, USA State Key Laboratory for Manufacturing and Systems Engineering, School of Mechanical Engineering, Xi’anJiaotong University, Xi’an 710049, P. R. China

August 23, 2018

Abstract

This paper addresses the detection of periodic transients in vibration signals for detecting faults inrotating machines. For this purpose, we present a method to estimate periodic-group-sparse signalsin noise. The method is based on the formulation of a convex optimization problem. A fast iterativealgorithm is given for its solution. A simulated signal is formulated to verify the performance of theproposed approach for periodic feature extraction. The detection performance of comparative methodsis compared with that of the proposed approach via RMSE values and receiver operating characteristic(ROC) curves. Finally, the proposed approach is applied to compound faults diagnosis of motor bearings.The non-stationary vibration data were acquired from a SpectraQuest’s machinery fault simulator. Theprocessed results show the proposed approach can eﬀectively detect and extract the useful features ofbearing outer race and inner race defect.

Rotating machinery is one of the most common types of mechanical equipment and plays a signiﬁcant rolein industrial applications. Early detection of faults developing in rotating machinery is of great importanceto prevent economic loss and personal casualties [1]. Rolling element bearings and gearboxes are two kindsof widely used components in rotating machines and their failures are among the most frequent reasons formachine breakdown.Much attention has focused on vibration-based diagnosis of mechanical faults in rotating machines [2].The detection of periodically occurring transient vibration signatures is of vital importance for vibration-based condition monitoring and fault detection of rotating machinery [3]. However, these useful transientfeatures are usually buried in heavy background noise and other irrelevant vibrations. To address thisproblem, many signal processing methods have been introduced, such as singular value decomposition (SVD)[4], empirical mode decomposition (EMD) [5], and methods based on diﬀerent wavelet transforms, e.g., dual-tree wavelet in [6], harmonic wavelet in [7], and tunable Q-factor wavelet (TQWT) in [8]. These methodshave achieved successful applications in the ﬁeld of machinery fault diagnosis.Recently, an algorithm, called ‘overlapping group shrinkage’ (OGS) was developed for estimating group-sparse signals in noise [9]. The OGS algorithm was initially formulated as a convex optimization promoting ∗ Preprint submitted to

Mechanical Systems and Signal Processing (Elsevier) . † Email: [email protected] ‡ Email: [email protected] a r X i v : . [ c s . S D ] O c t roup sparsity by a convex regularization. In order to promote sparsity more strongly, an improved versionof OGS was developed, which utilizes a non-convex regularization [10]. The superiority of denoising group-sparse signals using the approach presented in [10] indicates its potential for eﬀectively extracting periodictransient pulses.This paper aims to develop an approach for rotating machinery fault diagnosis based on a periodic group-sparse signal representation. The signature of localized faults of the gear teeth and bearing componentsgenerally exhibit periodic transient pulses when a rotating machine is operated at a constant speed [11].Meanwhile, the large amplitudes of these useful features are not only sparse but also tend to form groups.Several neighborhood-based denoising methods have been developed for machinery fault diagnosis utilizingthis property [12–14]. Our approach is based on a signal model intended to capture the useful impulsivefeatures for machinery fault diagnosis. In particular, this paper addresses the problem of estimating x froma noisy observation y . We model the measured discrete-time series, y , as y n = x n + w n , n = 0 , . . . , N − , (1)where the signal x is known to have a periodic group-sparse property and w is white Gaussian noise. Agroup-sparse signal is one where large magnitude signal values tend not to be isolated. Instead, these largemagnitude values tend to form groups.Convex optimization is commonly used to estimate sparse vectors from noisy signals, where we solve theoptimization problem with the prototype x ∗ = arg min x (cid:110) F ( x ) = 12 (cid:107) y − x (cid:107) + λ Φ( x ) (cid:111) , (2)where λ > R N → R is a sparsity-promoting penalty function(regularizer). Conventionally, the regularizer Φ( x ) is a convex function, e.g. (cid:96) -norm. In [15], an idea ofusing non-convex regularizer and keeping the convexity of entire problem is used for signal denoising problem,where the sparsity can be signiﬁcantly promoted comparing to (cid:96) -norm, and the problem still has a uniquesolution due to the convexity.In this paper, we adopt ideas from [15] and [9], and present a method for estimating periodic-group-sparsesignals in noise. We propose its use for detecting faults in rotating machinery, where the fault characteristicfrequency (period of the group-sparse pulses) is used as prior knowledge. Similar to the approach in [10],the non-convex regularization term in the proposed method is properly chosen so as to ensure that the totalobjective function F is convex; however, in contrast to [10], where each group has to be contiguous, we allowgrouping with intervals, and furthermore periodically.As a consequence, in this work, the regularization term Φ in (2) is formulated speciﬁcally to utilizethe periodicity of the impulsive fault features. The aim of our approach is to capture the useful impulsivefeatures for the purpose of machinery fault diagnosis. Additionally, it has the potential to separate compoundfault features by utilizing diﬀerent periods of the periodic transient pulses corresponding to diﬀerent faultfrequencies (e.g., various defect frequencies of rolling element bearings). The proposed approach also reducesto a non-periodic group-sparse signal denoising method, i.e., we can utilize the sparsity-based OGS approach[10], if we do not have prior knowledge of the period of the group-sparse transients. Thus, the proposedsparsity-based approach is a generalization of the non-convex regularized OGS [10].The paper is organized as follows. A brief review of OGS with convex and non-convex regularization isgiven in Section 2. Section 3 presents a method for denoising periodic group sparse signals. In Section 4 asimulation study is performed to validate the eﬀectiveness of the proposed method. Section 5 applies the2roposed periodic group sparse denoising method to fault diagnosis of motor bearings for further validationof its eﬀectiveness. Finally, conclusions are summarized in Section 6. In this section, we give short reviews of overlapping group shrinkage (OGS) [9] and majorization-minimization(MM) [16].

There are several advantages to formulating sparse estimation as a convex optimization problem. The mostbasic advantage is that the problem can then be very reliably and eﬃciently solved using convex optimizationmethods [17]. Although a non-convex regularizer can promote sparsity more strongly, it generally leads toa non-convex optimization problem with non-optimal local minima [18]. To avoid the formulation of a non-convex optimization problem, one may utilize a non-convex regularizer Φ designed so as to ensure the totalobjective function is convex.The problem of denoising a group sparse signal was addressed in [9] which utilized convex optimization.An improved method was proposed in [10], which utilized non-convex regularization designed to ensureconvexity of the objective function. The problem is solved eﬃciently by an iterative algorithm based onmajorization-minimization (MM) [16]. The objective function for the OGS problem, with a group size of K and convex or non-convex regularization, is denoted as x ∗ = arg min x ∈ R N (cid:40) P ( x ) = 12 (cid:107) y − x (cid:107) + λ (cid:88) n φ (cid:18)(cid:104) (cid:88) k ∈K x n + k (cid:105) / ; a (cid:19)(cid:41) (3)where K := { , , . . . K − } , and y ∈ R N is the noisy observation. For x ∈ R N , we deﬁne x n = 0 for n < n (cid:62) N . We assume the penalty function φ : R → R + satisﬁes the properties:1. φ is continuous on R .2. φ is twice diﬀerentiable on R \ { } .3. φ is increasing and concave on R + .4. φ is symmetric, φ ( − x ; a ) = φ ( x ; a ).5. φ (cid:48) (0 + ; a ) = 1.6. φ (cid:48)(cid:48) ( x ; a ) (cid:62) − a for all x (cid:54) = 0.7. φ ( x ; 0) = | x | .which is used to induce the resulting sparsity in an optimization problem.Note that the objective function P in (3) may be convex even if the regularizer (penalty) is not. Whenthe penalty function φ satisﬁes the conditions listed above, then the parameter ‘ a ’ can be chosen so that P is convex even if φ is not [15]. This approach is used to improve OGS in [10] where it has been proved thatthe objective function P in (3) is strictly convex if0 (cid:54) a < λK . (4)3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 φ ( x ; a ) abs log rat atan Figure 1: Examples of penalty functions.Table 1: Sparsity-promoting penalty functions ( a (cid:62) Penalty φ ( x ; a ) ψ ( x )abs | x | | x | log 1 a log(1 + a | x | ) | x | (1 + a | x | )rat | x | a | x | / | x | (1 + a | x | ) atan 2 a √ (cid:18) tan − (cid:18) a | x |√ (cid:19) − π (cid:19) | x | (1 + a | x | + a | x | )Table 1 gives some examples of penalty functions satisfying the above-listed conditions, and these exam-ples are illustrated in Fig. 1. In order to induce the sparsity most strongly, the arctangent ‘atan’ functioncan be used among all the three given non-convex functions. Fig. 1 also illustrates that under the same ‘ a ’parameter, ‘atan’ function adheres the most ‘concavity’. In addition to these non-convex penalty functions,Fig. 1 also shows the (cid:96) -norm as a special case. Note that we allow a = 0, which is the extreme case of thethree other penalties. In particular, φ ( x ; 0) = | x | is a convex function. The original OGS algorithm givenin [9] considers only the special case a = 0, i.e., convex regularization. A detailed derivation to solve problem (3), based on the majorization-minimization (MM) method is givenin [10]. The MM method simpliﬁes a complicated optimization problem into a sequence of easy ones, and isdescribed by the iteration u ( i +1) = arg min u G ( u, u ( i ) ) , (5)4here i denotes the number of iterations, and G : R N × R N → R is a majorizer of the objective function J ,satisfying G ( u, v ) (cid:62) J ( u ) , for all u, v, (6a) G ( u, u ) = J ( u ) . (6b)A majorizer of φ is given by g ( u, v ) = ( u − v )2 ψ ( v ) + φ ( v ; a ) , when v (cid:54) = 0 , (7)where ψ is the function given in Table 1. For v = 0, g ( u,

0) is deﬁned by g ( u,

0) :=  + ∞ , if u (cid:54) = 0 , , if u = 0 . (8)As a consequence, the function g : R × R → R + satisﬁes g ( u, v ) (cid:62) φ ( u ; a ) , when u (cid:54) = v, (9a) g ( u, u ) = φ ( u ; a ) , (9b)Note that g ( u,

0) deﬁned in (8) equals inﬁnity except u = 0. This forces its minimizer to lock to u = 0 in theMM iteration described in (5). This issue in the OGS problem is discussed in [10], where it does not aﬀectthe convergence when the algorithm is implemented with a non-zero initialization. To facilitate the following derivation, we deﬁne a binary sequence b = [ b , b , . . . b K − ], with b k ∈ { , } , andsets K := { , , . . . , K − } , (10a) K := { k ∈ K : b k = 0 } , (10b) K := { k ∈ K : b k = 1 } . (10c)Since b is a binary vector, we have K = K ∪ K and K ∩ K = ∅ is the empty set. We denote the cardinality(size) of the sets K , K and K as K , K , and K , respectively, so that K = K + K , and (cid:88) k ∈K b k = (cid:88) k ∈K b k = K . (11)The following proposition is straightforward. Proposition 1.

Let φ : R → R satisfy the conditions listed in Section 2. When γ > and λ > , thefunction p : R → R , p ( v ) = γ v + λφ ( v a ) (12)5 s strictly convex if φ (cid:48)(cid:48) ( v ; a ) > − γλ . (13)Detailed proof of this proposition can be found in Appendix A. We deﬁne the objective function P : R N → R as x ∗ = arg min x (cid:40) P ( x ) = 12 (cid:107) y − x (cid:107) + λ (cid:88) n φ (cid:0) θ ( x, b, n ); a (cid:1)(cid:41) (14)where the binary-weighted grouping function θ : R N × R K × Z → R is deﬁned as θ ( x, b, n ) := (cid:104) K − (cid:88) k =0 b k x n + k (cid:105) / , (15)which is the Euclidean norm of a binary weighted block. For x ∈ R N , we deﬁne B n ( x ) ∈ R K as B n ( x ) := [ x n , x n +1 , . . . x n + K − ] , (16)i.e., a K -point subvector of x , starting at index n . Hence θ ( x, b, n ) can be written as θ ( x, b, n ) = (cid:107) b (cid:12) B n ( x ) (cid:107) , (17)where (cid:12) denotes element-wise multiplication.Note that in (14) if b k = 1 for all k ∈ K , then θ ( x, b, n ) = (cid:80) k x n + k , and problem (14) reduces to(3). Therefore (3) is a special case of (14). Moreover, if K = K = 1, this problem further reduces toscalar (i.e., non-group) sparse denoising. In the following discussion, we consider the group sparse case: N (cid:29) K (cid:62) K >

0. The case K = 0 is trivial.In the following section, we exploit the convexity condition of (14), to constrain the non-convex penaltyfunction φ , so as to ensure that the objective function P is convex. Therefore, the problem formulation weultimately propose is a convex optimization problem. Hence, it will have no non-optimal local minima inwhich an iterative optimization algorithm can be trapped. Proposition 2.

Let b ∈ { , } K be a binary vector with (cid:80) k b k = K . The function P : R K → R deﬁned as P ( u ) = 12 K K − (cid:88) k =0 b k u k + λφ (cid:18)(cid:104) K − (cid:88) k =0 b k u k (cid:105) / ; a (cid:19) , (18) is strictly convex if φ (cid:48)(cid:48) ( x ; a ) > − K λ for all x (cid:54) = 0 . (19)6his Proposition is proved in detail in Appendix B. Utilizing the above results, we ﬁnd a range of a ensuring that, even though φ is non-convex, the objective function P in (14) is strictly convex. Theorem 1.

Assume that problem P (14) has K = (cid:80) k b k non-zero weights in one binary weighted group,then the objective function is strictly convex if the parameter ‘ a ’ of the penalty function φ ( · ; a ) satisﬁes (cid:54) a < K λ . (20) Proof.

First, since (cid:80) k b k = K and b k (cid:62) x n (cid:62)

0, it follows that K (cid:88) n x n = (cid:88) k b k (cid:88) n x n = (cid:88) n (cid:88) k b k x n . (21)Using (21), we write 12 (cid:88) n x n = 12 K (cid:88) n (cid:88) k b k x n . Therefore, the data ﬁdelity term in problem (14) can bewritten as F ( x ) = 12 (cid:107) y − x (cid:107) = 12 (cid:88) n x n + L ( x )= 12 K N − K (cid:88) n =0 (cid:16) (cid:88) k b k x n + k (cid:17) + L ( x ) , (22)where L ( x ) is linear in x . Adding L ( x ) to a strictly convex function yields a strictly convex function.Using the above results, the objective function in problem (14) can be reorganized as P ( x ) = N − K (cid:88) n =0 (cid:34) K (cid:88) k b k x n + k + λφ (cid:18)(cid:104) (cid:88) k b k x n + k (cid:105) / ; a (cid:19)(cid:35) + L ( x ) , = N − K (cid:88) n =0 P ( B n ( x )) + L ( x ) , (23)where B n ( x ) is deﬁned in (16). As a consequence, if P is strictly convex, then P is strictly convex. Thecondition for convexity of P is given in Proposition 2. Hence, as long as the inequality condition (19) issatisﬁed, P is strictly convex. Moreover, φ satisﬁes condition φ (cid:48)(cid:48) ( x ; a ) (cid:62) − a (condition 6 in Section 2). Thisimplies that when (20) is satisﬁed, P is strictly convex, and the entire objective function P is convex.Note that Theorem 1 generalizes the convexity condition of OGS in [10, Corollary 2]. When every elementin binary vector b equals 1 then K = K and Theorem 1 reduces to Corollary 2 in [10] as a special case.We have proved that under a more ﬂexible group structure (binary weights), non-convex penalty functionscan be utilized to promote structured sparsity, and the convexity of the objective function is preserved whenthe regularization parameter a is suitably set. Moreover, the result also shows that when maximizing thenon-convexity of the penalty function, only the the nonzero weights matter for the selection of the parameter a . 7 .3 Algorithm derivation To minimize P using the MM procedure, we deﬁne a majorizer G : R N × R N → R , deﬁned by G ( x, v ) = 12 (cid:107) y − x (cid:107) + λ (cid:88) n g (cid:0) θ ( x, b, n ) , θ ( v, b, n ) (cid:1) . (24)We can write G as G ( x, v ) = 12 (cid:107) y − x (cid:107) + λ (cid:88) n ψ ( θ ( v, b, n )) θ ( x, b, n ) + C (25a)= 12 (cid:107) y − x (cid:107) + λ (cid:88) n (cid:88) k b k ψ ( θ ( v, b, n )) x n + k + C (25b)= 12 (cid:107) y − x (cid:107) + λ (cid:88) n [ r ( v )] n x n + C (25c)where r ( v ) ∈ R N is deﬁned as [ r ( v )] n := K − (cid:88) j =0 b j ψ (cid:0) θ ( v, b, n − j ) (cid:1) . (26)Then G ( x, v ) can be written as G ( x, v ) = 12 (cid:88) n x n + λ (cid:88) n r n x n − (cid:88) n y n x n + C ( y ) (27)= (cid:88) n (cid:16)

12 + λ [ r ( v )] n (cid:17) x n − y n x n + C ( y ) (28)which has an explicit minimizer x n = y n / (1 + λ [ r ( v )] n ). Hence, the MM iteration (5) is given by x ( i +1) n = y n λ [ r ( x ( i ) n )] n . (29)Table 2 gives the explicit steps to solve (14), assuming the penalty function is chosen from Table 1 andsatisﬁes (20). This guarantees the problem (14) is strictly convex and consequently MM procedure (5) willconverge to the unique global minimizer.Note that zero-locking might occur when using quadratic function to majorize non-smooth function [19].For the OGS problem, initializing the algorithm by x (0) = y avoids this issue; a detailed proof is givenin [10, Lemma B]. This lemma is not aﬀected by introducing binary group weights. In other words, if thefunction F in [10, Lemma B] is substituted with P (14), the derivation is still valid with an almost identicalproof. Moreover, since we allow ψ ( x ) to be 0, Equation (26) may lead to a ‘divide-by-zero’. The workof [10] contains a suﬃcient discussion that this problem is avoided by the initialization x (0) = y , based on asame lemma. Hence, there is no zero-locking or ‘divide-by-zero’ issue when solving the problem (14) by thealgorithm in Table 2. In the previous sections, we have given a method for group sparse denoising with binary weights within thegroup. In signal model (1), since the periodicity of impulsive faults in x is assumed to be approximately8able 2: OGS with binary weights.Input: y ∈ R N , λ, b ∈ { , } K Initialization: x = y, S = { n : y n (cid:54) = 0 } Repeat for n ∈ S : r n = K − (cid:88) j =0 b j ψ (cid:0) θ ( x, b, n − j ) (cid:1) x n = y n λr n S = { n : | x n | > (cid:15) } Until convergenceReturn: x consistent over a reasonable duration, the time interval between two consecutive faults can be consideredidentical within the support of a group. Moreover, when the period T of a potential fault is known orpredictable from the knowledge of the machinery, we can select the group with a length K and its zero andnon-zero entries by N + N ≈ f s T, (31a) N + N = K/M, (31b)where N and N are the estimated length (in samples) of impulsive transients and the time interval betweenthem, and integer M (cid:62) f s is sampling rate. Thus,in one group, the numbers of zero and non-zero entries are K = M N and K = M N , respectively.Moreover, when the transient sequence is periodical, the binary weight b k are chosen according to aperiodic group structure. Speciﬁcally, under the constraint (31), b ∈ { , } K is given by b = [ 1 , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N , , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N , , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N , , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N , . . . , , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N , , , . . . , (cid:124) (cid:123)(cid:122) (cid:125) N ] , (32)where in each period, there are N non-zero entries grouped according to the impulsive signal, and the entiregroup comprises M periods. In this case, the last N zeros in (32) has no eﬀect in problem (14) by thedeﬁnition (15). In practice, we omit trailing zeroes and the actual length of b involved in computation is K − N . Note that although we use the parameters such as K, K , K , N , N and M to illustrate the binarypattern structure, in fact given f s and T , we only need to select N and M , then use (31), the pattern of b (32) can be determined.Consequently, we propose to recover a periodic impulsive signature from a noisy observation by solvingproblem (14) with b deﬁned by (31) and (32). We refer to this method as Periodicity-induced OGS (POGS).This is an extension of OGS accounting for the periodicity of the sparse signal.As a special case, if the period T is not known, we may use conventional OGS (3) to detect faults. Ifa period T can be determined by inspecting the output produced by OGS, then an enhanced result withbetter accuracy may be achieved by POGS using the determined periodicity.9 A m p li t ude σ = 2.5) A m p li t ude Figure 2: Example 1: Simulated signal: (a) clean data and (b) noisy data.

To test the proposed method, we apply it to the simulated data illustrated in Fig. 2(a). The signal is a1-second signal with sampling rate f s = 6400 Hz, and is composed of a periodic sequence of transientsoccurring with 80 Hz.We simulate the vibration signal containing features caused by machinery defect as a sequence of impulsivetransients, and each transient consists for 10 samples (1.6 ms when f s = 6400 Hz). In this example, eachtransient is composed of a random number of sinusoidal components, each with random frequency andrandom initial phase. More speciﬁcally, each transient can be written as v ( n ) = U (cid:88) i =1 A i sin( ω i n + β i ) , n = 0 , , . . . , (33)where 1 (cid:54) U (cid:54)

10 is a random integer, and for each i , A i is a random amplitude, and ω i is a randomfrequency, and β i is a random initial phase. The sequence of transients is shown in Fig. 2(a), and we showthe detail of one transient at about t = 0 .

40 second in the box. The generated test signal is multiplied by aconstant, so that it has unit standard deviation.In order to evaluate the false alarm rate, the ﬁrst part of the test signal contains no transient. Fig. 2(a)shows the clean test signal, where there are 50 faults starting at approximately t = 0 .

36 second with a period T = 1 /

80 second. White Gaussian noise with standard deviation σ = 2 . b . In this example, we set N = 4 and M = 4. Since T = 1 /

80 seconds, we cancalculate N = 6400 / − N = 80 − b can bedetermined as Fig. 4. The denoising result is shown in Fig. 3(a), and the transients can be easily identiﬁedwith an almost pure zero baseline. In this example, we use OGS and POGS with the atan penalty functionand non-convexity parameter a set to its maximum value of 1 / ( K λ ), so as to maximally induce sparsitysubject to the constraint that the objective function P is convex.10 N = 4, N = 76, M = 4, λ = 0. 81 A m p li t ude N = 2, N = 78, M = 4, λ = 1. 35 A m p li t ude A m p li t ude A m p li t ude Figure 3: Example 1: Denoising results. (a) Proposed method with N = 4 , M = 4. (b) Proposed methodwith N = 2 , M = 4. (c) Conventional OGS. (d) Wavelet-based denoising.Fig. 3(b) shows another example using N = 2 to determine the pattern b . The result is slightly worsein RMSE than Fig. 3(a), because N = 2 in this example does not match the simulated data. However, N = 2 is the lower limit of a realistic value, because in practice it is very rare that the transients are allsingle-sample spikes, when the data is properly measured. As a consequence, the result in Fig. 3(b) can beunderstand as the worst case of choosing an inappropriate b .Fig. 3(c) shows the result of denoising using conventional non-periodic OGS (3) with group size K = 8and the arctangent (atan) non-convex penalty function. The regularization parameter λ is chosen by thelook-up table in [9], which sets λ proportional to the noise σ . In this example, we set λ = 0 . σ . The resultin Fig. 3(c) misses some faults and yields several false detections, e.g., the ones at about 0 .

82 second, andsome false transients appear before t = 0 .

36 second.As a comparison, we adopt conventional wavelet-based denoising method to the test signal. More specif-11igure 4: Example 1: Binary pattern b when M = 4, N = 4. Time (second) A m p li t ude Threshold

Figure 5: Example 1: Results of fast spectral kurtosis. (a) Kurtogram. (b) Envelope of extracted transients.ically, a 5-scale undecimated wavelet transform [20] with 3 vanishing moments is used, and the result isshown in Fig. 3(d). For denoising, we apply hard-thresholding and chose the threshold value by 3 σ -rule foreach subband. In Fig. 3(d), although some large amplitude transients can be recovered at correct locations,they exhibit the same shape as the chosen wavelet [see the box in Fig. 3(d)].We compare the performance with fast spectral kurtosis [21] This method produces the kurtogram inFig. 5(a), where the kurtosis maximum is at the third level with an estimated ‘optimal carrier frequency’ at1000 Hz [see the bright yellow area in Fig. 5(a)]. The corresponding amplitude of the extracted transients isshown in Fig. 5(b), with an automatically generated threshold shown as a gray dashed line. The peaks after t = 0 .

36 seconds have a greater density, which indicates that it is more likely that faults occur after t = 0 . Setting pattern b . In many cases of fault diagnosis, it is feasible to estimate the period of the transientsbased by component geometry and rolling speed, but it might be diﬃcult to estimate the duration of thetransient. However, since the binary structure b is acting as a sliding window capturing the global periodicitystructure, it is not necessary to match the length of a transient exactly. As a consequence, we suggest toconstrain the value of N relatively small, e.g. 2 (cid:54) N (cid:54)

4. However, if the sampling rate is quite high anda transient may contain more samples, then the value of N can be speciﬁed greater than 4. The value of M determines the number of non-zero entries (the value of K ) in b , and K eﬀects the non-convexity of theregularizer in the proposed problem (14). In our experiments, we keep use M = 4. Setting regularization parameter λ . In order to explore the correlation of the regularization param-12 oise level ( σ )1 1.5 2 2.5 3 λ λ , M = 1 (OGS)(a) N = 1 N = 2 N = 3 N = 4 Noise level ( σ )1 1.5 2 2.5 3 λ λ , M = 2 (b) N = 1 N = 2 N = 3 N = 4 Noise level ( σ )1 1.5 2 2.5 3 λ λ , M = 3 (c) N = 1 N = 2 N = 3 N = 4 Noise level ( σ )1 1.5 2 2.5 3 λ λ , M = 4 (d) N = 1 N = 2 N = 3 N = 4 Figure 6: Example 1: Optimal λ at diﬀerent noise level. (Note that the vertical axes of the sub-ﬁgures arediﬀerent.)eter λ to the binary pattern b (32), we show the optimal λ as a function of the standard deviation σ of noiseusing diﬀerent binary patterns in Fig. 6. We deﬁne the optimal λ as the value minimizing the RMSE foreach ﬁxed binary pattern b .In this test, using the data in Fig. 2(a), we search for λ minimizing an average RMSE (20 trials) at eachnoise level generated by diﬀerent random seeds. We present the results of select M from 1 to 4 in Fig. 6(a)to (d), respectively. In each ﬁgure, we show the optimal λ as a function of σ under diﬀerent N . Note that,in Fig. 6(a), when M = 1 and N = 1, the proposed method adheres neither grouping nor periodic structure,in which case, the problem (14) coincides to the BPD problem with non-convex regularizer.Note that since we simulate the test signal with a unit standard deviation, the horizontal axis in Fig. 6is also the ratio of deviation from noise to data. In other word, we can use Fig. 6 as a look-up table, whenthe input noise-to-signal ratio (SNR) is known.Moreover, all the 16 curves in Fig. 6 show that the optimal λ varies approximately linearly with noiselevel. In practice, we suggest to chose λ in (14) proportional to the noise level as λ = rσ . Through furtherexperiments, we provide Table 3 as a guide for choosing the multiplier r . The above comparative evaluation of OGS, POGS, wavelet, and fast spectral kurtosis uses RMSE as anindicator of denoising performance. However, the RMSE by itself is not a suﬃcient indicator of faultdetection accuracy. In the following, we use a receiver operating characteristic (ROC) based approach toevaluate the relative accuracy of the methods. 13able 3: Selection of r for setting λ . (cid:72)(cid:72)(cid:72)(cid:72)(cid:72) M N False positive rate T r ue po s i t i v e r a t e POGS (AUC = 0.999) OGS (AUC = 0.921) Wavelet (AUC = 0.783) SK (AUC = 0.664)

Figure 7: Example 1: ROC curves of faults detection.Figure 8: Example of classiﬁcation rule.The receiver operating characteristic (ROC) curve is a well known detection performance evaluationmethodology [22]. ROC curves are well suited to the problem of vibration-based diagnosis applications [23].An ROC curve is generated by plotting the probability of a false alarm against the probability of detectionas the threshold level is varied. Since the POGS approach is focused on machinery fault detection, the ROCcurve is utilized to validate the superior detection performance of POGS compared to other methods. Wedeﬁne the classiﬁcation rule as: if one sample in a transient [generated by (33)] is detected as positive, thenthis entire transient a fault feature is detected and all the remaining samples are all assigned to be positive.This rule is slightly diﬀerent to the one used in [23], since using the sample-wise decision rule in [23] maycause problem of overweighting sample recovery, but neglecting detecting fault features as transients of a14able 4: The parameters of the tested bearingInner Race Outer Race Roller Number of roller Contact angle160 mm 290 mm 34 mm 17 0 ◦ Figure 9: Example 2: Fault on the outer race of the testing bearing.cluster of samples. We use a simple example in Fig. 8 to illustrate the problem and our classiﬁcation rule.Suppose that the true signal has 8 samples consists 2 periods (4 samples for each period), and each period has2 samples positive (labeled as circles). If Method 1 detects 2 samples belonging to two diﬀerent transientsrespectively, and method 2 detects 2 samples belonging to only one transients, then the rule of [23] will givethat the two methods have identical accuracy, because if merely counting the samples, they have a samenumber of samples detected. This is undesirable, since Method 2 misses an entire transient. To overcomethis issue, we re-label the detect result, where if any sample in a transient of each period is detected, were-label all the samples in the entire transient to be detected. In the example of Fig. 8, after the re-labeling,Method 1 has a better accuracy because it detects both of the transients.Fig. 7 shows the ROC curves from using OGS, POGS and wavelet-based method respectively. POGSachieves an almost perfect detection result. Also, OGS is better than wavelet-based method. In this example,because the results from POGS with diﬀerent parameter settings [in Fig. 3(a) and Fig. 3(b)] obtain almostidentical ROC curves, we chose to show the one in Fig. 3(a) where N = 4 and M = 4.We also use the extracted envelope from fast spectral kurtosis method to perform a similar ROC analysis.Note that although we plot it together with other ROC curves in Fig. 7, since its envelope has a diﬀerentlength than the other results, the ROC curve is generated by a diﬀerent number of samples. In this example, the proposed approach is applied to a vibration signal collected from a locomotive rollingelement bearing with defect. The testing locomotive rolling bearing with a slight scrape on its outer race isshown in Fig. 9. The vibration signal is measured from acceleration sensors, using SONY Ex data acquisition15 A m p li t ude Measured vibration signal(a) A m p li t ude N = 2, N = 219, M = 4, λ = 0. 076 A m p li t ude N = 4, N = 217, M = 4, λ = 0. 052 A m p li t ude Figure 10: Example 2: (a) Test data. (b) Result of proposed method with N = 2 , M = 4. (c) Result ofproposed method with N = 4 , M = 4. (d) Result of AR-MED.system when the electric locomotive was running. The bearing type is 552732QT and its speciﬁcation isshown in Table 4. The sampling rate is 12.8 kHz and the current rotating speed is approximately 481 rpm.Thus, based on the geometric parameters and rotational speed, the characteristic frequency of the outer racedefect is calculated to be f o = 57 . b (32) with M = 4 and N = 2, then by Table 3 we can chose λ with the respect to the noise level.Note that since the ‘noise’ in the vibration data for fault detection is the background out of the transientsequence, in practice it can be easily estimated using healthy data.In this example, we illustrate how to determine λ without healthy data. We estimate the deviation ofbackground by the formula ˆ σ = MAD ( y ) / . MAD in (34)stands for median absolute deviation, deﬁned as

MAD ( y ) := median ( | y n − median ( y ) | ) . (35)In this example, the estimated deviation of background is ˆ σ = 0 . λ can be obtained using λ = r ˆ σ where r = 0 .

475 (from Table 3). The result is shown in Fig. 10(b), wherea periodical phenomenon can be easily identiﬁed. Moreover, we also test POGS with M = 4 and N = 4,using the same method to set λ . The result is shown in Fig. 10(c).As a comparison, we also test the data with autoregression model assisted minimum entropy deconvolution(AR-MED) [25] . The order of AR ﬁlter is chosen by maximizing Kurtosis value as the implementationsuggested, and the estimated ﬁlter in the MED step has a length of 50. The result is shown in Fig. 10(d),where most of the impulses are promoted after deconvolution. However, since the baseline is still relativelynoisy, the regularity of periodicity is not very clear. Table 5: Fault frequencies for MFS motor bearingComponent FTF BPFO BPFI BSFMotor Bearings 0.384 3.066 4.932 2.03To further demonstrate the eﬀectiveness of the proposed method for machinery fault detection, appli-cations of motor bearing fault diagnosis are studied in this section. The experiment is performed on a Implementation available online A m p li t ude SQ raw data(a)0 500 1000 1500 2000 2500 300005001000150020002500 Frequency (Hz) S pe c t r u m m agn i t ude Fourier spectrum(b)0 100 200 300 400 500 600 700 80002004006008001000 Frequency (Hz)Hilbert envelope spectrum E n v e l ope m agn i t ude (c) Figure 12: Example 3: Test data. (a) The measured vibration signal of the motor housing. (b) Fourierspectrum. (c) Hilbert envelope spectrum.Spectra Quest’s machinery fault simulator (MFS) illustrated in Fig. 11. The user does not need to make anymodiﬁcations to the motor provided. The simulation setup consists of a motor with intentionally faultedbearings: one bearing with an inner race fault, and one bearing with an outer race fault. Therefore, the faultdiagnosis of the motor bearings is equivalent to a compound faults detection case. The motor bearing faultfrequencies for MFS components are given in Table 5. Accelerometers were mounted on the motor housing.The vibration signals were measured at a sampling rate f s = 6400 Hz. The rotating speed of the motor is1433 rpm (23.89 Hz). Hence, the characteristic frequencies of the inner race and outer race of the motorbearings are calculated to be f i ≈ . f o ≈ . T i and theouter race period T o , for the purpose of separating the useful impulsive fault features. In particular, keeping N = 2 and M = 4, we deﬁne two diﬀerent group structures using (31) and (32) based on the inner race18 A m p li t ude Extracted transients(a) N = 2, N = 85, M = 4, λ = 0. 787 S pe c t r u m m agn i t ude Fourier spectrum(b)0 100 200 300 400 500 600 700 8000100200300400500 Frequency (Hz) E n v e l ope m agn i t ude ↓ f o ≈ ↓ f o ↓ f o Hilbert envelope spectrum(c) Hilbert envelope spectrum Smoothed profile

Figure 13: Example 3: Outer race faults detection ( N = 2 , M = 4): (a) Extracted transients, (b) Fourierspectrum and (c) Hilbert envelope spectrum.period T i = 1 /f i and the outer race period T o = 1 /f o , respectively. We use the atan penalty function with a = 1 / ( K λ ) to ensure convexity of the objective function. The value of λ obtained using healthy data, isset so as to diminish the healthy data to almost all zeros.The two periodic-related values correspond to the outer race defect and the inner race defect frequenciesrespectively. Strong periodic impulses with intervals of approximately 0.0133 second (75 Hz) are clearlyrevealed Fig. 13(a), which is exactly in accordance with the outer race characteristic frequency of 73.2 Hz.Similarly, periodic transient features with the period 0.0085 second (118 Hz) can be observed in Fig. 14(a),which is approximately the inner race characteristic frequency of 117.8 Hz. To further reveal the characteristicfrequencies, the frequency spectrum and the Hilbert envelope spectrum of the processed signals are shownin Fig. 13(b) and (c) and Fig. 14(b) and (c). We also present the smoothed proﬁles of the Hilbert envelopspectrum to indicate the characteristic frequencies more clearly. The Hilbert spectrum of the processedresult illustrated in Fig. 13(c) is obtained using the proposed approach with prior knowledge of the outerrace characteristic frequency 73.2 Hz. Apparently, the characteristic frequencies of outer race 73 Hz andits harmonic components are clearly revealed, as shown in Fig. 13(c). Similarly, Fig. 14(c) is obtainedutilizing the proposed approach with prior knowledge of the outer race characteristic frequency 117.8 Hz.The characteristic frequencies of inner race 117 Hz and its harmonic components can be clearly observedin Fig. 14(c). Thus, the proposed periodic non-convex regularized OGS approach successfully detects thecompound faults of the motor bearings. More speciﬁcally, the fault features of outer race defect and innerrace defect are clearly separated utilizing the proposed approach.19 A m p li t ude Extracted transients(a) N = 2, N = 52, M = 4, λ = 0. 787 S pe c t r u m m agn i t ude Fourier spectrum(b)0 100 200 300 400 500 600 700 8000100200300400500 Frequency (Hz) E n v e l ope m agn i t ude ↓ f i ≈ ↓ f i ↓ f i Hilbert envelope spectrum(c) Hilbert envelope spectrum Smoothed profile

Figure 14: Example 3: Inner race faults detection ( N = 2 , M = 4): (a) Extracted transients, (b) Fourierspectrum and (c) Hilbert envelope spectrum. Time (second) E n v e l op a m p li t ude c = 2050 Hz)(b) Threshold

Figure 15: Example 3: Results of fast spectral kurtosis. (a) Kurtogram. (b) Envelope of extractedtransients. 20o further demonstrate the eﬀectiveness of the proposed approach, we also processed the vibration signalusing spectral kurtosis and the results are presented in Fig. 15. Fig. 15(a) is the fast kurtogram, wherethe optimal carrier is detected at 2050 Hz. Under this frequency, periodic transients can be observed inthe envelope of the ﬁltered signal, in accordance with the rotating speed of motor (23.89 Hz), as shown inFig.15(b). No further fault-related information can be observed in Fig. 15(b).

This paper proposes a periodic group sparsity approach for the purpose of detecting faults in rotatingmachinery. The approach uses non-convex penalty functions to promote periodic group sparsity. We showhow to constrain the non-convex penalty functions to ensure that the objective function is convex. TheOGS method was introduced in [9] and extended to non-convex regularized OGS in [10]. A novelty of theproposed approach is that the proposed penalty function models the periodicity of the sparse groups, makingit suitable speciﬁcally for feature extraction in machinery fault diagnosis. The period of the sparse pulses ischosen based on prior knowledge of machine geometry under inspect. Moreover, the proposed approach isable to separate compound fault features by utilizing diﬀerent periods of the periodic pulses correspondingto diﬀerent fault frequencies (e.g., outer race and inner race characteristic frequencies of rolling elementbearings) as demonstrated in Section 5. The eﬀectiveness of the proposed approach is veriﬁed by simulationand experimental data. The processed results demonstrate that the proposed approach outperforms othermethods.

AppendicesA Proof of Proposition 1

Proof.

This proposition can be proven by taking the second-order derivative, where φ ( v ; a ) is twice diﬀer-entiable when v (cid:54) = 0. The second-order derivative of (12) is p (cid:48)(cid:48) ( v ) = γ + λφ (cid:48)(cid:48) ( v ; a ) , v (cid:54) = 0 . (36)Therefore, when v (cid:54) = 0, it is suﬃcient that, if γ + λφ (cid:48)(cid:48) ( v ; a ) >

0, then p (cid:48)(cid:48) ( v ) >

0. When v = 0, Lemma Ain [10] can be utilized directly, to show that, since p (cid:48) (0 − ) < p (cid:48) (0 + ), under the condition (13), function p in(12) is strictly convex on R . B Proof of Proposition 2

Proof.