Neural Architecture Search For Fault Diagnosis
February 20, 2020 5:55 RPS/Trim Size: 221mm x 173mm for Proceedings/Edited Book esrel2020psam15-paper
Xudong Li
National Space Science Center, CAS, University of Chinese Academy of Sciences, Beijing, China. E-mail: [email protected]
Yang Hu
Science and Technology on Complex Aviation System Simulation Laboratory, P.O. Box 9236, Beijing, China. E-mail: [email protected]
Jianhua Zheng
National Space Science Center, CAS, University of Chinese Academy of Sciences, Beijing, China. E-mail: [email protected]
Mingtao Li
National Space Science Center, CAS, University of Chinese Academy of Sciences, Beijing, China. E-mail: [email protected]
Data-driven methods have made great progress in fault diagnosis, especially deep learning methods. Deep learning is suitable for processing big data and has a strong feature extraction ability, enabling end-to-end fault diagnosis systems. Due to the complexity and variability of experimental data, some deep learning models are designed to be complex. However, designing neural network architectures requires rich professional knowledge and debugging experience, and many experiments are needed to screen models and hyperparameters, which increases the difficulty of developing deep learning models. Fortunately, neural architecture search (NAS) is developing rapidly and is becoming one of the next directions for deep learning. Given a search space, NAS can automatically search for the optimal network architecture. In this paper, we propose a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate the network architecture. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the policy gradient algorithm. In order to speed up the search process, a parameter sharing method is adopted in this paper. We use the PHM 2009 Data Challenge gearbox dataset to prove the effectiveness of the proposed method, and obtain state-of-the-art results compared with other manually designed network structures. To the authors' best knowledge, this is the first time that NAS has been applied to fault diagnosis.
Keywords: Fault diagnosis, Deep learning, Architecture search, Reinforcement learning, Policy gradient, Search space.
1. Introduction
In recent years, deep learning has been successfully applied in fault diagnosis, and it has become a new research hotspot among data-driven methods Hoang and Kang (2019); Wang et al. (2018); Zhao et al. (2019). Deep learning has a strong ability to extract features, and it is easy and effective to establish an end-to-end fault diagnosis system Khan and Yairi (2018). Deep learning methods such as Convolutional Neural Networks (CNN) Li et al. (2019); Abdeljaber et al. (2018); Guo et al. (2018); Li et al. (2020), Recurrent Neural Networks (RNN) Lei et al. (2019); Liu et al. (2018), Auto-Encoders (AE) Shao et al. (2018); Yu (2019) and Capsule Networks (CN) Chen et al. (2019); Zhu et al. (2019) have proven effective on multiple problems. However, though deep learning systems are powerful and easy to build, designing a neural network architecture needs rich professional knowledge and debugging experience. To obtain an optimal architecture, a lot of experiments are needed, which makes the development of deep learning systems time-consuming. What we want is an automated machine learning (AutoML) system that can automatically design neural networks and adjust hyperparameters. Fortunately, as a branch of AutoML, neural architecture search (NAS) is developing rapidly, and has become a new direction for deep learning Elshawi et al. (2019); Elsken et al. (2018); Zöller and Huber (2019). The general process of NAS is shown in Fig. 1.
Proceedings of the 30th European Safety and Reliability Conference and the 15th Probabilistic Safety Assessment and Management Conference.
Edited by Piero Baraldi, Francesco Di Maio and Enrico Zio. Copyright ©.
Published by Research Publishing, Singapore. ISBN: 981-973-0000-00-0 :: doi: 10.3850/981-973-0000-00-0

[Fig. 1 schematic: search space, search strategy, estimation strategy; searched architecture, evaluation results, optimal architecture]
Fig. 1. General process of neural architecture search.
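The loop in Fig. 1 can be sketched in plain Python. This is an illustrative sketch only; `nas`, `RandomStrategy` and the other names are ours, not from the paper:

```python
import random

# Generic NAS loop corresponding to Fig. 1: a search strategy proposes
# architectures from the search space, an estimation strategy scores them,
# and the score is fed back to guide the next proposal.
def nas(search_strategy, estimate, epochs):
    best, best_score = None, float("-inf")
    for _ in range(epochs):
        arch = search_strategy.propose()       # pick from the search space
        score = estimate(arch)                 # e.g. validation accuracy
        search_strategy.feedback(arch, score)  # update the strategy
        if score > best_score:
            best, best_score = arch, score
    return best

class RandomStrategy:
    """Baseline strategy: uniform random proposals, ignores feedback."""
    def propose(self):
        return random.randint(0, 5)  # e.g. one of six kernel choices
    def feedback(self, arch, score):
        pass

print(nas(RandomStrategy(), estimate=lambda a: a, epochs=20))
```

The search strategy used later in the paper is an RNN controller trained with reinforcement learning, and the estimation strategy is validation accuracy with shared weights.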
In this paper, we propose a NAS method for fault diagnosis using reinforcement learning. An RNN is used as an agent (controller) to generate architectures, which are trained on the training dataset and evaluated on the validation dataset to obtain an accuracy. This accuracy serves as a reward to the controller, and the parameters of the controller are updated using the policy gradient algorithm. We also utilize a parameter sharing trick to accelerate the search process. The proposed method is shown to be effective on the PHM 2009 Data Challenge gearbox dataset. Our contributions can be summarized as follows.
• We apply NAS to fault diagnosis for the first time. A reinforcement learning based NAS framework is developed to search for the optimal architecture.
• We put forward several problems and challenges in the application of NAS and AutoML to fault diagnosis, and point out several directions for future research.
2. Related Work

Fault diagnosis using deep learning. Deep learning has been widely applied in fault diagnosis Abdeljaber et al. (2018); Guo et al. (2018); Lei et al. (2019); Li et al. (2019, 2020); Shao et al. (2018); Yu (2019), and recently some novel network structures have been proposed. Zhu et al. (2019) propose a novel capsule network with an Inception block for fault diagnosis. Signals are first transformed into a time-frequency graph, and two convolution layers preliminarily extract features. An Inception block is then applied to improve the nonlinearity of the capsule. After dynamic routing, the lengths of the capsules are used to classify the fault category. In order to obtain diverse resolution expressions of signals in the frequency domain, Huang et al. (2019) propose a new CNN structure named multi-scale cascade convolutional neural network (MC-CNN). MC-CNN uses filters with different scales to extract more useful information. To address the problem that proper selection of features requires expert knowledge and is time-consuming, Pan et al. (2017) propose a novel network named LiftingNet to learn features adaptively without prior knowledge. LiftingNet introduces split, predict and update layers, and different kernel sizes are applied to improve learning ability.
Neural architecture search.
The first influential work on NAS is Zoph and Le (2016). In that paper, the authors use an RNN to generate descriptions of neural networks and train the RNN with reinforcement learning to maximize the expected accuracy on the validation dataset. The proposed method generates not only CNNs but also Long Short-Term Memory (LSTM) cells. Pham et al. (2018) propose a fast and inexpensive method named Efficient Neural Architecture Search (ENAS). This approach shares parameters among child models to greatly reduce search time compared with the standard NAS above. Brock et al. (2017) employ an auxiliary HyperNet to generate the weights of a main model with variable architectures, and develop a flexible scheme based on memory read-writes to define a diverse range of architectures. Unlike the above approaches, which search over a discrete and non-differentiable search space, Liu et al. (2018) propose a differentiable architecture search method named DARTS. This approach relaxes the search space to be continuous and searches architectures by gradient descent.
3. Methods
According to Zoph and Le (2016), a neural network can typically be specified by a variable-length string, so it can be generated by an RNN. In this section, we use an RNN as a controller to generate a CNN with reinforcement learning. Given a search space, the CNN is designed by the RNN, and the RNN is trained with a policy gradient method to maximize the expected accuracy of the generated architectures.
3.1. Search Space
Our method searches for the optimal convolution kernel combination in a fixed network structure. Several typical network structures could be used, such as the Inception, ResNet or DenseNet structure. In this paper, we search for the optimal architecture in a ResNet structure, which is shown in Figure 3. Each layer chooses from the following candidate kernels:

• 3 × 1 kernel with dilation rate d = 1
• 3 × 1 kernel with dilation rate d = 2
• 3 × 1 kernel with dilation rate d = 3
• 5 × 1 kernel with dilation rate d = 1
• 5 × 1 kernel with dilation rate d = 2
• 5 × 1 kernel with dilation rate d = 3

Dilated convolution injects holes into the standard convolution kernel to increase the receptive field Yu and Koltun (2015). Compared with the standard convolution operation, dilated convolution has one more hyperparameter called the dilation rate d, which refers to the number of kernel intervals. An example of dilated convolution compared with standard convolution is shown in Figure 2.

[Fig. 2 panels: (a) standard convolution with a 3 × 1 kernel, stride = 1, dilation rate = 1; (b) dilated convolution with a 3 × 1 kernel, stride = 1, dilation rate = 2]
Fig. 2. An example of dilated convolution compared with standard convolution.
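A minimal PyTorch sketch of these candidate kernels (the helper name and channel sizes are ours). With padding (k // 2) * d, every candidate preserves the input length, so any of the six can occupy the same slot in the fixed structure:

```python
import torch
import torch.nn as nn

# Candidate kernels from the search space: kernel size k in {3, 5} and
# dilation rate d in {1, 2, 3}. Padding (k // 2) * d keeps the output the
# same length as the input for odd k, so candidates are interchangeable.
def make_candidate(in_ch, out_ch, k, d):
    return nn.Conv1d(in_ch, out_ch, kernel_size=k, dilation=d,
                     padding=(k // 2) * d)

x = torch.randn(1, 8, 1000)  # (batch, channels, signal length)
for k in (3, 5):
    for d in (1, 2, 3):
        y = make_candidate(8, 8, k, d)(x)
        # One-layer receptive field: d * (k - 1) + 1 input samples.
        print(k, d, tuple(y.shape), d * (k - 1) + 1)
```

The one-layer receptive fields of the six candidates range from 3 to 13 samples, which is what the dilation rate buys over a standard kernel of the same size.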
In this paper, we set 4 blocks in the ResNet structure, each block containing 2 layers, and each layer has 6 different convolution kernels to choose from, so there are 6^(4×2) = 6^8 ≈ 1.7 × 10^6 possible architectures. Our aim is to search for the optimal architecture in such a large search space.

3.2. Designing a CNN Using a Recurrent Neural Network
Since a neural network can be encoded by a variable-length string, it is possible to use an RNN, acting as a controller, to generate such a string. Here, the six different convolution kernels are encoded as numbers 1∼6, so different combinations of numbers represent different network architectures. In this paper, we use an LSTM to generate such number combinations, as shown in Figure 3. The LSTM's output probability distribution over the six convolution kernels is obtained by softmax, and a certain kernel is sampled from this distribution. For example, for the first layer of the CNN, the controller outputs a softmax distribution over the six kernels in which the fourth kernel has probability 0.3, making it the most likely to be sampled. The sampled convolution kernel becomes the convolution operation of the first CNN layer. Next, the embedding of the sampled number is used as input to the LSTM to generate the convolution kernel of the next layer, and so on, until the convolution kernels of all layers are generated.

[Fig. 3 schematic: the LSTM controller emits one operation (OP) per layer; the generated architecture is a ResNet with a stem, layers 1 to L grouped into residual blocks, global average pooling and a classifier]
Fig. 3. An example run of designing a CNN using the RNN controller.
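A minimal PyTorch sketch of this sampling procedure. The sizes follow the paper's settings (6 kernels, 8 searchable layers, hidden size 64), but the class and all code names are ours:

```python
import torch
import torch.nn as nn

# Sketch of the controller: an LSTM emits one of n_kernels = 6 choices per
# CNN layer; each sampled choice is embedded and fed back as the next
# input, as in Fig. 3. Index 0 of the embedding is a start token.
class Controller(nn.Module):
    def __init__(self, n_kernels=6, n_layers=8, hidden=64):
        super().__init__()
        self.n_layers = n_layers
        self.embed = nn.Embedding(n_kernels + 1, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, n_kernels)

    def sample(self):
        token = torch.zeros(1, 1, dtype=torch.long)  # start token
        state, actions, log_probs = None, [], []
        for _ in range(self.n_layers):
            out, state = self.lstm(self.embed(token), state)
            dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
            a = dist.sample()                 # kernel index for this CNN layer
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
            token = (a + 1).unsqueeze(0)      # shift by 1 past the start token
        return actions, torch.stack(log_probs).sum()

arch, logp = Controller().sample()
print(arch)  # a list of 8 kernel indices in [0, 5]
```

The summed log-probability returned alongside the sampled architecture is what the policy gradient update of the next subsection differentiates.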
3.3. Training with Reinforcement Learning
In reinforcement learning there are two main parts: the agent and the environment. The agent earns rewards by interacting with the environment and thereby learns the corresponding strategies. In reinforcement learning based NAS, the agent is the RNN controller, the environment is the search space, and the validation accuracy of a sampled model is the reward. The CNN architecture generated by the controller is trained on the training dataset D_train and evaluated on the validation dataset D_val to get the reward R; the controller is then updated using this reward. To find the optimal architecture, we maximize the expected reward of the controller:

J(θ_c) = E_{P(a_{1:L}; θ_c)}[R]   (1)

where θ_c denotes the parameters of the controller, a_{1:L} is the list of convolution kernels sampled by the controller to generate a CNN, and P(a_{1:L}; θ_c) is the probability that a_{1:L} is sampled. Since the reward signal is not differentiable, we use the policy gradient algorithm to iteratively update J Williams (1992):

∇_{θ_c} J(θ_c) = Σ_{l=1}^{L} E_{P(a_{1:L}; θ_c)} [∇_{θ_c} log P(a_l | a_{(l−1):1}; θ_c) R]   (2)

An empirical approximation of the above quantity is:

(1/m) Σ_{k=1}^{m} Σ_{l=1}^{L} ∇_{θ_c} log P(a_l | a_{(l−1):1}; θ_c) R_k   (3)

where m is the number of different architectures the controller samples in one batch, L is the number of convolution kernels the controller has to predict in each CNN, and R_k is the reward of the k-th sampled architecture. The above updating rule is an unbiased estimate but has a very high variance. In order to reduce the variance of this estimate, we add a baseline function to the updating rule Zoph and Le (2016):

(1/m) Σ_{k=1}^{m} Σ_{l=1}^{L} ∇_{θ_c} log P(a_l | a_{(l−1):1}; θ_c) (R_k − b)   (4)

where b is an exponential moving average of the previous architectures' validation accuracies.

3.4. Accelerating Training Using Parameter Sharing
Training a neural network from scratch is time-consuming. During search, each sampled architecture would need to be trained from scratch to obtain its reward, which is very time-consuming and inefficient when the number of search epochs is large. To reduce the cost of searching, a weight sharing mechanism is applied in the training process. We do not train a sampled architecture from scratch; instead, we train the model with only one mini-batch of data, and the trained convolution kernels are reused in the next search epoch Pham et al. (2018). There are many repeated convolution operations among architectures, and weight sharing prevents them from being trained repeatedly. This greatly improves the efficiency of the search process.
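The Eq. (4) update can be sketched in PyTorch as follows. This is a hedged sketch, not the paper's implementation: `update_controller`, `ToyController` and `evaluate` are hypothetical names, and the toy reward (1 whenever a fixed kernel index is chosen) stands in for validation accuracy just to show the controller's distribution shifting toward higher-reward choices:

```python
import torch

torch.manual_seed(0)

def update_controller(controller, optimizer, evaluate, b, m=20, decay=0.95):
    """One controller step following Eq. (4): REINFORCE with an
    exponential-moving-average baseline b over m sampled architectures."""
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(m):
        actions, log_prob = controller.sample()  # architecture + sum of log-probs
        reward = evaluate(actions)               # e.g. validation accuracy
        b = decay * b + (1 - decay) * reward     # moving-average baseline
        loss = loss - log_prob * (reward - b)    # minimizing this ascends J(theta_c)
    (loss / m).backward()
    optimizer.step()
    return b

# Toy controller: a single learnable logit vector over 6 kernel choices.
class ToyController:
    def __init__(self):
        self.logits = torch.zeros(6, requires_grad=True)
    def sample(self):
        dist = torch.distributions.Categorical(logits=self.logits)
        a = dist.sample()
        return [int(a)], dist.log_prob(a)

ctrl = ToyController()
opt = torch.optim.SGD([ctrl.logits], lr=0.1)
b = 0.0
for _ in range(50):  # reward 1 only when kernel 3 is chosen
    b = update_controller(ctrl, opt, lambda acts: float(acts[0] == 3), b, m=10)
print(ctrl.logits.argmax().item())  # almost surely 3, the rewarded choice
```

In the paper, `evaluate` corresponds to training the sampled CNN on one mini-batch with shared kernel weights and measuring its validation accuracy.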
3.5. Neural Architecture Search Pipeline
In each search epoch, the RNN generates a number of architectures according to its output probability distribution. These architectures are trained with a single mini-batch of training data, and their rewards are obtained using the validation data. The controller is then updated according to Eq. (4). This search process is repeated until the maximum number of search epochs is reached. Finally, M architectures are generated by the trained controller; the architecture with the highest validation accuracy is selected as the final architecture and trained from scratch. The whole process of neural architecture search for fault diagnosis is shown in Algorithm 1 and Algorithm 2.

Algorithm 1 NAS for fault diagnosis.
Input: search space S(n, L) with n choices per layer and L layers in total; search epochs N; controller train steps N_c in each search epoch; numbers of sampled architectures m and M; parameters of candidate convolution kernels θ_kernel(n, L); parameters of the controller θ_c; training data D_train; validation data D_val.
Initialize θ_kernel(n, L) and θ_c.
for i = 1 to N do
    for x, y in D_train do
        Sample an architecture A with convolution kernels θ_kernel(A) and train it using x, y.
        Save the trained convolution kernels θ_kernel(A).
    end for
    for j = 1 to N_c do
        Sample m architectures and obtain their rewards R_m on D_val.
        Update the controller according to Eq. (4).
    end for
end for
Sample M architectures and get their rewards R_M. Find the architecture A with the highest reward R_max and train it from scratch.

Algorithm 2 Sample an architecture using the LSTM.
Input: input size of the LSTM I; hidden size of the LSTM H; number of LSTM layers L; number of convolution kernels in each CNN layer N.
Initialize the controller LSTM(I, H, L, N).
Get the embedding w_0 of the first convolution kernel in the CNN.
for i = 1 to N do
    Use embedding w_{i−1} as the input of the LSTM and get the output probability distribution P_i.
    Sample a convolution kernel number a_i from P_i.
    Get the embedding w_i of the sampled number a_i.
end for
Obtain the sampled architecture A = {a_1, a_2, ..., a_N}.
4. Experiments

4.1. PHM-2009 Dataset
In this paper, we use the gearbox dataset of the PHM-2009 Data Challenge to study NAS for fault diagnosis. This dataset is typical industrial gearbox data containing 3 shafts, 4 gears and 6 bearings. Two sets of gears, spur gears and helical gears, are tested. There are six labels in this dataset. For each label, there are 5 shaft speeds (30, 35, 40, 45 and 50 Hz) and two loading modes, high and low. We do not distinguish between these working conditions under each label. The raw vibration signals of this dataset are very long, so we use a sliding window with a length of 1000 and a step length of 50 to segment the signals into training, validation and testing samples. The signals are normalized to [−1, 1]. Finally, we obtain 22967 training samples, 2552 validation samples and 6380 testing samples. An example of the vibration signals for the six labels is shown in Figure 4.

Fig. 4. An example of the vibration signals of the six labels.
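The segmentation step above can be sketched as follows. The function name is ours, and the per-window max-absolute-value normalization is an assumption; the paper states only that signals are scaled to [−1, 1]:

```python
import numpy as np

# Sliding-window segmentation as described above: window length 1000,
# step 50, each window scaled to [-1, 1] (here by its own peak value,
# which is one plausible choice).
def segment(signal, win=1000, step=50):
    samples = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win].astype(np.float64)
        peak = np.abs(w).max()
        samples.append(w / peak if peak > 0 else w)
    return np.stack(samples)

sig = np.random.randn(5000)
windows = segment(sig)
print(windows.shape)  # (81, 1000): (5000 - 1000) // 50 + 1 windows
```

With a step of 50 on a window of 1000, consecutive samples overlap by 95%, which is how the long raw records yield tens of thousands of training samples.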
4.2. Training Details
For the controller, we set the input size I and hidden size H of the LSTM to 64, and the number of layers to 1. We use Stochastic Gradient Descent (SGD) with a learning rate of 0.01 to train the controller. In each search epoch, we train the controller for N_c = 5 steps, and in each train step we sample m = 20 architectures. To train the sampled architectures, we use Adam with a learning rate of 0.001 and L2 regularization. For the ResNet structure, we set 4 residual blocks, each containing 2 layers. Each layer is followed by a down-sampling convolution layer with a stride of 2. The number of channels in the first block is 8 and doubles after each down-sampling layer. We set the number of search epochs N = 200 and the batch size to 128. After the search, M = 100 architectures are sampled and evaluated, and the architecture with the highest validation accuracy is selected as the final model. The final model is trained from scratch using Adam with a learning rate of 0.001 and a batch size of 128, and its final result is evaluated on the testing dataset. In the above process, the learning rate is adjusted using CosineAnnealingLR. The code is implemented in PyTorch 1.3, using a single Tesla K80 GPU. The whole search takes 1.6 hours.

4.3. Results
Table 1 summarizes the results of NAS and six manually designed models. All layers of model M1 use the first convolution kernel in the search space, M2 the second, and so on. Note that our method searches within the ResNet structure, so all compared models are variants of ResNet; we could likewise search within an Inception structure, a DenseNet structure and so on. The searched architecture is shown in Figure 5, and it achieves an accuracy of 78.91%. The six manually designed models achieve at most 76.22%. This indicates that reinforcement learning based NAS is effective: the controller receives rewards from the sampled models and continually updates its parameters in the direction of obtaining better models. In addition, after each search epoch, 50 architectures were sampled to get their validation accuracy. Figure 6 shows the trends of these accuracies throughout the search process. We can see that the accuracies increase gradually, which indicates that the repeated use of convolution kernels is effective: it improves the performance and stability of the entire ResNet structure, so the performance of sampled architectures can be evaluated accurately.
Table 1. Testing accuracy comparison of the searched model and hand-designed models.

Model     Architecture   Accuracy (%)
M1        k=3, d=1       68.43
M2        k=5, d=1       71.19
M3        k=3, d=2       69.37
M4        k=5, d=2       72.46
M5        k=3, d=3       72.77
M6        k=5, d=3       76.22
Searched  /              78.91

[Fig. 5 schematic: stem, the searched sequence of (k, d) kernels, global average pooling and a classifier]
Fig. 5. The architecture of the searched model.

Fig. 6. Validation accuracy during the search process; 50 models are sampled in each search epoch.
5. Discussions
In this paper, we have shown an initial application of NAS to fault diagnosis and proved its effectiveness. However, the application of NAS in fault diagnosis has just started, and many challenges remain before deep learning models for fault diagnosis can be designed automatically. We summarize the following problems to be solved:
• In this paper, we only search for the optimal architecture within a ResNet structure, which is a significant limitation. We are more interested in how to automatically design more novel and complex structures, not limited by existing structures or numbers of layers, such that the searched models perform better. This is currently the most challenging problem.
• Reinforcement learning based NAS is also a proxy NAS that costs a lot of time. In this paper, because the dataset, the network size and the number of training epochs are all small, the entire search took only 1.6 hours, of which training the controller accounts for a large part. Finding more efficient search methods for fault diagnosis is a problem that needs to be solved urgently.
• In this paper, we only evaluate the testing accuracy of sampled architectures, not the number of parameters of the searched architectures (which determines the storage space occupied by the model) or the amount of computation (which determines the speed of the model). When a model is deployed on an embedded terminal, parameters and computation become very important. How to search for a model with few parameters and little computation but high accuracy is also a difficult problem for future research.
• Beyond neural architecture search, realizing the automation of machine learning in fault diagnosis is a wider and more difficult problem. The data of industrial equipment are huge and complicated, preprocessing is difficult, and working conditions keep changing. From data collection to data preprocessing, to feature engineering and modeling, to model testing and hyperparameter tuning, the entire development cycle is long and time-consuming. Automating machine learning in the field of PHM is a more difficult challenge.
6. Conclusions
In this paper, we develop a neural architecture search method for fault diagnosis. It is the first time that NAS technology has been used to automatically generate deep learning models for fault diagnosis. We use an RNN as a controller to generate architectures in a ResNet search space, and train the controller with reinforcement learning. Results show that NAS is effective in finding a model with better performance than manually designed models.
Acknowledgement
The work of Yang Hu is supported by the National Natural Science Foundation of China, grant number 61703431. The work of Mingtao Li was supported by the Youth Innovation Promotion Association CAS. The computing platform is provided by the STARNET cloud platform of the National Space Science Center Public Technology Service Center.

References
Abdeljaber, O., O. Avci, M. S. Kiranyaz, B. Boashash, H. Sodano, and D. J. Inman (2018). 1-D CNNs for structural damage detection: verification on a structural health monitoring benchmark data. Neurocomputing 275, 1308–1317.
Brock, A., T. Lim, J. M. Ritchie, and N. Weston (2017). SMASH: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344.
Chen, T., Z. Wang, X. Yang, and K. Jiang (2019). A deep capsule neural network with stochastic delta rule for bearing fault diagnosis on raw vibration signals. Measurement 148, 106857.
Elshawi, R., M. Maher, and S. Sakr (2019). Automated machine learning: State-of-the-art and open challenges. arXiv preprint arXiv:1906.02287.
Elsken, T., J. H. Metzen, and F. Hutter (2018). Neural architecture search: A survey. arXiv preprint arXiv:1808.05377.
Guo, L., Y. Lei, S. Xing, T. Yan, and N. Li (2018). Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data. IEEE Transactions on Industrial Electronics 66(9), 7316–7325.
Hoang, D.-T. and H.-J. Kang (2019). A survey on deep learning based bearing fault diagnosis. Neurocomputing 335, 327–335.
Huang, W., J. Cheng, Y. Yang, and G. Guo (2019). An improved deep convolutional neural network with multi-scale information for bearing fault diagnosis. Neurocomputing 359, 77–92.
Khan, S. and T. Yairi (2018). A review on the application of deep learning in system health management. Mechanical Systems and Signal Processing 107, 241–265.
Lei, J., C. Liu, and D. Jiang (2019). Fault diagnosis of wind turbine based on long short-term memory networks. Renewable Energy 133, 422–432.
Li, X., Y. Hu, M. Li, and J. Zheng (2020). Fault diagnostics between different type of components: A transfer learning approach. Applied Soft Computing 86, 105950.
Li, X., W. Zhang, and Q. Ding (2019). Understanding and improving deep learning-based rolling bearing fault diagnosis with attention mechanism. Signal Processing 161, 136–154.
Liu, H., K. Simonyan, and Y. Yang (2018). DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055.
Liu, H., J. Zhou, Y. Zheng, W. Jiang, and Y. Zhang (2018). Fault diagnosis of rolling bearings with recurrent neural network-based autoencoders. ISA Transactions 77, 167–178.
Pan, J., Y. Zi, J. Chen, Z. Zhou, and B. Wang (2017). LiftingNet: A novel deep learning network with layerwise feature learning from noisy mechanical data for fault classification. IEEE Transactions on Industrial Electronics 65(6), 4973–4982.
Pham, H., M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean (2018). Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268.
Shao, H., H. Jiang, Y. Lin, and X. Li (2018). A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders. Mechanical Systems and Signal Processing 102, 278–297.
Wang, J., Y. Ma, L. Zhang, R. X. Gao, and D. Wu (2018). Deep learning for smart manufacturing: Methods and applications. Journal of Manufacturing Systems 48, 144–156.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4), 229–256.
Yu, F. and V. Koltun (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
Yu, J. (2019). A selective deep stacked denoising autoencoders ensemble with negative correlation learning for gearbox fault diagnosis. Computers in Industry 108, 62–72.
Zhao, R., R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao (2019). Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing 115, 213–237.
Zhu, Z., G. Peng, Y. Chen, and H. Gao (2019). A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis. Neurocomputing 323, 62–75.
Zöller, M.-A. and M. F. Huber (2019). Survey on automated machine learning. arXiv preprint arXiv:1904.12054.
Zoph, B. and Q. V. Le (2016). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.