A deep-learning classifier for cardiac arrhythmias
Carla Sofia Carvalho
Hitachi Vantara
Lisbon, [email protected]
Abstract — We report on a method that classifies heart beats according to a set of 13 classes, including cardiac arrhythmias. The method localises the QRS peak complex to define each heart beat and uses a neural network to infer the patterns characteristic of each heart beat class. The best performing neural network contains six one-dimensional convolutional layers and four dense layers, with the kernel sizes being multiples of the characteristic scale of the problem, thus resulting in a computationally fast and physically motivated neural network. For the same number of heart beat classes, our method yields better results with a considerably smaller neural network than previously published methods, which renders our method competitive for deployment in an internet-of-things solution.
Index Terms — Cardiac arrhythmias, electrocardiograms, convolutional neural networks.
I. INTRODUCTION
An industry domain that inherently produces data is the health domain, in particular the subdomain related to the monitoring of patients. Since cardiovascular diseases are the leading cause of death worldwide, a common monitoring target is the heart, as a way to identify or prevent heart dysfunctions. Heart dysfunctions are related to anomalies in the heart's electrical activity, including cardiac arrhythmias, and can be diagnosed in electrocardiograms (ECG), produced in real time by portable devices. These records show the heart's beating patterns in time as a result of differences in the electrical potential in the heart.

The interest thus lies in producing a physically motivated method that classifies heart beats from medically annotated ECG records in a fast and robust way, so that it can be deployed to generate alerts. Our suggested method encompasses the processing of ECG records, based on the location of characteristic features, and a classification model, based on a deep neural network, with the medical annotations providing the labels.

The advantage of neural networks over a heuristic model is that they search for the optimal combination of weights over different layers in sequence, which add non-linearities and can reproduce different functional forms. Neural networks have been used in the past to classify cardiac arrhythmias, using different numbers of heart beat types (e.g. Refs. [1]-[3]) and adopting complex architectures with difficult interpretability (e.g. Ref. [4]). In this paper, we test different architectures from previously published work [1], [6] and create new neural networks based on the scale of the problem, with a view towards increasing the performance while simultaneously keeping the neural networks short and fast.

Figure 1. Signals in record 101. Chopped smoothed signals and their subsequent discrete first derivative and amplitude spectrum. Top three panels: Channel 1. Bottom three panels: Channel 2.

II. DATA PROCESSING
A. Data digitisation
We use publicly available data from the MIT–BIH Arrhythmia Database Directory, comprising 48 records (https://physionet.org/physiobank/database/html/mitdbdir/mitdbdir.htm). These records contain signals from two ECG channels (an upper signal and a lower signal) sampled at a frequency f_s = 1/dt = 360 Hz for N × dt = 30 min. These records also contain annotations by two cardiologists. Some records contain paced beats driven by a pacemaker or artifacts. We choose to use all 48 records, since the paced beats can work as an additional heart beat type and the artifacts can work as a noise component.

We also use the WFDB software package (https://wfdb.readthedocs.io/en/latest/) to read and process the file format that the records are encoded in.

B. Heart beat locations
The signals consist of readings of the heart potential in time. The heart potential contains characteristic peaks, namely the P peak, the QRS peak complex and the T peak, which correspond to the polarisation/depolarisation heart cycle. The first step consists in identifying the QRS peak complexes in time, which are usually more prominent in the upper signal.

The annotations are located at the QRS peak complex and provide the labels to the heart beats. Hence the next step consists in chopping the signals about each QRS peak complex so that each fraction contains an individual heart beat (Fig. 1). Each record has a characteristic beat length between consecutive QRS peak complexes. We choose the median characteristic beat length (len = 256) so that the resulting chopped signals can be concatenated into a matrix x_ijk, where i ∈ {1, ..., n_sample} indexes the resulting chopped signals, j ∈ {1, ..., len} indexes the length in time of each chopped signal and k ∈ {sign1, sign2} indicates the signal.
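A minimal sketch of the chopping step, assuming the annotated QRS sample indices are already available (the function name and the toy two-channel signal are ours, not the paper's code):

```python
import numpy as np

def chop_beats(signal, qrs_idx, length=256):
    """Chop a (n_samples, n_channels) record into fixed-length windows
    centred on each annotated QRS index; beats too close to the record
    edges are discarded. Returns shape (n_beats, length, n_channels)."""
    half = length // 2
    windows = [signal[i - half:i + half]
               for i in qrs_idx
               if i - half >= 0 and i + half <= len(signal)]
    return np.stack(windows)

# Toy two-channel signal standing in for an ECG record.
t = np.arange(3600)
record = np.stack([np.sin(2 * np.pi * t / 360),
                   np.cos(2 * np.pi * t / 360)], axis=1)
x = chop_beats(record, qrs_idx=[400, 900, 1400])
print(x.shape)  # (3, 256, 2)
```

In practice the QRS indices would come from the record annotations read with the WFDB package.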
C. Heart beat annotations

The possible values of the annotations define the set of heart beat classes. The records in the MIT–BIH Arrhythmia Database Directory follow the annotation system (https://archive.physionet.org/physiobank/annotations.shtml) such that beat annotations take the values {N, L, R, B, a, J, A, S, j, e, n, V, r, E, F, /, f, Q, ?} and non-beat annotations take all the other possible values.

We produce the distribution of heart beat classes across the records (Fig. 2, top panel). We observe that the classes {B, n, r, ?} are not represented in this data set. We also observe that the beat classes are not all equally represented. Hence we set an upper bound to the number of occurrences per heart beat class (here n_row_max = 4000), so that all beats from under-represented classes are included but only a fraction of the beats from over-represented classes is included. We also observe that the classes {S, E} contain less than six elements, which is the minimum number of elements required to balance the representation of a given class (Sec. III). Note that some values do not correspond to heart beat classes, e.g. {Q} corresponds to unclassifiable beats and {/, f} corresponds to paced beats; we choose to keep them to add robustness to the model. Hence the heart beat classes that can be classified are reduced to {N, L, R, a, J, A, j, e, V, F, Q, /, f}, totalling 13 classes.

We encode the annotation values into binary vectors with length equal to the number of classes, resulting in a matrix y_ic, where i ∈ {1, ..., n_sample} indexes the heart beats and c ∈ {1, ..., n_class} indexes the heart beat classes.
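As a sketch of the encoding step (the class ordering below is an assumption for illustration, not taken from the paper):

```python
import numpy as np

# The 13 retained classes; this ordering is an illustrative assumption.
CLASSES = ['N', 'L', 'R', 'a', 'J', 'A', 'j', 'e', 'V', 'F', 'Q', '/', 'f']

def encode_labels(symbols, classes=CLASSES):
    """Encode annotation symbols into a binary matrix y_ic of shape
    (n_sample, n_class), with a single 1 per row at the class index."""
    index = {s: c for c, s in enumerate(classes)}
    y = np.zeros((len(symbols), len(classes)), dtype=int)
    for i, s in enumerate(symbols):
        y[i, index[s]] = 1
    return y

y = encode_labels(['N', 'V', '/'])
print(y.shape)  # (3, 13)
```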
III. DATA ENGINEERING

A. Data resampling
Since the heart beat classes are not all equally represented, we re-sample the data by generating synthetic beats belonging to the under-represented classes, thus producing a new data set with balanced classes. We use the Synthetic Minority Oversampling Technique (SMOTE) as implemented in the imbalanced-learn package (https://imbalanced-learn.readthedocs.io/en/stable/index.html). When we resample data, the training set undergoes resampling, whereas the test set does not, so that the performance metrics refer to the original class distribution.

Figure 2. Distribution of the heart beat classes. Top panel: Distribution of the original labels. Bottom panel: Distribution of both the original and the predicted labels (best method).

B. Generation of new variables
From the original signals, we can generate new variables that encode potentially useful information, e.g. the first discrete derivative of the smoothed signals dx_ijk and the Fourier transform of the chopped smoothed signals X_ilk. The Fourier transform of a signal contains both positive-frequency and negative-frequency components; hence, instead of X_ilk, we use the amplitude spectrum |X_ilk| = sqrt(X_ilk X*_ilk). The result of the concatenation of the original variables with the generated variables is the data matrix ~x_ijk = {x_ijk, dx_ijk, |X_ijk|} such that, for n_chann = 2, the data matrix ~x_ijk consists of 3 × n_chann = 6 variables.
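A sketch of the generated variables using NumPy (the smoothing step is omitted here for brevity):

```python
import numpy as np

# x holds one chopped beat per row, shape (n_sample, len).
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 256))

dx = np.gradient(x, axis=1)           # discrete first derivative dx_ijk
X = np.fft.rfft(x, axis=1)            # keep positive-frequency components
amp = np.sqrt((X * np.conj(X)).real)  # amplitude spectrum |X| = sqrt(X X*)

print(dx.shape, amp.shape)  # (4, 256) (4, 129)
```

Using the real-input FFT keeps only the non-negative frequencies, which already discards the redundant negative-frequency components mentioned in the text.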
C. Selection of variables

We compute the correlation between each pair of variables indexed {k1, k2} with values {~x_ijk1, ~x_ijk2}, which we denote by Corr_k1k2 (Fig. 3, top panel). We also compute the correlation of each variable k with the beat annotations y_ic, which we denote by Corr_ky (Fig. 3, bottom panel).

By setting an upper bound corr_max, we use the correlation between each pair of variables as a measure of redundancy. Variables {k1, k2} such that |Corr_k1k2| > corr_max are classified as redundant, hence one of them can be removed without loss of information. From Fig. 3, top panel, no removal is justified on the basis of redundancy.

By setting a lower bound corr_min, we use the correlation of each variable with the beat annotations as a measure of the relevance of that variable in predicting annotations. A variable k such that |Corr_ky| < corr_min is classified as irrelevant, hence it can be removed without loss of information. From Fig. 3, bottom panel, the variables {dx_ij,sign1, dx_ij,sign2, |X_ij,sign2|} can be removed on the basis of relevance.
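A sketch of both correlation criteria (the threshold values, function name and variable names are illustrative assumptions):

```python
import numpy as np

def select_variables(vars_, target, corr_max=0.9, corr_min=0.1):
    """Apply the two correlation criteria of the text: a pair of
    variables with |Corr| > corr_max is redundant; a variable whose
    |Corr| with the target is below corr_min is irrelevant.
    Thresholds here are illustrative assumptions."""
    names = list(vars_)
    data = np.stack([vars_[n] for n in names])
    C = np.corrcoef(data)  # Corr_{k1 k2}
    redundant = {(names[a], names[b])
                 for a in range(len(names))
                 for b in range(a + 1, len(names))
                 if abs(C[a, b]) > corr_max}
    irrelevant = {n for n, row in zip(names, data)
                  if abs(np.corrcoef(row, target)[0, 1]) < corr_min}
    return redundant, irrelevant

rng = np.random.default_rng(2)
s = rng.standard_normal(500)
vars_ = {'sign1': s,
         'sign2': s + 0.01 * rng.standard_normal(500),  # near-duplicate
         'noise': rng.standard_normal(500)}             # unrelated
red, irr = select_variables(vars_, target=s)
print(red)  # {('sign1', 'sign2')}
```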
IV. CLASSIFICATION MODELS

A. Neural networks
Since the heart beats in the data are labelled, we look for a classification model to infer the patterns common to heart beats in the same class. Given the nature of the data, neural networks (NN) either of the recurrent type (RNN) or the convolutional type (CNN) will be adequate models.

An RNN regards each heart beat as a sequence of data points in time and combines the value at the previous instant with a transformation of earlier values. A variation of RNN is the long short-term memory (LSTM) NN. Each heart beat must be further divided into sublengths of the original beat length so that the different sublengths are regarded as a sequence of data points. Since each heart beat has size len and we are looking for three peak-like structures, the characteristic sublength will be len/3. For sublengths, we use fractions of the characteristic sublength, in particular sublen ∈ {1/8, 1/4, 1/2, 1} × len = {32, 64, 128, 256}.

A CNN regards each heart beat as a one-dimensional image and operates one-dimensional convolutions (Conv1D) over the kernel. Since the characteristic sublength is len/3, the largest scale will be the length corresponding to the Nyquist frequency, hence len/3/2. For kernel sizes, we use powers of two between 4 and 32; for stride step, we use stride = 1. For optimiser, we use ADAM (with the default values lr = 0.001, beta_1 = 0.9, beta_2 = 0.999, decay = 0); for loss function, we use categorical cross entropy; for number of epochs, we use n_epochs = 25; for batch size, we use batch_size = 64; for activation function, we use the rectified linear unit (ReLU).

B. Cross-validation
Figure 3. Correlation matrices. Top panel: Correlation matrix between each pair of variables. Bottom panel: Correlation between each variable and the heart beat annotations.

We devise a cross-validation scheme so that each classification is trained on a manageably sized training set. We first divide the entire data matrix ~x_ijk into nk = 5 subsets, each of which preserves the proportion among the different classes in ~x_ijk. We keep one of the nk subsets as test set with the original class distribution and resample the remaining nk − 1 subsets, producing the resampled data. We then divide the resampled data into nk' = 3 subsets, one of which serving as the resampled input data. We then divide the resampled input data into nk'' = 5 subsets, one of which serving as resampled test set and the remaining serving as resampled training set. We rotate the resampled training data set over the nk'' subsets so that the fitting of the classification model is done nk'' times on different training sets. We then rotate the resampled input data over the nk' sets so that the fitting of the classification model is done nk'' × nk' times on different training sets.

We average the nk'' × nk' classification predictions over nk'', yielding nk' mean predictions of the test data. We then average the nk' classification predictions, yielding one confusion matrix labelled by the corresponding nk' mean accuracies. The fitting resulting from each of the nk'' × nk' subsets is applied to the test data, thus producing nk'' × nk' classification predictions for each element in the test data. The final combined prediction is the mean over the nk'' × nk' classification predictions, whose resulting accuracies we include in the tables.
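One rotation level of this scheme can be sketched with scikit-learn's StratifiedKFold; here a logistic-regression stand-in replaces the NN and the resampling step is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Hold out one stratified test set, fit the model on each of the nk''
# inner folds, and average the nk'' predictions of the test set.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 8))
y = (X[:, 0] > 0).astype(int)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # nk
train_idx, test_idx = next(iter(outer.split(X, y)))

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # nk''
probas = []
for fit_idx, _ in inner.split(X[train_idx], y[train_idx]):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[train_idx][fit_idx], y[train_idx][fit_idx])
    probas.append(clf.predict_proba(X[test_idx]))

mean_proba = np.mean(probas, axis=0)  # combined prediction of the test set
accuracy = (mean_proba.argmax(axis=1) == y[test_idx]).mean()
print(mean_proba.shape)
```

Averaging class probabilities before taking the argmax matches the combined-prediction step described above.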
C. Selection of neural network architecture

In order to select an adequate NN architecture, we try different architectures for RNN and CNN [5]. We use the implementation from TensorFlow via the application programming interface Keras (https://keras.io).

Table I. Selected architectures for testing. Column 1: identification of the architecture. Columns 2-4: identification of the performance metrics, where Accuracy is the resulting accuracy from the combined classification over all nk' sets, and ⟨Precision⟩ and ⟨Recall⟩ are respectively the mean precision and mean recall over all classes. Column 5: identification of the efficiency metric, where Run time is the running time in minutes.

Architecture               | Accuracy | ⟨Precision⟩ | ⟨Recall⟩ | Run time (min)
LSTM (sublen = 256)        | 0.712    | 0.479       | 0.580    | 436
LSTM (sublen = 32)         | 0.699    | 0.460       | 0.556    | 445
Conv                       |          |             |          |
Conv + LSTM (sublen = 256) | 0.770    | 0.572       | 0.569    | 215
Conv + LSTM (sublen = 32)  | 0.761    | 0.513       | 0.605    | 143
ConvLSTM (sublen = 256)    | 0.759    | 0.564       | 0.549    | 536
ConvLSTM (sublen = 32)     | 0.759    | 0.570       | 0.584    | 391

We represent the NN architectures schematically as sequences of layers, with Input representing the input data and Output representing the predicted classification. We explore four types of architectures:

a) architecture with LSTM layers, named
LSTM:

LSTM: Input → LSTM → Dropout → Dense → Output ; (1)

b) architecture with Conv1D layers, named Conv:

Conv: Input → Conv1D → Dropout → MaxPool → Flat → Dense → Output ; (2)

c) architectures that combine both LSTM and Conv1D layers, named Conv + LSTM and
ConvLSTM:

Conv + LSTM: Input → Conv1D → Dropout → MaxPool → Flat → Dense → LSTM → Dropout → Dense → Output , (3)

ConvLSTM: Input → ConvLSTM → Dropout → Flat → Dense → Output . (4)

We first test these architectures for the minimal NN formulation and for approximately the same number of layers, setting the kernel size to kernel_size = 4 and the number of filters to n_filter = 64. As measures of performance, we use the total number of true positives (TP) and the number of predicted classes; as measure of efficiency, we use the running time (Table I).

We observe that the
Conv architecture yields the best performance and the shortest running time; this prompted us to consider Conv for further study. We also observe that most heart beats belonging to the classes {/, A, L, N, R, a, e, j} are correctly classified by all architectures. The goal is now to increase the number of TP of the other classes, namely {F, J, V, f}.

D. Selection of convolutional neural network architecture
We test the Conv architecture for different numbers and organisations of layers. We explore eight NNs. In the Conv1D layers, we first set kernel_size = 4 and padding = valid, and vary the number of filters within the range n_filter ∈ {16, 32, 64}, starting at 16 and doubling every time that the Conv1D layer is preceded by a pooling layer. As measure of performance, we use the total TP.

We start with the NN suggested in Ref. [1], since this NN was conceived to classify heart beats from the same database. We keep the architecture as shown below, named Acharya:

Acharya: Input → Conv1D (ReLU) → Dropout → MaxPool → Conv1D (ReLU) → Dropout → MaxPool → Conv1D (ReLU) → Dropout → MaxPool → Flat → Dense (ReLU) → Dense (ReLU) → Dense (Softmax) → Output . (5)

Comparing Acharya (Fig. 4, left panel) with the previous best performing NN, the TP of {f} increases significantly, whereas the TP of {F, V} decrease, with the total TP staying approximately the same.

We change Acharya by moving the drop-out layers from after the Conv1D layers to after the dense layers, which we name Acharya 2. Comparing Acharya 2 with Acharya, the TP of {F} increases, whereas the TP of {V, f} decrease, with the total TP decreasing, thus a worsening in performance.

We test the NN suggested in Ref. [6], since this NN was conceived to estimate cosmological parameters, which requires looking for different scales in the data. We keep the architecture as shown below, named Gupta:

Gupta: Input → Conv1D (ReLU) → AveragePool → Conv1D (ReLU) → Conv1D (ReLU) → AveragePool → Conv1D → AveragePool → Conv1D → AveragePool → AveragePool → Flat → Dense (ReLU) → Dropout → Dense (ReLU) → Dropout → Dense (ReLU) → Dropout → Dense (Softmax) → Output . (6)

This NN contains contiguous convolutional layers without intermediate pooling layers, forming a block of two Conv1D layers. Comparing Gupta with Acharya, the TP of {F, f} increase but the TP of {V} decreases, with the total TP decreasing, thus no improvement in performance.

We change Gupta by adding another set of contiguous Conv1D layers without intermediate pooling layers, as shown below, named
Gupta 2:

Gupta 2: Input → Conv1D (ReLU) → AveragePool → Conv1D (ReLU) → Conv1D (ReLU) → AveragePool → Conv1D (ReLU) → Conv1D (ReLU) → Conv1D (ReLU) → AveragePool → Flat → Dense (ReLU) → Dropout → Dense (ReLU) → Dropout → Dense (ReLU) → Dropout → Dense (Softmax) → Output . (7)

Figure 4. Confusion matrix from NNs. Left panel: Acharya with n_conv = 3 convolutional layers, kernel_size = {4, 4, 4}, n_pool = 3 MaxPool layers and n_drop = 3 drop-out layers; scores = [0.66, 0.77, 0.79]. Centre panel: Gupta 2 with n_conv = 6 convolutional layers, kernel_size = {4, (4, 4), (4, 4, 4)}, n_pool = 3 AveragePool layers and n_drop = 3 drop-out layers; scores = [0.72, 0.76, 0.78]. Right panel: Gupta 4 with n_conv = 7 convolutional layers, kernel_size = {4, (4, 4), (4, 4, 4), 4}, n_pool = 4 AveragePool layers and n_drop = 3 drop-out layers; scores = [0.70, 0.77, 0.78]. All NNs have padding = valid.

Comparing
Gupta 2 (Fig. 4, centre panel) with Gupta, the TP of {V} increases but the TP of {F, f} decrease, with the total TP increasing, thus an improvement in performance.

We change Gupta 2 by adding drop-out layers before each pooling layer in a similar way to Acharya, which we name Acharya 3. Comparing Acharya 3 with Gupta 2, the total TP stays approximately the same, thus no improvement in performance.

We change again Gupta 2 by adding batch-normalization layers (BatchNorm) between the Conv1D layers and the ReLU layers, which we name Gupta 3. Comparing Gupta 3 with Gupta 2, the total TP stays approximately the same, thus no improvement in performance.

While the newly generated Gupta NNs prove to increase the TP of {V} in comparison to Gupta, they cannot increase the TP of {F, J, f}. Hence we change Gupta 2 again by adding a single Conv1D layer just before the flat layer, which we name Gupta 4. Comparing Gupta 4 (Fig. 4, right panel) with Gupta 2 or Gupta 3, the total TP decreases slightly, thus a worsening in performance.

We change Gupta 4 by adding BatchNorm layers between the Conv1D layers and the ReLU layers, which we name Gupta 5. Comparing Gupta 5 with Gupta 2 or Gupta 3, the total TP decreases slightly, thus a worsening in performance.

We thus select Gupta 2 for further testing.
E. Selection of convolutional neural network hyper–parameters
We explore further the Gupta 2 NN by varying some hyper-parameters. As measures of performance, we use the average accuracies, the resulting accuracy of the combined classification and the total TP.

a) Type of pooling layer and type of padding: We test the type of pooling layer and the type of padding simultaneously. For each type of pooling layer, we vary the type of padding in the Conv1D layers, while keeping the kernel sizes per block equal to kernel_size = {4, (4, 4), (4, 4, 4)}. We observe that AveragePool with padding = same yields the best performance.

b) Number of dense layers: We vary the number of dense layers n_dense over six values, where the number of drop-out layers is n_drop = n_dense − 1, while keeping the number of filters equal to n_filter = 64. We observe that Gupta 2 yields an increase in the total TP up to n_dense = 4 and a decrease for n_dense > 4. Since the difference in performance between n_dense = 4 and n_dense = 5 is not significant, we keep the original Gupta 2 with n_dense = 4 dense layers due to its simplicity.

c) Kernel sizes of each convolutional layer:
We vary the kernel sizes of each block over the range {4, 8, 16}, arranging these three values in combinations respectively of one, two and three values in consecutive order. With these constraints, the best performing NNs of the Gupta 2 type are for

kernel_size ∈ { {·, (4, ·), (4, ·, ·)}, {·, (8, ·), (4, ·, ·)}, {·, (4, ·), (4, ·, ·)}, {·, (16, ·), (4, ·, ·)}, {·, (8, ·), (4, ·, ·)}, {·, (8, ·), (4, ·, ·)} } . (8)

While all these NNs yield average accuracies between 0.7 and 0.8, the NN with kernel_size = {16, (8, 8), (4, 4, 4)} yields the largest total TP and the largest resulting accuracy. This suggests that the important features for the classification of heart beats are best captured in groups of 4, 8 and 16 data points, with the convolutions proceeding from large to small scales. These scales correspond to {4, 8, 16} × dt ≈ {0.011, 0.022, 0.044} s. We thus select kernel_size = {16, (8, 8), (4, 4, 4)} for further testing.

Table II. Comparison with previously published work. Column 1: identification of the work. Column 2: identification of the network size. Column 3: identification of the number of classes in the data. Columns 4-6: identification of the performance metrics for the best performing network, where Accuracy is the resulting accuracy, and ⟨Precision⟩_w and ⟨Recall⟩_w are respectively the mean precision and mean recall over all classes, weighted by the size of each class.

Work                        | Network size        | No. classes | Accuracy       | ⟨Precision⟩_w  | ⟨Recall⟩_w
Acharya et al. (2017) [1]   | 3 Conv1D + 3 Dense  | 5           | (0.935, 0.940) | (0.979, 0.979) | (0.960, 0.967)
He et al. (2018) [2]        | 9 Conv1D + 2 Dense  | 5           | (0.979, 0.988) |                |
Jun et al. (2018) [3]       | 6 Conv2D + 1 Dense  | 8           | 0.990          | 0.986          | 0.978
Rajpurkar et al. (2017) [4] | 33 Conv1D + 1 Dense | 14          |                | 0.809          | 0.827
Carvalho (2020)             | 6 Conv1D + 4 Dense  | 13          | 0.821          | 0.848          | 0.822

Figure 5. Normalised confusion matrix from selected NN. Gupta 2 NN with kernel_size = {16, (8, 8), (4, 4, 4)}, AveragePool layers and padding = same, applied to ~x_ijk = {x_ij,sign1, x_ij,sign2}.

F. Selection of input data
Gupta 2
NN for different data matrices, namely ~x ijk = { x ij , sign1 , x ij , sign2 } , consisting of the original vari-ables, and ~x ijk = { x ij , sign1 , x ij , sign2 , | X ij , sign1 |} , consistingof the variables selected from the correlation constraints.While both data matrices yield mean accuracies between 0.7and 0.8, the data matrix ~x ijk = { x ij , sign1 , x ij , sign2 } withoutsmoothing yields the largest total TP and the largest resultingaccuracy. This suggests that the Fourier transforms of theoriginal signals do not add discriminative information and thatsmoothing the heart beats might erase important features.V. R ESULTS
A. Results from the best–performing neural network
We produce the distribution of the predicted heart beatclasses, which follows approximately the same distribution asthat of the original heart beat classes (Fig. 2, bottom panel).We produce the confusion matrix of our best performing neuralnetwork normalised to the data per heart beat class for easierassessment of the performance per class (Fig. 5). The class thatis worst classified is { F } (“Fusion of ventricular and normalbeat”), which is mostly misclassified as { V } (“Premature ventricular contraction”), hence the neural network is con-founding between two ventricular arrhythmias. The next worseclassified classes are: a) { J } (“Nodal (junctional) prematurebeat”), which is mostly classified as { R } ( “Right bundlebranch block beat”); b) { A } (“Atrial premature beat”), whichis also classified as either { N , R , V , a } (where a stands for“Aberrated atrial premature beat”); and c) { f } (“Fusion ofpaced and normal beat”), which is also classified as either { /, Q } (respectively “Paced beat” and “Unclassifiable beat”). B. Comparison with results from other published work