Automated Deep Learning Analysis of Angiography Video Sequences for Coronary Artery Disease

Chengyang Zhou, Thao Vy Dinh∗, Heyi Kong∗, Jonathan Yap, Khung Keong Yeo, Hwee Kuan Lee, and Kaicheng Liang

Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore; Hwa Chong Institution (College), Singapore; National Heart Centre Singapore; Duke-NUS Medical School, Singapore; School of Computing, National University of Singapore; Singapore Eye Research Institute (SERI); Image and Pervasive Access Laboratory (IPAL), Singapore; Rehabilitation Research Institute of Singapore; Institute of Bioengineering and Nanotechnology, A*STAR, Singapore

∗ Equal contribution
Abstract—The evaluation of obstructions (stenosis) in coronary arteries is currently done by a physician's visual assessment of coronary angiography video sequences. It is laborious, and can be susceptible to interobserver variation. Prior studies have attempted to automate this process, but few have demonstrated an integrated suite of algorithms for the end-to-end analysis of angiograms. We report an automated analysis pipeline based on deep learning to rapidly and objectively assess coronary angiograms, highlight coronary vessels of interest, and quantify potential stenosis. We propose a 3-stage automated analysis method consisting of key frame extraction, vessel segmentation, and stenosis measurement. We combined powerful deep learning approaches such as ResNet and U-Net with traditional image processing and geometrical analysis. We trained and tested our algorithms on the Left Anterior Oblique (LAO) view of the right coronary artery (RCA) using anonymized angiograms obtained from a tertiary cardiac institution, then tested the generalizability of our technique to the Right Anterior Oblique (RAO) view. We demonstrated an overall improvement on previous work, with key frame extraction top-5 precision of 98.4%, vessel segmentation F1-Score of 0.891 and stenosis measurement 20.7% Type I Error rate.
I. INTRODUCTION

Coronary artery disease (CAD) was responsible for 18.1% of deaths in Singapore in 2018 [1]. Plaque buildup in the coronary arteries impedes blood flow and affects oxygen supply to the heart, which may cause chest pain and even heart attacks. Coronary angiography is a widely used and increasingly common interventional procedure used to assess the coronary arteries for potential plaque obstruction [2]. An iodinated radiopaque contrast agent is injected into the coronary arteries via a catheter, followed by a fluoroscopy scan that acquires a multi-frame video sequence. Due to the 2D nature of X-ray projections, scans are typically acquired from various angles to obtain more comprehensive visualization.

The interpretation of coronary angiograms requires specialized training and substantial experience in interventional cardiology, and can be time-consuming. As a limited amount of contrast agent dissipates in the vessels, only a limited portion of each fluoroscopy video offers image frames of sufficient quality ("key frames"). These are the frames that display the complete blood vessel of interest with high contrast, allowing cardiologists to make assessments on the degree of plaque obstruction (stenosis). The procedure can have substantial variation in key frame selection, as well as fluctuations in image contrast or visibility due to device angular placement or biological variations [3]. Also, the current reliance on a physician's visual assessment of angiograms is susceptible to inter-observer variability; the same video interpreted as 50% stenosis by one physician might be interpreted as 70% by another [4]. A study of general cardiologists performing angiographic assessments reported a 95% confidence interval of 22 percentage points [5], and even panels of experienced angiographers reading sequential angiograms taken 2 years apart reported substantial intrapanel and interpanel disagreement [6].

These challenges in angiographic reading, facing both general and experienced interventional cardiologists, highlight the risk of missed or unnecessary interventional treatments, and the need for an objective and automated tool that enables rapid processing and analysis of angiography video sequences. Even resource-heavy core labs staffed with professional angiographers, which are often used in clinical research and trials evaluating patient outcomes, may suffer from the same challenges.

Recent developments in image classification and segmentation with deep learning and neural networks have opened up new possibilities in medical image analysis. In this paper, we developed a deep learning-based set of algorithms for automated analysis of coronary angiography video sequences. Our 3-stage pipeline, which consisted of key frame extraction, vessel segmentation and stenosis measurement, simulated the clinical workflow and broke down the assessment process into explainable steps. By utilizing transfer learning in the evaluation of Right Anterior Oblique (RAO) angiograms, we demonstrated the generalizability of image features between different angiographic views.

II. RELATED WORK

Deep learning is a powerful family of data-driven techniques for computer vision, and has been successfully implemented in several medical image domains. Convolutional neural networks (CNN) are a particular type of deep learning that have achieved state-of-the-art accuracy in image classification [7], with several important design innovations such as increasing network depth (VGG) [8], adding shortcut connections between layers (ResNet) [9] and concatenating outputs from layers (DenseNet) [10].
Specifically, the U-Net, in which skip connections are added between layers, achieves high precision in partitioning images into meaningful components, a task known as image segmentation [11]. Many studies of automated angiography analysis have employed deep learning techniques, described below.

Ongoing efforts to automate angiography video analysis can be categorized into the following tasks: key frame extraction, vessel segmentation, and stenosis measurement. For key frame extraction, classical methods include the use of image processing techniques for vessel detection, such as the Frangi filter or other edge detection algorithms [12]. These techniques are not data-driven, typically require manual parameter tuning, and show limited generalization to images of lower quality or contrast. On the other hand, [13] achieved high performance with a two-stream summarizer-discriminator CNN.

For vessel segmentation, image processing approaches include shape-and-motion mapping [14] and active contour models [15]. Recent work employing deep learning has proposed using a multi-channel CNN [16], a combination of two CNNs processing local and global image patches [17], and a multi-channel U-Net [18]. The use of more complex U-Net architectures with DenseNet and InceptionResNet feature backbones enables higher accuracy [19]. Most of these studies achieved a high degree of precision. However, few of them sought to apply their algorithms towards the goal of stenosis measurement.

Stenosis measurement is a crucial clinical measurement. Prior work used spatial-temporal information to track vessel width [20] and an improved Measure of Match (MoM) measurement [21].
Recently, an end-to-end analysis was proposed using three CNNs, each responsible for localizing stenosis, segmentation, and comparison between normal and lesion vessel widths [22], although single-frame images (not videos) were used, and localization/segmentation accuracies were ∼.

III. METHODS

We propose a 3-stage end-to-end pipeline: a) key frame extraction from video sequences, b) vessel segmentation on key frames, and c) stenosis measurement from segmentation masks (Fig. 1A). Stage 1 picks 5 frames with the highest vessel clarity from each video. Stage 2 performs segmentation on the chosen frames to highlight the key vascular structure. Stage 3 identifies the region of highest stenosis and calculates the percent stenosis from the segmented vessel. Our method can potentially provide end-to-end automatic stenosis measurement while allowing stage-by-stage manual interpretation of intermediate results.
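As a concrete illustration, the three stages can be sketched as plain functions over per-frame quality scores, per-pixel probability maps, and centerline width samples. The function names, the 0.5 binarization threshold, and the exact bookkeeping below are our own illustrative choices, not the paper's implementation; only the 5-frame, 30-largest and 3-smallest constants come from the text.

```python
import numpy as np

def extract_key_frames(scores, k=5):
    """Stage 1 sketch: given per-frame quality scores for one video,
    return the indices of the k highest-scoring frames, best first."""
    return np.argsort(scores)[-k:][::-1]

def segment_vessel(probability_map, threshold=0.5):
    """Stage 2 sketch: binarize a per-pixel vessel probability map
    (e.g. a U-Net output) into a segmentation mask."""
    return probability_map > threshold

def percent_stenosis(widths, n_normal=30, n_min=3):
    """Stage 3 sketch: percent stenosis from vessel widths sampled along
    the centerline, comparing the mean of the 30 largest samples (the
    "normal" width) against the mean of the 3 smallest (the lesion)."""
    w = np.sort(np.asarray(widths, dtype=float))
    normal = w[-n_normal:].mean()
    minimum = w[:n_min].mean()
    return (1.0 - minimum / normal) * 100.0
```

In this decomposition each stage's output is inspectable on its own, which is what permits the stage-by-stage manual interpretation mentioned above.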
TABLE I
PATIENT DEMOGRAPHICS AND CROSS-VALIDATION FOLD INFORMATION

Total no. of patients: 102

Gender (n=90): Female 23; Male 67
Age (n=90): 18-39: 1; 40-59: 34; 60-79: 53; 80+: 2
Risk factors (n=90)*: Diabetes 29; Dyslipidemia 55; Hypertension 50; Renal impairment 9; Smoking (former) 16; Smoking (current) 12; None of the above 25

                        Train    Val    Test    Total
No. of patients           61      20      21      102
No. of key frames       1329     397     472     2198
No. of non-key frames   3927    1298    1308     6533

* Patients may be subject to two or more risk factors.
** Cross-validation fold 1 information shown; other folds follow a similar ratio.
A. Dataset
The coronary arteries consist of the right and left coronary artery. During angiography, images of the coronary arteries are taken at various angular projections, providing comprehensive visualization of the vessels and assessment of stenosis. In this study, we focused primarily on the right coronary artery (RCA) taken in two projections, the left and right anterior oblique (LAO and RAO). The dataset was obtained from a tertiary cardiac institution. The de-identified images of consecutive patients who underwent coronary angiography for assessment of their cardiac status were included. A summary of demographics is presented in Table I. Videos that had low contrast or visibility of vessels, or had 100% stenosis causing a substantial alteration in vessel shape, were excluded. We employed a set of LAO videos for training and testing of the deep learning models, and a smaller set of RAO videos for testing. After excluding videos of exceptionally low quality, the LAO dataset comprised 102 videos with 8731 frames of size 512x512 pixels, with an average of ± frames per video.

Fig. 1. Schematics of methodology. (A) Workflow of the proposed angiography analysis pipeline with three main stages: extraction of key frames from video, vessel segmentation of the selected frames, and stenosis measurement through calculating vessel width, giving stenosis location and severity. (B) Key frame extraction process, showing machine prediction scores; left: two frames from the same video, one non-key (score: 0.000) and one key (score: 0.922); right: a top-5 key frame. (C) Illustration of labeled frames: (a) key frame; (b) vessel not fully formed; (c) contrast agent starts fading; (d) vessel has shifted out. (D) A key frame with corresponding segmentation label (the vessel region between the red arrows is the desired segment for segmentation). (E) Proposed stenosis measurement algorithm. Vessel width along the centerline is plotted to locate the minimum. Note that the orange arrow in (c) points to the position of the most severe stenosis.

B. Key Frame Extraction
Key frame extraction was implemented to select high-quality frames displaying complete vessel shapes for further analysis. We employed a two-phase algorithm for the training of key frame classification models (Fig. 1B). We first trained a base model on resized 64x64 images using manually generated ground-truth labels of 6533 non-key and 2198 key frames. The set of criteria for key frames consists of high vessel clarity, a clear vessel edge, and visibility of the proximal, mid and distal RCA, i.e. the contrast agent has reached the crux where the RCA divides into the posterior descending and acute marginal arteries (Fig. 1C). Key frames were labeled by student research assistants (C.Z., T.V.D., H.K.) trained and supervised by an interventional cardiologist (J.Y.) and a postdoctoral researcher (K.L.).

The 102 patients were split into a training set and a test set in a 4:1 ratio (81 in the training set and 21 in the test set). The training set was further divided into 4 equal folds for cross-validation. With patient-level splitting, every non-test-set patient appears exactly once in the validation set, in order to prevent information leakage between training and validation sets. The model runs 500 iterations per epoch for a total of 100 epochs. In each iteration, 16 key and 64 non-key frames are processed, with each key and non-key frame augmented 7-fold and 1-fold respectively, in order to mitigate the small and unbalanced nature of our dataset (training details in Appendix B). Upon completion of training, the classifier is tested on an unseen test set. The base model was designed as a ResNet18, using the implementation from [24], which employs neural network shortcut paths for better learning [9]. Dropout layers were used to mitigate overfitting. Two convolutional layers were then added to the base model, and the new model was trained on images resized to 128x128 pixels while reusing the weights of the previous phase.
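The patient-level split described above can be sketched as follows. The helper and its fold assignment are our own illustration (the exact shuffling and rounding in our code may differ); the point is that splitting happens on patient identifiers, never on individual frames.

```python
import random

def patient_level_split(patient_ids, test_frac=0.2, n_folds=4, seed=0):
    """Split by patient, not by frame, so that all frames from one patient
    fall entirely in the test set or in exactly one cross-validation fold.
    This prevents information leakage between training and validation."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    test_ids = ids[:n_test]
    # round-robin the remaining patients into n_folds validation folds
    folds = [ids[n_test + i::n_folds] for i in range(n_folds)]
    return test_ids, folds
```

Each fold then serves once as the validation set while the other folds train, so every non-test patient is validated exactly once.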
This method, known as progressive resizing [25], uses the base model as a global feature extractor, while the second phase preserves local information by upsizing and training on images with higher resolution. This technique also reduces the computational power required to directly train on high-resolution images. We employed a combination of neural network optimizers: RAdam, an improved Adam [26] algorithm which stabilizes training with learning-rate warmup [27], and Lookahead, which reduces optimizer variance [28].

Given a single image for prediction, the model generated a score indicating the frame quality (0: low, 1: high). Our algorithm then selected the 5 frames with the highest scores from each video sequence. In addition to a typical per-frame accuracy metric, we introduced another metric calculating the percentage of true key frames out of the best 5 ("Top-5") output frames of each video sequence (hereafter referred to as the "Top-5 Precision").
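The Top-5 Precision metric is straightforward to state in code. This sketch assumes one quality score and one ground-truth key/non-key label per frame; the function names are ours.

```python
import numpy as np

def top5_precision(scores, is_key, k=5):
    """Per-video metric: the fraction of true key frames among the
    k highest-scoring frames of that video."""
    top_idx = np.argsort(np.asarray(scores))[-k:]
    return float(np.asarray(is_key, dtype=bool)[top_idx].mean())

def mean_top5_precision(videos, k=5):
    """Dataset-level metric: average the per-video Top-5 Precision
    over (scores, labels) pairs, one pair per video."""
    return float(np.mean([top5_precision(s, y, k) for s, y in videos]))
```

Unlike per-frame accuracy, this metric only rewards the model for the frames a user would actually be shown, which is why we treat it as the more realistic measure.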
C. Vessel Segmentation
After obtaining high-quality key frames, we applied vessel segmentation to them in order to extract clear vessel shapes out of possibly noisy backgrounds. For vessel segmentation, we implemented a U-Net model [29]. The U-Net [11] is widely used in biomedical image segmentation due to its capability to preserve both global structure and local details; its shortcut connections between contracting and expanding paths facilitate feature reconstruction. Our model consisted of 5 convolutional blocks, followed by a symmetric set of 5 up-convolution blocks. Shortcuts by concatenation were introduced between each convolutional block and its corresponding up-convolution block with the same number of channels. The network took a 512x512 pixel image as input and produced a 512x512 pixel segmentation mask. The U-Net segmented the main part of the vessel, i.e. the segment between the catheter-vessel junction and the bifurcation point of the RCA (Fig. 1D).

Due to the time-consuming nature of producing precise pixel-level labels for learning segmentation, we implemented semi-supervised label propagation [30]. A small subset (20%) of the whole dataset was manually labeled and a preliminary U-Net was trained using the subset. The U-Net was then used to generate approximate labels for another 20% of the dataset. The vessel masks obtained were manually and efficiently corrected, then added to the training data for a subsequent U-Net, thus further improving the approximate labels. This cycle was repeated twice more. This procedure was used to generate segmentation labels for 2198 key frames, greatly reducing the amount of manual labor required. Data augmentation was performed to mitigate our limited data. Similar to key frame classification, U-Net training was conducted using 4-fold cross-validation, using the same train/validation/test patient cross-validation folds as before (training details in Appendix B).
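The label-propagation loop above can be sketched as below. Here `train_fn`, `predict_fn`, and `correct_fn` are placeholders for U-Net training, U-Net inference, and manual mask correction; the 20% chunk size mirrors the procedure described in the text, and the control flow is our own illustrative simplification.

```python
def propagate_labels(frames, seed_labels, train_fn, predict_fn, correct_fn,
                     chunk_frac=0.2, rounds=3):
    """Bootstrap segmentation labels: starting from a manually labeled seed
    subset, repeatedly (1) train on everything labeled so far, (2) predict
    rough masks for the next unlabeled chunk, (3) correct them, and
    (4) fold the corrections back into the training pool."""
    labels = dict(seed_labels)                      # frame index -> mask
    chunk = max(1, int(len(frames) * chunk_frac))
    for _ in range(rounds):
        model = train_fn(labels)                    # retrain on current labels
        todo = [i for i in range(len(frames)) if i not in labels][:chunk]
        if not todo:
            break
        for i in todo:
            labels[i] = correct_fn(predict_fn(model, frames[i]))
    return labels
```

Because each round's model has seen the corrected masks from earlier rounds, the approximate labels improve as the loop progresses, and manual effort is reduced to correction rather than labeling from scratch.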
D. Stenosis Measurement
To assess the severity of stenosis, physicians typically estimate the degree of narrowing by observing variations in vessel width. Hence, we introduced a stenosis measurement algorithm as the final stage of our end-to-end pipeline. We adopted a classical approach combining image processing and geometrical analysis of the segmentation mask. First, the segmentation mask was skeletonized [31], [32] and pruned (Fig. 1E a-b) (Appendix A, Algorithm 1) to produce a centerline. Subsequently, the width of the vessel was approximated using the geometry of the vessel slope at multiple points along the smoothed centerline (Fig. 1E c-d). This type of measurement has been previously proposed for direct use on angiograms [33]; we expected it to perform more reliably on a binary mask, which has clear edge boundaries. This vessel width estimation algorithm (Appendix A, Algorithm 2) was repeated for all top-5 key frames from the same video. The percent stenosis was estimated from the quotient of two quantities: the "normal" (non-pathological) width of the vessel, and the minimum (pathological) width. We estimated the normal width by taking the average of the 30 largest point approximations of vessel width per frame. The minimum width was estimated by taking the average of the 3 lowest pixel widths, assuming that the stenosis was of a focal nature. The stenosis severity was then calculated as:

percent stenosis = (1 − average minimum width / average maximum width) × 100%

E. Extension to Right Anterior Oblique (RAO) View
Our three-stage pipeline is trained on data from one specific angular projection of a set of up to 9 projections in cardiology practice, namely the LAO straight projection of the right coronary artery. The Right Anterior Oblique (RAO) projection of the RCA is taken at approximately right angles to the LAO, such that the vessel is the same but has a slightly different shape and appearance due to the orthogonal view projection. Fig. 2 shows examples of side-by-side comparisons between the two views. Given the similarities in vessel anatomy, we tested the ability of our trained classification and segmentation models, which had only previously seen LAO data, to also interpret RAO data. Through this extension, we examined our algorithms' capability to generalize to unseen but related data. We used a test set of 17 RAO videos with 1500 frames (1038 non-key and 462 key). Subsequent results are generated using the test set. In the first two stages of key frame extraction and vessel segmentation, the RAO videos were processed using the trained LAO model weights. For the purpose of evaluating the generalizability of our model, no additional model training on RAO data was done. In the third stage, the same stenosis measurement algorithm was used.

TABLE II
MODEL PERFORMANCE ON KEY FRAME EXTRACTION, VESSEL SEGMENTATION AND STENOSIS MEASUREMENT

Average over cross-validation folds:         Val      Test
Key Frame Extraction   Acc (%)               90.3     89.3
                       Top-5 Precision (%)   97.3     98.4
                       F1-Score              0.816    0.812
Vessel Segmentation    F1-Score              0.894    0.891
Stenosis Measurement (n=48)*: MAE ± SD of AE**, Acc. for severe lesion (%), Acc. for moderate lesion (%)

* Evaluation results are obtained from the only 48 videos with available clinical assessments of the percent stenosis. These videos are sourced from both the validation sets and the hold-out test set; note that there is no cross-validation at this stage.
** Mean Absolute Error ± Standard Deviation of Absolute Error. Accuracy obtained by setting 70% (severe lesion) and 50% (moderate lesion) as the binarizing thresholds.

Fig. 2. Side-by-side comparisons between LAO and RAO angiograms. Each LAO-RAO pair represents one patient.

IV. RESULTS & DISCUSSION

The experiments for each stage are conducted in independent settings. Specifically, the key frames used in stage 2 and the segmentation masks used in stage 3 are ground-truth labels. This serves as a feasibility test (proof of concept) of our proposal.
A. Key Frame Extraction
Table II shows the per-frame analysis performance of our model, and the per-video Top-5 Precision metric. On a per-frame basis, we account for the unbalanced nature of our dataset (6533 non-key vs 2198 key) by including F1-score as a metric. Moreover, in practice it is not necessary for every key frame (often up to 20 are available in each video) to be correctly identified; hence the Top-5 metric is intended as a more realistic measure of whether an adequate number of key frames (5 as a convenient threshold) may be identified per video.

We observe that the metric adopted by previous work such as the Frangi filter and edge detection [12] (the percentage of videos whose output contains at least 1 key frame) is different from Top-5 Precision, and is hence not a suitable basis for comparison. The apparent improvement demonstrated by our Top-5 Precision illustrates the robustness of data-driven approaches such as deep learning.

The model performance was partly hindered by "noisy" ground-truth labels due to inter-observer variations among our human readers, especially on frames that occurred on the margin of the contrast-enhanced time period in each video. We observed that some of these errors could be "corrected" by the trained model; a key frame erroneously labeled as non-key in manual labeling could be detected by the model, giving the frame a high score; similarly, a non-key frame with incomplete vessel shape erroneously labeled as a key frame was detected and given a low score (Fig. 3B).

Since the algorithm was trained on individual frames and analyzed frames one at a time, we did not incorporate temporal information that may provide an additional signal to the machine learning; alternate designs of convolutional networks incorporating time-series data could be investigated in the future.

TABLE III
COMPARISON WITH RELATED WORK

Vessel Segmentation           F1-score
Yang et al. [19]†             0.930
Our work                      0.891

Stenosis Measurement          Type I Error (%)
MWCE-End-to-End [22]‡         36.8
12-feature classifier [34]    16.2
Our work                      20.7

† A significant difference in dataset size exists (1021 patients for Yang et al. vs 102 patients for our work).
‡ The MWCE-End-to-End model's workflow is subject to potential errors from the preceding stage.
Nevertheless, our per-frame analysis could reconstruct the temporal relationship within video sequences, where predicted scores showed a characteristic curve (Fig. 3A), as key frames typically cluster around the middle of a video (referred to as the "key frame region"). Frames just before, in, and right after the key frame region are highlighted, illustrating the sensitivity of the model to changes in the visibility and contrast of the vessel anatomy.
B. Vessel Segmentation
Table II shows the results of our model, with F1-score as the metric, given by:

F1-score = 2 Σ_i (prediction_i × truelabel_i) / (Σ_i prediction_i + Σ_i truelabel_i)   (1)

where prediction and truelabel are arrays representing the generated and ground-truth masks respectively. Our segmentation model can localize heart vessels with relatively high precision (Fig. 3C). Comparing the performance of our model with Yang et al. [19], our model, using a similar but much smaller training dataset (102 vs 1021 patients), was able to reach a comparable result (Table III). Our neural network architecture was also of lower complexity, requiring less computational power for training, albeit prone to errors especially near the distal RCA bifurcation (Fig. 3E). More complex U-Net architectures and more training data would improve the performance in these difficult areas.

C. Stenosis Measurement
In predicting stenosis, our algorithm provides the percent stenosis and the approximate position of the most severe obstruction (Fig. 3F). The performance of the algorithm was evaluated by comparing its predictions to cardiologists' assessments (Table II). The dataset for evaluation consists of 48 videos with available clinical assessments of the percent stenosis obtained from patients' reports. Here, we did not make a distinction between the training and test sets due to the non-data-driven nature of the algorithm, which operates in a "blind" manner. Quantitative metrics include Mean Absolute Error (MAE) and Standard Deviation of Absolute Error (SD of AE):

MAE = (1/n) Σ_{i=1..n} |truevalue_i − prediction_i|   (2)

where truevalue and prediction represent the true clinical assessment of the percent stenosis and the machine prediction of the percent stenosis respectively, and n represents the total number of patients. As part of additional qualitative analysis, we also obtained the algorithm's accuracy for detecting severe and moderate stenosis by binarizing the percent stenosis at the 70% and 50% thresholds, as per typical clinical practice for significant lesions and moderate lesions respectively [35]. Patients whose percent stenosis is classified as a severe lesion are strongly recommended to undergo interventional procedures to restore normal blood flow, while those with moderate lesions are assessed on a case-by-case basis by cardiologists. Our model's performance surpasses the MWCE-End-to-End model [22] but trails the 12-feature classifier [34] by a 4.5% margin (Table III).

The algorithm relies on simple geometrical relationships to compute percent stenosis, which is intuitive and analogous to human readings. However, its simplicity may have led to a relatively large error margin (15.9% ± ).

D. End-to-end Pipeline
We developed an end-to-end tool in a prototype software interface that integrates our machine models and algorithms. The software takes an angiography video as input and performs analysis, after which the user may navigate the results of key frame extraction, vessel segmentation, and stenosis measurement. The software took approximately 30 seconds for an end-to-end analysis. We believe that this class of automated tools can serve to assist cardiologists in their diagnoses, as well as rapidly generate repeatable and objective measurements on large datasets in a high-throughput fashion.

While our goal was to demonstrate the feasibility of such a multi-stage algorithm, we assessed the performance of each of the three analysis stages separately. The
segmentation algorithm was trained and evaluated using all key frames, rather than key frames selected by the key frame classifier. Similarly, the stenosis measurement algorithm was evaluated on all ground-truth segmentation masks, rather than actual U-Net predictions. This allowed us to objectively evaluate the algorithmic performance of each stage and compare our results to prior studies, but did not reflect true end-to-end performance.

Fig. 3. Illustration of LAO and RAO results. (A) Temporal score variation over the course of an LAO angiography video. Each data point represents the machine-predicted score of that frame. Three particular frames are highlighted in red as they represent the frames just before, in and after the "key frame region". (B) The machine's inferential ability shown by noisy incorrect LAO labels detected by the model. First image: a key frame erroneously labeled as non-key in manual labeling; the model gave the frame a high score (0.998). Second image: a non-key frame with the vessel shifted out erroneously labeled as a key frame; it was detected and given a low score (0.083). (C) LAO vessel segmentation: (a) original images, (b) segmentation masks. (D) RAO vessel segmentation: (a) original images, (b) segmentation masks. (E) Erroneous LAO segmentation near the RCA bifurcation point. (F) Identification of LAO stenosis location and estimation of severity. (G) Side-by-side comparisons between LAO and RAO stenosis measurement. Each column represents one patient.

TABLE IV
COMPARISON BETWEEN LAO AND RAO RESULTS

Metric                                  LAO       RAO
Key Frame Extraction
  Acc (%)                               89.3
  Precision                             0.761
  Recall                                0.873
  Top-5 Precision (%)                   98.4
Vessel Segmentation
  F1-Score                              0.891
Stenosis Measurement
  MAE ± SD (%)                          13.5 ±
  Acc. for severe lesion (%)            71.4
  Acc. for moderate lesion (%)          95.2
E. Extension to Right Anterior Oblique View
Key frame extraction and vessel segmentation performance evaluation on LAO and RAO data are presented in Table IV. A slight dip in performance is seen on the RAO view in comparison to LAO, which can be attributed to the fact that the models were initially trained on LAO data. However, the results on RAO are remarkably robust, indicating that the trained models have some generalizability to vessels with similar anatomical structures. The segmentation F1-score for the RAO view also did not dip significantly. The predicted masks delineated the RCA vessel structures clearly and included appropriate endpoints at their distal bifurcations (Fig. 3D), which were landmarks learned from LAO labels. Further fine-tuning of the models on a moderate amount of true RAO labels could greatly boost performance if required.

As a preliminary exploration of how automated measurements of a given lesion from different views may be aggregated, we performed manual visual comparisons between the identified stenosis positions of lesions in LAO and RAO views (Fig. 3G). The algorithm could identify the correct location of the most severe blockage in nearly all videos (14/15 patients). The automatically boxed regions highlighting the most severe stenosis in the LAO and RAO views correspond to each other. The algorithm evaluates the most severe obstruction in a given view; different views highlight different segments of the vessel, and thus this aggregation of observations from multiple views is expected to improve the accuracy of the prediction, simulating a cardiologist's practice. We suggest using the higher percent stenosis of the two predictions retrieved from the LAO and RAO views, i.e. Max(LAO, RAO). To validate this concept, a subset of 15 patients (test set) with both LAO and RAO readings was used. Using Max(LAO, RAO) gave . ± . as MAE ± SD of AE.
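The Max(LAO, RAO) rule and its evaluation can be sketched as follows; the function names are ours, and the example values in the usage below are stand-ins, not our clinical data.

```python
def max_view_estimate(lao_pct, rao_pct):
    """Aggregate two single-view percent-stenosis estimates by keeping the
    higher (worse) one, mirroring a cardiologist reporting the worst view."""
    return max(lao_pct, rao_pct)

def mae_and_detection_rate(preds, clinical, threshold=70.0):
    """Mean absolute error against clinical readings, plus the agreement
    rate after binarizing both at a severity threshold (70% severe,
    50% moderate)."""
    pairs = list(zip(preds, clinical))
    mae = sum(abs(p - c) for p, c in pairs) / len(pairs)
    agree = sum((p >= threshold) == (c >= threshold) for p, c in pairs)
    return mae, agree / len(pairs)
```

For example, `max_view_estimate(60.0, 75.0)` keeps the RAO reading of 75.0, since the more severe single-view estimate is the clinically conservative one.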
The correct detection rate of the stenosis at the 70% threshold was 73.3%, reaching 100% at the 50% threshold.

The extension to the RAO view provides inspiration for further research. Future work includes expansion to more views, such as the anterior-posterior (AP) cranial view, to visualize the more distal vessel segments. Leveraging information from individual views and combining them in an optimal way will improve our model's accuracy. The ability to integrate information from multiple views will be even more critical as we continue work on the left coronary artery, which has a more complex vessel architecture studied from up to 6 view projections.

V. CONCLUSION
We developed a set of algorithms designed as an integrated end-to-end automated tool for analyzing angiography video sequences. Automating the analysis of coronary angiograms could assist cardiologists with visual assessment and report generation of these angiograms, while mitigating inter-observer variability and preventing unnecessary stenting. The capability to process large datasets in a repeatable, objective fashion could also be an important tool in clinical trials and research. Future improvements include: 1. the addition of video sequences with artefacts and near-complete occlusions (99%) into the dataset and subsequent model tuning; 2. incorporation of time-series data into the current classification model; and 3. employing a more complex stenosis measurement algorithm combining information from vessel widths proximal to lesions and other local maxima/minima features. Additionally, regression models are a possible alternative to the current non-data-driven approach utilized in stage 3. Extending these concepts to the left coronary artery and more view projections (cranial, caudal), as well as an evaluation of end-to-end performance on larger unseen datasets, will be an essential component in realizing the potential for clinical translation. Ultimately, after rigorous validation by experts, these automated tools could have a role in suggesting recommendations for interventional treatment, with even more value to community or rural hospitals that may lack specialist expertise or infrastructure.

ACKNOWLEDGMENT
This work is supported by the Agency for Science, Technology and Research AI and Analytics Seed Grant no. Z20F3RE003.

APPENDIX A
ALGORITHMS
Algorithm 1 Pruning algorithm

Input: Set of points forming the centerline S

procedure PRUNE(S)
    neighbors of a point ← points in S within the 8-connected neighborhood of the reference point
    endpoints ← points in S with 2 neighbors
    for endpoint in endpoints do
        bifurcation point ← point in S with 4 neighbors that is nearest to endpoint
        branch ← set of points including endpoint, bifurcation point, and all points in S that are in between
        if |branch| ≥ [threshold] then
            remove branch
        end if
    end for
end procedure

Algorithm 2 Overall vessel width extraction algorithm

Input: Vessel masks acquired from the segmentation stage

procedure PLOTVESSELWIDTH
    Skeletonize the vessel mask
    Prune the resultant centerline to remove unnecessary branches
    for every point on the centerline do
        Plot the normal to the centerline at that point
        Search for the two bounds lying on either side of the vessel mask on the normal line
        Calculate the Euclidean distance between the bounds (this distance corresponds to a pixel width)
    end for
    Plot the graph of pixel widths against the point index at which they are measured
end procedure

APPENDIX B
TRAINING DETAILS
TABLE V
TRAINING SETUP

                      Key frame extraction    Vessel segmentation
Batch size            64                      2
Weights initializer   random normal           He normal
Optimizer             RAdam                   Adam
LR                    1e-3                    1e-4
β                     β

TABLE VI
AUGMENTATION HYPERPARAMETERS FOR THE KEY FRAME EXTRACTION MODEL AND THE VESSEL SEGMENTATION MODEL

Methods                  Classification    Segmentation
Rotation angle           (−°, °)           (−°, °)
Shear angle              (−°, °)           (−°, °)
Horizontal translation   (0, 0.2)          (−0.05, 0.05)
Vertical translation     (−0.2, 0.2)       (−0.3, 0.3)
Scale                    (0.8, 1.0)        (0.7, 1.3)
CLAHE                    (1, 5)            -
Sharpen                  (0.4, 0.6)        -

Values in the parentheses represent ranges.
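The vessel width extraction of Algorithm 2 (Appendix A) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-column mid-row centerline stands in for skeletonization plus pruning (Algorithm 1), and widths are estimated by counting vessel pixels sampled along the local normal rather than by the exact two-bound Euclidean distance.

```python
import numpy as np

def centerline_of(mask):
    # Toy centerline: mid-row of vessel pixels in each column. This works
    # for a near-horizontal vessel and stands in for skeletonize + prune.
    pts = []
    for x in range(mask.shape[1]):
        rows = np.flatnonzero(mask[:, x])
        if rows.size:
            pts.append((float(rows.mean()), float(x)))
    return pts

def width_profile(mask, centerline, max_r=20):
    # Algorithm 2 sketch: at each interior centerline point, estimate the
    # tangent from its two neighbours, take the unit normal, and count the
    # vessel pixels sampled along that normal as the local pixel width.
    h, w = mask.shape
    widths = []
    for i in range(1, len(centerline) - 1):
        y0, x0 = centerline[i]
        (ya, xa), (yb, xb) = centerline[i - 1], centerline[i + 1]
        ty, tx = yb - ya, xb - xa                  # local tangent direction
        n = float(np.hypot(ty, tx))
        ny, nx = -tx / n, ty / n                   # unit normal
        count = 0
        for t in range(-max_r, max_r + 1):
            yi = int(round(y0 + t * ny))
            xi = int(round(x0 + t * nx))
            if 0 <= yi < h and 0 <= xi < w and mask[yi, xi]:
                count += 1
        widths.append(count)
    return widths
```

On a toy horizontal vessel 3 px wide with a 1 px narrowing, the profile dips at the lesion; a stenosis grade could then be taken from the minimum width relative to a reference width, as described in the stenosis measurement stage.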