An approach for auxiliary diagnosing and screening coronary disease based on machine learning

Abstract

How can we accurately classify and predict whether an individual has Coronary Artery Disease (CAD), and the degree of coronary stenosis, without an invasive examination? This problem has not been solved satisfactorily. To this end, three machine learning (ML) algorithms, i.e., Boost Tree (BT), Decision Tree (DT), and Logistic Regression (LR), are employed in this paper. First, 11 features, including an individual's basic information, symptoms, and results of routine physical examination, are selected, and one label is specified, indicating whether an individual suffers from CAD or the severity of coronary artery stenosis. On this basis, a sample set is constructed. Second, each of the three ML algorithms learns from the sample set to obtain its optimal predictive results. The experimental results show that BT predicts whether an individual has CAD with an accuracy of 94%, and predicts the degree of an individual's coronary artery stenosis with an accuracy of 90%.



Weijun ZHU (1)(2) and En LI (1)
(1) The Second Affiliated Hospital, Zhengzhou University, Zhengzhou, China.
(2) School of Information Engineering, Zhengzhou University, Zhengzhou, China.
Corresponding author: En LI


Index Terms: coronary stenosis, coronary artery disease, machine learning, classification, mass screening, intelligent diagnosis.

I. INTRODUCTION

The World Health Organization considers cardiovascular diseases (CVDs) the number one killer threatening human life [1]. In 2016, 17.9 million people died of such diseases, accounting for 31% of total global deaths in that year [1]. CVDs include a number of different specific diseases, and CAD is one of the most fatal [2]. Coronary angiography has long been recognized as the gold standard for the diagnosis of CAD, and it plays an important role in clinical practice. However, inserting a catheter into an artery is an invasive examination and is not suitable for large-scale screening in the general population. Can an AI algorithm help here? We need an algorithm that intelligently analyzes data including an individual's basic information, routine physical examination results, and symptoms, so that the relationship between these raw data and the diagnostic conclusion is obtained, with the aim of realizing an auxiliary diagnosis that indicates whether an individual has CAD or coronary stenosis. This is the purpose of this study.

II. OBJECTIVE

We explore the ability and efficiency of multiple machine learning algorithms for the intelligent diagnosis of CAD and coronary artery stenosis, based on an individual's basic personal information, routine physical examination results, and symptoms.

III. METHOD

(1) A method is designed, as shown in Fig. 1. GraphLab [3] is used to implement the following three machine learning algorithms: boost tree (BT) [4], decision tree (DT) [5], and logistic regression (LR) [6].

(2) Training and prediction are performed on a sample set of 1000 records (dataset A), where: the label of 623 records is 1, which means "severe coronary stenosis, and a stent implantation is needed"; the label of 125 records is 2, which means "mild or moderate coronary artery stenosis, and no stent implantation is needed"; the label of 252 records is 3, which means "coronary artery is normal". In other words, the data to be processed by the machine learning algorithms carry a three-class label.

The factors influencing the diagnosis are as follows: gender, age, fasting plasma glucose (FPG), LDL, history of hypertension (years), history of diabetes (years), smoking history (years), sweating at the onset of the disease, ECG ("ST segment is elevated", "ischemic change occurs", or "the above situation does not occur"), whether plaques occur in the neck vessels, and whether a cardiac color Doppler ultrasound indicates abnormal wall motion. In other words, the machine learning algorithms need to process data with 11 features.

IV. RESULTS
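As context for the accuracies reported in this section, the label distribution of dataset A described in Section III can be written down in a few lines. The following is a minimal sketch, not the authors' code: the record counts and label meanings come from Section III, while the snake_case feature names and the helper functions `class_proportions` and `majority_baseline` are hypothetical names introduced for illustration. The majority-class baseline (always predicting the most frequent label) is a reference point added here and is not a result stated in the paper.

```python
# Label meanings and record counts as stated in Section III (dataset A).
LABEL_MEANINGS = {
    1: "severe stenosis, stent implantation needed",
    2: "mild or moderate stenosis, no stent implantation needed",
    3: "normal coronary artery",
}
LABEL_COUNTS = {1: 623, 2: 125, 3: 252}

# The 11 features listed in Section III. These identifiers are hypothetical;
# the paper does not give column names.
FEATURES = [
    "gender", "age", "fasting_plasma_glucose", "ldl",
    "hypertension_years", "diabetes_years", "smoking_years",
    "sweating_at_onset", "ecg_finding", "neck_vessel_plaque",
    "abnormal_wall_motion",
]


def class_proportions(counts):
    """Return the share of records carrying each label."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent label."""
    return max(counts.values()) / sum(counts.values())


proportions = class_proportions(LABEL_COUNTS)
baseline = majority_baseline(LABEL_COUNTS)

print(f"records: {sum(LABEL_COUNTS.values())}, features: {len(FEATURES)}")
for label, share in proportions.items():
    print(f"label {label} ({LABEL_MEANINGS[label]}): {share:.1%}")
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

Because label 1 covers 623 of the 1000 records, any three-class accuracy is best read against this 0.623 reference rather than against chance.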

A. Predictive Ability about an Individual's Coronary Stenosis (Three-Class Classification)

For each of the three machine learning algorithms, the optimal predictive result is obtained by adjusting the values of the hyper-parameters, as shown in Table 1. Table 2 lists the corresponding hyper-parameter values. The tree-based algorithms, i.e., BT and DT, have the higher accuracy. For BT in particular, the predictive accuracy is more than 90%. Thus, the experimental results recommend this algorithm.

False negative rate (FNR) is the rate of missed diagnosis, and false positive rate (FPR) is the rate of misdiagnosis. Missed diagnosis is especially harmful to patients. When BT is used for predicting coronary stenosis, FNR and FPR are kept at a relatively low level (4% and 7%, respectively), as shown in Table 3. This suggests that BT is a relatively suitable method for our task.

B. Predictive Ability about whether an Individual has Coronary Disease or not (Binary Classification)

This time, BT again shows excellent performance. Its predictive accuracy reaches 94%, and both FNR and FPR are kept within 4%, as shown in Table 3.
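The FNR and FPR quoted above follow the standard confusion-matrix definitions, FNR = FN / (FN + TP) and FPR = FP / (FP + TN). The following is a minimal sketch of that computation for the binary case only; the function name `binary_rates` and the toy labels are hypothetical and are not the paper's data. How the paper derives FNR and FPR for the three-class task is not stated, so that case is not covered here.

```python
def binary_rates(y_true, y_pred, positive=1):
    """Compute accuracy, FNR and FPR from paired true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # diseased individual predicted healthy: missed diagnosis
        elif pred == positive:
            fp += 1  # healthy individual predicted diseased: misdiagnosis
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against an empty class to avoid division by zero.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy labels for illustration only (1 = has CAD, 0 = no CAD).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

rates = binary_rates(y_true, y_pred)
print(rates)
```

In a screening setting the FNR is usually the more critical of the two, since a false negative leaves a diseased individual without follow-up.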

C. Efficiency and Summary

In our experiment, all three ML algorithms run fast. The average running time of each algorithm for one record (one individual) is within 0.0002 seconds, as shown in Table 1. Furthermore, there is no significant difference in running speed among the three algorithms.

In summary, BT is more suitable than DT and LR for predicting an individual's coronary artery stenosis or CAD.

V. CONCLUSIONS

In this study, machine learning is employed to predict (intelligently diagnose) coronary artery stenosis or CAD for individuals. In this way, an invasive examination such as coronary angiography is no longer needed for many individuals. They only need to provide some basic personal information, symptoms, and results of routine physical examination, such as blood tests, electrocardiograms, and color Doppler ultrasound, which can be used to accurately predict whether an individual has CAD and the severity of coronary artery stenosis. On the one hand, coronary angiography places high demands on doctors and hospitals. On the other hand, invasive examination is not suitable for the large low-risk general population. Therefore, compared with conventional coronary angiography, the new method is more suitable for large-scale screening of CAD in grass-roots hospitals. Considering that this disease is fatal, and that the numbers of patients and of people at potential risk are very large, the comparative advantages and potential clinical prospects of the new method are clear.

ACKNOWLEDGMENT

This work has been supported by the National Natural Science Foundation of China under Grants U1204608 and 61572444.

REFERENCES

Fig. 1. A method for auxiliary diagnosing coronary artery stenosis based on machine learning

TABLE I
OPTIMAL RESULTS AND SOME METRICS OF THE THREE ML ALGORITHMS ON DATASET A

| Metric | BT | DT | LR |
|---|---|---|---|
| optimal accuracy | 0.904 | 0.872 | 0.851 |
| AUC | 0.789 | 0.776 | 0.802 |
| precision | 0.855 | 0.778 | 0.749 |
| recall | 0.582 | 0.530 | 0.541 |
| average predictive time for one record (s) | 0.00019 | 0.00018 | 0.00017 |

TABLE II
THE VALUES OF HYPER-PARAMETERS WHEN THE OPTIMAL RESULTS OF THE ML ALGORITHMS OCCUR

| Hyper-parameter | BT | DT | LR |
|---|---|---|---|
| fraction | 0.89 | 0.89 | 0.89 |
| seed | 2129 | 2129 | 2129 |
| max iterations | 6 | - | - |
| min child weight | 19 | 1 | - |
| max depth | 5 | 4 | - |

TABLE IIIS