Diagnostic Accuracy of Computed Tomography for Identifying Hospitalization in Patients with Suspected COVID-19
Sergey P. Morozov, Roman V. Reshetnikov, Victor A. Gombolevskiy, Natalia V. Ledikhova, Ivan A. Blokhin, Vladislav G. Kljashtorny, Olesya A. Mokienko, Anton V. Vladzymyrskyy
Abstract
Background and Objectives
The use of computed tomography (CT) in COVID-19 screening is controversial. The controversy is associated with ambiguous characteristics of chest CT as a diagnostic test. The reported values of CT sensitivity and especially specificity calculated using reverse transcription polymerase chain reaction as a reference standard vary widely, raising reasonable doubts about the applicability of the method. The objective of this study was to reevaluate the diagnostic and prognostic value of CT using an alternative approach.
Methods
This study included 973 symptomatic COVID-19 patients aged 42 ± 17 years, 56% females. For all of them, we reviewed the disease dynamics between the initial and follow-up CT studies using a “CT0-4” visual semi-quantitative grading system to assess the severity of the disease. Sensitivity and specificity were calculated as conditional probabilities that a patient’s condition would improve or deteriorate, depending on the results of the initial CT examination. To calculate the negative (NPV) and positive (PPV) predictive values, we estimated the COVID-19 prevalence in Moscow. The data on total cases of COVID-19 from March 6, 2020, to July 20, 2020, were taken from the Rospotrebnadzor website. We used several ARIMA and ETS models with different parameters to fit the data and forecast the incidence.
Results
The “CT0-4” grading scale demonstrated low sensitivity (28%) but high specificity (95%). The best statistical model for describing the epidemiological situation in Moscow was ETS with multiplicative trend, error, and season type. According to our calculations, with the predicted prevalence of 2.1%, the values of NPV and PPV would be 98% and 10%, respectively.
Discussion
We associate the low sensitivity and PPV values of the “CT0-4” grading scale with the small sample size of the patients with severe symptoms and non-optimal methodological setup for measuring these specific characteristics. We found that the grading scale was highly specific and predictive for identifying admissions of COVID-19 patients to hospitals. Despite the ambiguous accuracy, chest CT proved to be an effective practical tool for patient management during the pandemic, provided that the necessary infrastructure and human resources are available.

Introduction
As of July 3, 2020, the COVID-19 pandemic had led to more than 11 million confirmed cases globally, with nearly 525,000 deaths [1]. COVID-19 has a range of disease presentations. It may be completely asymptomatic or manifest itself with mild flu-like symptoms (80% of infections), and there are severe and critical conditions requiring oxygen or ventilation (15% and 5%, respectively) [2]. It is essential to control the spread of the disease, but the variability of the symptoms makes the task of diagnosing COVID-19 very challenging. The availability of rapid and accurate COVID-19 testing tools becomes a critical factor in reducing human-to-human SARS-CoV-2 transmission. Currently, commercially available COVID-19 tests fall into two major categories: (i) assays detecting viral RNA via polymerase chain reaction (PCR) or nucleic acid hybridization, and (ii) serological and immunological assays detecting antibodies produced by individuals [3]. Despite constant improvement, both categories have their disadvantages. For example, reverse transcription PCR (RT-PCR) requires expensive equipment and highly trained operators, and can take days to provide results. As for the serological tests, there is currently no strong evidence that seropositivity correlates with immunity to the virus [3]. There remains an urgent demand for an effective tool for COVID-19 diagnosis, and computed tomography (CT) provides a promising solution.

CT is sensitive for early parenchymal lung disease, disease progression, and alternative diagnoses [4]. Moreover, CT results can be obtained within 20 minutes, which is an essential factor during the pandemic. Nevertheless, there is considerable controversy regarding the use of CT imaging in COVID-19 diagnostics. The estimates of chest CT sensitivity and specificity vary widely and strongly depend on the radiologist’s competence (Table S1).

RT-PCR tests, widely used as a reference standard in COVID-19 CT studies, also have limited diagnostic value. The probability of a false-negative RT-PCR result decreases from 100% on day 1 of infection to 68% on day 4. On day 5 (typical time of symptom onset), the false-negative rate is 38%, decreasing to 20% on day 8 and then increasing to 66% on day 21 [5]. The sensitivity of RT-PCR tests might be as low as 59% [6] and depends on several factors, including individual variability in viral shedding [7]. This makes the method a sub-optimal reference standard. The objective of this study was to reevaluate the diagnostic value of CT through repeated scanning of patients with suspected COVID-19 who were either sent to home care or hospitalized. We used a recently reported CT grading scale [8] modified by us [9] to assess the presence and severity of the disease. According to our results based on retrospective observations of 973 symptomatic subjects, CT has sensitivity 28%, specificity 95%, positive predictive value (PPV) 10%, and negative predictive value (NPV) 98% for identifying admissions of COVID-19 patients to hospitals. Despite ambiguous characteristics, CT proved to be an effective practical tool for patient triage during the pandemic, which can significantly reduce the burden on hospitals.
Materials and Methods
In this retrospective study, we analyzed primary and secondary CT results of individuals aged from 18 to 80 years examined at the Moscow Outpatient Computed Tomography Centers (OCTC) from April 4, 2020, to May 18, 2020. The patients eligible for inclusion were the subjects with the symptoms of acute respiratory infection (ARI). The clinical diagnosis of COVID-19 (ICD10 code: U07.2) was based on a combination of the ARI symptoms and CT results. CT images were acquired with recommended scanning parameters for standard-size patients (height, 170 cm; weight, 70 kg): voltage 120 kV, automatic tube current modulation, field of view 350 mm, slice thickness ≤ [...]

[...] the sensitivity of the model corresponded to a conditional probability $P_{\text{worse}\mid\text{hospital}}$:

$$P_{\text{worse}\mid\text{hospital}} = \frac{\text{number of “hospital” patients after second CT study}}{\text{total number of “hospital” patients}} \tag{1}$$

The specificity of the model corresponded to a conditional probability $P_{\text{better}\mid\text{home}}$:

$$P_{\text{better}\mid\text{home}} = \frac{\text{number of “home” patients after second CT study}}{\text{total number of “home” patients}} \tag{2}$$

Positive predictive value (PPV) and negative predictive value (NPV) of a test are affected by the prevalence of the disease. We used Exponential Smoothing (ETS [10]) and Auto-Regressive Integrated Moving Average (ARIMA [11]) models to predict the COVID-19 incidence in Moscow. The daily data on the total number of COVID-19 cases in the city, covering the period from March 6, 2020, to July 20, 2020, were taken from the Rospotrebnadzor website [12]. The time series analysis was performed with R 3.6.3 [13] using the forecast [14] and ggplot2 [15] packages.
To estimate model accuracy, we trained the model on the incidence data from March 6, 2020, to June 28, 2020, and compared predicted values with the actual values for the period from June 29, 2020, to July 20, 2020, using the mean absolute percentage error (MAPE) and mean absolute scaled error (MASE) metrics.

Using the prevalence value, PPV was calculated as follows:

$$\mathrm{PPV} = \frac{\text{sensitivity} \cdot \text{prevalence}}{(\text{sensitivity} \cdot \text{prevalence}) + (1 - \text{specificity}) \cdot (1 - \text{prevalence})} \tag{3}$$

Similarly, NPV of the test was defined as:

$$\mathrm{NPV} = \frac{\text{specificity} \cdot (1 - \text{prevalence})}{\text{specificity} \cdot (1 - \text{prevalence}) + (1 - \text{sensitivity}) \cdot \text{prevalence}} \tag{4}$$

Results

Forecasting the COVID-19 prevalence in Moscow
To choose the forecast model, we separated the COVID-19 incidence data into training and testing portions and then applied different ETS and ARIMA models to the training data. According to the MAPE and MASE values, the ARIMA(0,2,1), ETS ZZZ (automatically selected parameters), and ETS ANN (simple exponential smoothing with additive errors) models provided the best fit for the testing dataset (Table 1).
Table 1. Accuracy statistics for different forecasting models

Model          MAPE   MASE
ARIMA(0,2,1)   0.56   0.66
ETS ZZZ        0.56   0.66
ETS MMM        1.08   1.28
ETS ANN        0.56   0.66
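The two accuracy metrics reported in Table 1 can be sketched in a few lines. This is a minimal illustration rather than the study's R workflow: it assumes MAPE is expressed in percent and that MASE is scaled by the in-sample error of a naive previous-value forecast. The function names and the sample numbers are hypothetical and are not study data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def mase(actual, forecast, train):
    """Mean absolute scaled error.

    Assumption: the scale is the in-sample mean absolute error of a naive
    one-step forecast (previous value) on the training series.
    """
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    return mae / (sum(naive_errors) / len(naive_errors))


# Hypothetical cumulative case counts, for illustration only (not study data).
train = [100.0, 110.0, 120.0, 130.0]
actual = [140.0, 150.0]
forecast = [138.0, 153.0]

print(round(mape(actual, forecast), 2))
print(round(mase(actual, forecast, train), 2))
```

A MASE below 1 means the forecast errors on the test period are smaller than the typical one-step error of the naive forecast on the training period.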
Despite the good fit with the training data, the ARIMA(0,2,1), ETS ZZZ, and ETS ANN models predicted an almost linear accumulation of COVID-19 cases. In contrast, the actual data showed a gradual decrease in daily incidence (Figure 1). The only model that followed the trend was ETS MMM (multiplicative trend, error, and season type), and we used it to estimate the number of total COVID-19 cases in Moscow.
Figure 1. Forecasts of Moscow COVID-19 prevalence. Black: actual data; gray: ETS MMM model; light gray: ARIMA(0,2,1) model. Other models matched the predictions of ARIMA(0,2,1) and are not shown for clarity.

According to the ETS MMM model, the COVID-19 incidence tended toward 270,000 subjects. Note that the model provides only a very rough forecast horizon; there might be a second wave of the disease and seasonal patterns that we cannot reliably predict using the current data. The Federal State Statistics Service reported that, as of January 1, 2020, the Moscow population was 12,678,079 [16]. From these data, the COVID-19 prevalence in Moscow, defined as the percentage with the disease in the population at risk, would be 2.1%.
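As a quick check of the arithmetic, the prevalence figure follows from dividing the forecast case total by the city population. The symbols $N_{\text{cases}}$ and $N_{\text{pop}}$ are shorthand introduced here for illustration, not notation from the study:

```latex
\[
\text{prevalence} = \frac{N_{\text{cases}}}{N_{\text{pop}}}
= \frac{270{,}000}{12{,}678{,}079} \approx 0.0213 \approx 2.1\%
\]
```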
Estimating the diagnostic accuracy of CT
A total of 47 OCTCs consulted 107,548 patients with a mean age of 42 ± 17 years (60,539 females), who were assessed for initial eligibility (Figure 2). 47,297 (48%) subjects showed no pneumonic changes (CT0 category). We collected the data of second CT examinations performed 9 ± [...]

Figure 2. Flow diagram of participants through the study.

Table 2. Categorization of participants between two consecutive examinations

                                          Disease dynamics
Group (initially assigned CT category)    Better   Worse   Sum
Home                                      860      48      908
Hospital                                  47       18      65
Sum                                       907      66      973
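The counts in Table 2 can be turned into the accuracy measures defined in Materials and Methods with a short calculation. The sketch below is illustrative only: the function and variable names are introduced here, and it assumes the rounded prevalence of 2.1% is used as the input to the PPV and NPV formulas.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Counts taken from Table 2 (rows: initially assigned group).
home_better, home_worse = 860, 48
hospital_better, hospital_worse = 47, 18

# Sensitivity: share of the "hospital" group still in it at the second CT.
sensitivity = hospital_worse / (hospital_better + hospital_worse)
# Specificity: share of the "home" group still in it at the second CT.
specificity = home_better / (home_better + home_worse)

prevalence = 0.021  # assumption: rounded prevalence estimate of 2.1%
ppv, npv = predictive_values(sensitivity, specificity, prevalence)

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}")
print(f"PPV={ppv:.1%}, NPV={npv:.1%}")
```

With these inputs the script prints a sensitivity of 27.7%, a specificity of 94.7%, a PPV of 10.1%, and an NPV of 98.4%.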
We used data on the dynamics of the disease to assess the diagnostic and prognostic value of the “CT0-4” grading scale. For that, we considered the transition of a patient from the categories CT0-CT2 (“home” group) to CT3 or CT4 (“hospital” group) as deterioration of the condition (column “Worse” in Table 2). The reverse transitions, from the “hospital” group to the “home” group, we regarded as an improvement in the patient’s condition. Here we also included the cases in which the patient remained within the “home” group, even when the disease progressed from CT0 to CT2 category (column “Better” in Table 2). According to our calculations, the CT sensitivity was 27.7%, specificity 94.7%, PPV 10.1%, and NPV 98.4%.

Discussion
The objective of this work was to estimate whether the CT method, and the “CT0-4” grading scheme in particular, have diagnostic and prognostic value as a test for COVID-19. Since RT-PCR tests, while being a “gold standard” for COVID-19 clinical diagnosis, have several disadvantages [5–7], we developed an alternative approach to assessing the characteristics of CT as a test for the disease. According to our results, the grading scheme demonstrates low sensitivity and PPV, but high specificity and NPV.

The role and value of CT in COVID-19 diagnostics have sparked debate in the medical community [17, 18]. Chest CT has a low rate of missed COVID-19 diagnoses [19] and positively correlates with mortality in patients with COVID-19 pneumonia [20]. On the other hand, CT does not test for the specific virus, and therefore its results could be misleading. Moreover, the experience and competence level of a radiologist play an essential role in the correct recognition of COVID-19-specific CT patterns. Depending on the person interpreting the CT images, the specificity of the test could be as low as 7% and as high as 100% (see Table S1).

When developing the “CT0-4” grading system and the method of its implementation, we aimed to reduce the role and likelihood of human error in the interpretation of CT images. For this, we introduced reference templates to assist the identification of both the presence and severity of the disease. Along with that, we established the MRRC, one of the tasks of which was to validate the decisions of radiologists at the OCTCs. Therefore, in evaluating the diagnostic value of chest CT for COVID-19, we evaluate the efficiency of this entire structure.

Since CT is unable to distinguish between viruses and RT-PCR has several flaws that make it a sub-optimal reference standard, we monitored the clinical state for a convenience sample of 973 patients to address the correctness of the diagnosis.
Note that the “CT0-4” grading scale allows a radiologist to simultaneously make a diagnosis and a decision about the necessity of hospitalization, thus having predictive functionality. Because of that, we focused on the issue of whether the CT0-4 category provides a reliable prognosis for the disease progression in order to assess the diagnostic value of the approach.

Our results show that the “CT0-4” grading system is highly specific and predictive for identifying admissions of COVID-19 patients to hospitals. Out of 908 patients initially assigned to the categories CT0-CT2, only 48 (5.3%) progressed to the CT3 and CT4 categories, providing specificity 94.7% and NPV 98.4% for the approach. Of these 48 patients, only four (0.4%) deteriorated to the CT4 category (two each from the initially assigned CT1 and CT2 categories). These numbers indicate the ability of the test to optimize the burden on hospitals through reliable and efficient triage decisions.

We associate the low sensitivity and PPV values of the test with the features of the sample and the methodological setup. Due to the small number of patients in the “hospital” group, even a slight change here can significantly affect the final value. For example, just one additional case in the “Worse” column (see Table 2) would result in a sensitivity value of 29.3% instead of 27.7%. Moreover, we considered a rapid improvement of the patient’s condition as a marker of a false-positive test result, although it could be a true-positive event. The median time to recovery for COVID-19 patients is estimated to be 20.8 days, but in individuals aged 50–70+ years, it is longer (22.6 days), further lengthening for those with severe symptoms (28.3 days) [21]. In our study, the observation period was 9 ± [...]

Conclusions

The “CT0-4” grading scale demonstrated high specificity for COVID-19 and was associated with the prognosis for the disease progression.
These features make chest CT a rapid and effective diagnostic test for the disease during the pandemic, which simultaneously allows radiologists to make reliable triage decisions. The approach has limitations related to the safety and availability of CT and to the human-error factor in the interpretation of CT images. Substantial infrastructure capabilities, effective administrative decisions, and competent human resources are required to make its implementation appropriate. A network of OCTCs controlled by the MRRC, in combination with the “CT0-4” grading scale, proved to be an effective tool for routing and management of COVID-19 patients, which was able to optimize the burden on hospitals.
Acknowledgments
The authors express their gratitude to all doctors of medical organizations of the Moscow Department of Health, fighting the epidemic, and to the team of experts from the Moscow Department of Information Technologies for prompt assistance in working with data from UMIAS-ERIS.
References
1. “Coronavirus update (live).” Accessed: 2020-07-03.
2. World Health Organization, “Coronavirus disease 2019 (COVID-19): situation report, 46,” technical documents, 2020-03-06.
3. L. J. Carter, L. V. Garner, J. W. Smoot, et al., “Assay techniques and test development for COVID-19 diagnosis,” ACS Cent Sci, vol. 6, no. 5, pp. 591–605, 2020. PMID: 32382657.
4. G. D. Rubin, C. J. Ryerson, L. B. Haramati, et al., “The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner society,” Radiology, vol. 296, no. 1, pp. 172–180, 2020. PMID: 32255413.
5. L. M. Kucirka, S. A. Lauer, O. Laeyendecker, D. Boon, and J. Lessler, “Variation in false-negative rate of reverse transcriptase polymerase chain reaction–based SARS-CoV-2 tests by time since exposure,” Ann Intern Med, 2020. PMID: 32422057.
6. T. Ai, Z. Yang, H. Hou, et al., “Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases,” Radiology, vol. 296, no. 2, pp. E32–E40, 2020. PMID: 32101510.
7. K. Danis, O. Epaulard, T. Benet, et al., “Cluster of coronavirus disease 2019 (COVID-19) in the French Alps, 2020,” Clin. Infect. Dis., Apr 2020.
8. X. Xie, Z. Zhong, W. Zhao, C. Zheng, F. Wang, and J. Liu, “Chest CT for typical coronavirus disease 2019 (COVID-19) pneumonia: relationship to negative RT-PCR testing,” Radiology, vol. 296, no. 2, pp. E41–E45, 2020. PMID: 32049601.
9. S. P. Morozov, A. E. Andreychenko, N. A. Pavlov, et al., “MosMedData: Chest CT scans with COVID-19 related findings dataset,” 2020. arXiv:2005.06465 [cs.CY].
10. R. Hyndman and Y. Khandakar, “Automatic time series forecasting: The forecast package for R,” J Stat Softw, vol. 27, no. 3, pp. 1–22, 2008.
11. G. Box, G. Jenkins, G. Reinsel, and G. Ljung, Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics, Wiley, 2015.
12. “Rospotrebnadzor.” Accessed: 2020-07-20.
13. R Core Team, R: a Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.
14. R. Hyndman, G. Athanasopoulos, C. Bergmeir, et al., forecast: Forecasting functions for time series and linear models, 2020. R package version 8.12.
15. H. Wickham, ggplot2: elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
16. “Federal Statistics Office.” Accessed: 2020-07-20.
17. M. C. K. Hamilton, S. Lyen, and N. E. Manghat, “Controversy in coronaViral Imaging and Diagnostics (COVID),” Clin Radiol, vol. 75, pp. 557–558, Jul 2020.
18. S. Morozov, N. Ledikhova, E. Panina, et al., “Re: Controversy in coronaViral Imaging and Diagnostics (COVID),” Clin Radiol, forthcoming 2020.
19. Y. Li and L. Xia, “Coronavirus Disease 2019 (COVID-19): Role of Chest CT in Diagnosis and Management,” AJR Am J Roentgenol, vol. 214, pp. 1280–1286, Jun 2020.
20. S. Morozov, V. Gombolevskiy, V. Chernina, et al., “Prediction of lethal outcomes in COVID-19 cases based on the results of chest computed tomography,” Tuberculosis and Lung Diseases, vol. 98, no. 6, pp. 7–14, 2020.
21. Q. Bi, Y. Wu, S. Mei, et al., “Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study,” Lancet Infect Dis, Apr 2020.
22. C. Herman, “What makes a screening exam ‘good’?,” AMA J. Ethics (prev. Virtual Mentor), vol. 8, no. 1, pp. 34–37, 2006.
23. Y. Fang, H. Zhang, J. Xie, M. Lin, L. Ying, P. Pang, and W. Ji, “Sensitivity of chest CT for COVID-19: comparison to RT-PCR,” Radiology, vol. 296, no. 2, pp. E115–E117, 2020. PMID: 32073353.
24. Z. Wen, Y. Chi, L. Zhang, et al., “Coronavirus disease 2019: initial detection on chest CT in a retrospective multicenter study of 103 Chinese subjects,” Radiology: Cardiothoracic Imaging, vol. 2, no. 2, p. e200092, 2020.
25. M. Callaway, S. Harden, W. Ramsden, et al., “A national UK audit for diagnostic accuracy of preoperative CT chest in emergency and elective surgery during COVID-19 pandemic,” Clin Radiol, in press 2020.
26. H. X. Bai, B. Hsieh, Z. Xiong, et al., “Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT,” Radiology, vol. 296, no. 2, pp. E46–E54, 2020. PMID: 32155105.

Supporting Information
Table S1. Sensitivity and specificity of CT for COVID-19 according to different sources