A Case Study in Model Failure? COVID-19 Daily Deaths and ICU Bed Utilisation Predictions in New York State
Vincent Chin, Noelle I. Samia, Roman Marchant, Ori Rosen, John P.A. Ioannidis, Martin A. Tanner, Sally Cripps
ARC Centre for Data Analytics for Resources and Environments, Australia; School of Mathematics and Statistics, The University of Sydney, Australia; Department of Statistics, Northwestern University, USA; Department of Mathematical Sciences, University of Texas at El Paso, USA; Stanford Prevention Research Center, Department of Medicine, Stanford University, USA; Department of Epidemiology and Population Health, Stanford University, USA; Department of Biomedical Data Sciences, Stanford University, USA; Department of Statistics, Stanford University, USA; Meta-Research Innovation Center at Stanford (METRICS), Stanford University, USA

*Corresponding author: [email protected], +61 425-276-967

June 30, 2020
Abstract
Forecasting models have been influential in shaping decision-making in the COVID-19 pandemic. However, there is concern that their predictions may have been misleading. Here, we dissect the predictions made by four models for the daily COVID-19 death counts between March 25 and June 5 in New York state, as well as the predictions of ICU bed utilisation made by the influential IHME model. We evaluated the accuracy of the point estimates and the accuracy of the uncertainty estimates of the model predictions. First, we compared the "ground truth" data sources on daily deaths against which these models were trained. Three different data sources were used by these models, and these had substantial differences in recorded daily death counts. Two additional data sources that we examined also provided different death counts per day. For accuracy of prediction, all models fared very poorly. Only 10.2% of the predictions fell within 10% of their training ground truth, irrespective of distance into the future. For accurate assessment of uncertainty, only one model matched relatively well the nominal 95% coverage, but that model did not start predictions until April 16, thus had no impact on early, major decisions. For ICU bed utilisation, the IHME model was highly inaccurate; the point estimates only started to match ground truth after the pandemic wave had started to wane. We conclude that trustworthy models require trustworthy input data to be trained upon. Moreover, models need to be subjected to prespecified real time performance tests before their results are provided to policy makers and public health officials.
Keywords:
COVID-19; Hospital Resource Utilisation; Model Evaluation; Uncertainty Quantification

Introduction

"I don't have a crystal ball. Everybody's entitled to their own opinion, but I don't operate here on opinion. I operate on facts and on data and on numbers and on projections." [13]
New York Governor Andrew Cuomo - March 24, 2020

"Now, people can speculate. People can guess. I think next week, I think two weeks, I think a month, I'm out of that business because we all failed at that business. Right? All the early national experts. Here's my projection model. Here's my projection model. They were all wrong. They were all wrong." [8]
New York Governor Andrew Cuomo - May 25, 2020

Forecasting has been very influential in the COVID-19 pandemic. Dealing with a new virus and with a lot of uncertainties surrounding its eventual impact, policy makers have widely used and depended upon predictions made by various models. These predictions refer to critical issues such as the number of anticipated deaths with and without different interventions, and the number of hospital beds, ICU beds, and ventilators that would be needed to deal with the surge of the epidemic waves. There is concern that while these models are useful, they can also be very misleading ([17], [14], [16]). It is important to understand their performance and their limitations, and to try to learn from their failures. This may help generate some better standards for the construction, validation, and use of these models.

In this article, we evaluate four models for predicting the daily death counts attributable to COVID-19 for the period March 25 to June 5 for the state of New York (NY), as well as one early model that predicted ICU bed utilisation in NY. The models evaluated are those constructed by the Institute for Health Metrics and Evaluation (IHME) [15], Youyang Gu (YYG) [12], the University of Texas at Austin (UT) [28], and the Los Alamos National Laboratory (LANL) [18]. These models were chosen because they provide daily death count predictions, as well as 95% prediction intervals (PIs) for each prediction. The IHME model began producing forecasts on March 25; the corresponding dates for YYG, UT and LANL are April 2, April 14 and April 16, respectively.

We evaluate these models based on two criteria. The first criterion is the accuracy of the point estimates and the second criterion is the accuracy of the uncertainty estimates of those predictions. With regard to accuracy of prediction, we do not find a model that distinguishes itself from the pack. Most concerning, across models only 10.2% of the predictions fall within 10% of their training ground truth, irrespective of distance into the future. For accurate assessment of uncertainty, the LANL model had observed coverage most closely matching the nominal 95% coverage. Unfortunately, the LANL model did not commence predictions until April 16, approximately a month after the final US state declared a state of emergency and eleven days after the final US state entered lockdown; thus it played no role in the initial major decisions made by key policy makers in NY, as well as in Washington DC. Regarding the prediction of ICU bed utilisation, the single model (IHME) was highly inaccurate, and the point estimates only started to match ground truth by early May, after the pandemic wave had started to wane. Two major takeaways from this research are that:

1. Serious thought and investment should be made in quality data collection when it comes to COVID-19 daily death data, as well as COVID-19 resource utilisation.

2. Models need to be subjected to real time performance tests before their results are provided to policy makers and public health officials. In this paper, we provide examples of such tests, but irrespective of which tests are adopted, they need to be specified in advance, as one would do in a well-run clinical trial.

In order to evaluate the models, it is necessary to define the actual ground truth number of daily deaths. This task is more problematic than it would first appear, as there is no one source of ground truth. The YYG and LANL models use the raw daily death counts in NY reported by Johns Hopkins University (JHURD) [5] for training, IHME uses daily deaths reported by The New York Times (NYT) [25], while UT uses NYT data until May 5 to train their model before switching to another version of the JHU data known as the JHU time series (JHUTS) data [6]. The JHUTS data is an update of the JHURD to correct for reporting errors. Many modellers (e.g. [12]) view the JHURD data as the gold standard, though [18] raise concerns with the JHURD data as well.

Figure 1 presents the ground truth data for the state of NY as reported by JHURD (red), JHUTS (dark blue), NYT (green), as well as two additional sources: CovidTracking [24] (black) and USAFacts [27] (light blue), from March 15 to June 5. The point of this figure is to demonstrate that these ground truths can vary substantially from each other and have features which are artefacts of the way in which deaths are reported. Of particular note is the early lag between JHURD and JHUTS, as well as the more smoothed process presented by the NYT data. It is noted that the NYT data are the confirmed COVID-19 cases from January 22 to May 6, while from May 7 onwards, the NYT data include the confirmed and probable cases using criteria that were developed by local and state governments ([26]). Both JHURD and JHUTS include confirmed and probable cases. In all sources, the actual ground truth number of daily deaths is calculated by taking the difference of cumulative deaths, as in the sketch below.

Figure 1 also shows evidence of large swings in the number of reported daily deaths, again probably due to lags or corrections in reporting. Indeed, the JHUTS data show a negative value for the number of deaths in NY on April 19. This negative value is, in turn, a result of updates to the JHURD to correct reporting errors ([7]). This value is clearly incorrect. However, without information regarding the details of cases at the individual level, it is not possible to correct these data. Any attempt to smooth the data raises the question of how the choice of smoothing technique may affect any conclusions drawn from the data, and [19] raise numerous concerns in this regard. In summary, coding, counting and reporting COVID-19 deaths is highly complex and is beyond the scope of this paper (see e.g. [20]). Accordingly, we have chosen to work with the data provided by these sources, and will evaluate each model according to the data the developers have chosen to use for training purposes, as well as against all three ground truths, namely, NYT, JHURD and JHUTS.

Figure 1: A comparison of the daily death counts ground truth from CovidTracking (black), JHURD (red), JHUTS (dark blue), NYT (green) and USAFacts (light blue) for the period March 15 to June 5 for NY.
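For concreteness, the differencing step can be sketched as follows. This is a minimal illustration in Python; the column names ("date", "cum_deaths") are ours, chosen for illustration, and do not reflect the exact schema of any of the sources above. The sketch also flags negative daily values, such as the April 19 artefact in the JHUTS data, rather than smoothing them.

    import pandas as pd

    def daily_deaths_from_cumulative(df: pd.DataFrame) -> pd.DataFrame:
        # df has columns "date" (datetime) and "cum_deaths" (cumulative totals).
        df = df.sort_values("date").copy()
        # Daily deaths are first differences of the cumulative series.
        df["daily_deaths"] = df["cum_deaths"].diff()
        # Reporting corrections can push a daily value below zero, as with the
        # JHUTS count for NY on April 19; flag these rather than smooth them.
        df["suspect"] = df["daily_deaths"] < 0
        return df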
Accuracy of the Point Estimates
Figure 2 shows the actual data used to develop each of the models, as well as the time series of forecasts made by each of the models. One can see the spike in the number of deaths reported by the NYT in early May following the inclusion of probable cases. Figure 2 displays only the point estimates of the forecasts. The time series of forecasts are colour-coded such that the earliest/latest forecasts are at the red/blue end of the colour spectrum. For example, the deepest red curves are the time series of forecasts made in late March, the yellow curves are time series of forecasts made in early April, and so on, until the violet curves, which represent the most recent time series of forecasts.

To evaluate each of the forecast time series in Figure 2, we computed two metrics for each forecast: the mean absolute percentage error and the maximum absolute percentage error, where the percentage error is computed as (ground truth - predicted value)/(ground truth) x 100. For the mean absolute percentage error, the percentage discrepancy between the given model's prediction and the ground truth was computed for each model, for each day, for each time series. This information was then averaged over the entire duration of the forecast for a particular time series. The maximum absolute percentage error was computed by taking the maximum of the absolute percentage errors for each forecast and for each model. For example, the first forecast made by IHME was on March 25, and we compare the forecast time series for the period from March 25 until June 5 with the ground truth time series over that same period by calculating the two metrics discussed above. We then repeat the process for each date a forecast time series was issued, and for each of the models. To make a fair comparison on dates where the forecast time series was made by at least two of the models, we truncate the forecast time series at the last prediction date of the shortest time series. These values are plotted over time in Figure 3 for each version of the ground truth.

In Figure 3, we see that while some models may perform better or worse over subsets of the time frame of interest, no one model clearly dominates throughout with respect to either of the metrics for any version of the ground truth data.

Figure 2: The forecast time series made by each model, along with the ground truth (black) used to train each model. The UT model uses the NYT data (black) until May 5 before switching to the JHUTS data (grey), whereby the negative value for the daily deaths on April 19 (see Figure 1) is removed before the model is trained.

Figure 3: Discrepancies between each model and the ground truth, as measured by the maximum absolute percentage error (top) and the mean absolute percentage error (bottom), for each version of the ground truth.

We now turn to the subject of uncertainty quantification of these models. Each of the models provides estimates of uncertainty: the IHME, YYG and LANL forecasts give 95% PIs, while UT provides 90% PIs for predictions made prior to the forecast date of May 16. To translate these 90% PIs to 95% PIs for the UT model, we take the log of the prediction and of the 90% PI limits, calculate the difference between the log of the prediction and the log of the 90% PI limits, and multiply this difference by a factor of 1.96/1.64. We recompute the 95% PIs on the log scale before transforming them back to the original scale. A code sketch of these computations is given below.

Figure 4 presents plots of the 95% PIs for various predictions made by the models and the training ground truth.
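The two error metrics and the PI rescaling described above can be sketched as follows. This is a minimal illustration with function names of our own choosing; it assumes the forecasts and ground truth have already been aligned by date. Note that days with zero recorded deaths make the percentage error undefined and would require a prespecified convention.

    import numpy as np

    def abs_pct_errors(truth, pred):
        # Absolute percentage errors: |truth - pred| / truth * 100.
        truth = np.asarray(truth, dtype=float)
        pred = np.asarray(pred, dtype=float)
        return np.abs((truth - pred) / truth) * 100.0

    def mean_and_max_ape(truth, pred):
        # Mean and maximum absolute percentage error for one forecast series.
        errs = abs_pct_errors(truth, pred)
        return errs.mean(), errs.max()

    def rescale_pi_90_to_95(pred, lo90, hi90, factor=1.96 / 1.64):
        # Widen a 90% PI to an approximate 95% PI on the log scale, scaling
        # the log-distance from the prediction to each limit by 1.96/1.64.
        log_pred = np.log(pred)
        lo95 = np.exp(log_pred - factor * (log_pred - np.log(lo90)))
        hi95 = np.exp(log_pred + factor * (np.log(hi90) - log_pred))
        return lo95, hi95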
We follow [19] and define a k-step-ahead prediction and PI for a particular date to be the prediction and accompanying PI made k days in advance of that date. For example, for June 3, the 1-step-ahead prediction and PI are the prediction and accompanying interval for June 3 made on the forecast date of June 2, while the 2-step-ahead prediction and PI for June 3 would be made on the forecast date of June 1, etc. The columns of Figure 4 relate to the number of step-ahead predictions, ranging from 1 to 7 in panel 4a and from 8 to 14 in panel 4b, while the rows of Figure 4 correspond to the different models.

Figure 4 shows a number of interesting features. First, as documented in [19], the IHME model undergoes a number of dramatic changes in the calculation of the prediction and the corresponding PIs. The original IHME model underestimates uncertainty, and 45.7% of the predictions (over 1- to 14-step-ahead predictions) made over the period March 24 to March 31 are outside the 95% PIs. The IHME model was revised on April 2 and made no predictions on April 1 and April 2. In the revised model, for forecasts from April 3 to May 3, the uncertainty bounds are enlarged, and most predictions (74.0%) are within the 95% PIs, which is not surprising given the PIs are in the order of 300 to 2 000 daily deaths. Yet, even with this major revision, the claimed nominal coverage of 95% well exceeds the actual coverage. On May 4, the IHME model underwent another major revision, and the uncertainty was again dramatically reduced, with the result that 47.4% of the actual daily deaths fall outside the 95% PIs, well beyond the claimed 5% nominal value. It is concerning, nevertheless, that the uncertainty estimates of the IHME model seem to improve with the forecast horizon, so that for the original model and the latest IHME model update, more observed values fall within the 95% PIs for the 7-step-ahead prediction than for the 1-step-ahead prediction.

Figure 4: Different step-ahead predictions (black dots) by each model and their 95% PIs (grey bars), along with the ground truth (red dots) used to train each model. (a) 1- to 7-step-ahead predictions; (b) 8- to 14-step-ahead predictions.

Second, Figure 4 shows that the YYG model does not perform well in terms of uncertainty quantification, as there are many more actual deaths lying outside the 95% PIs than would be expected. Taken across the entire time period, the proportion of actual deaths lying outside the 95% PIs is 31.1%. We do, however, note that this percentage improves over time. From the forecast date of May 1 onwards, the fraction of actual deaths lying outside the 95% PIs ranges from 28.6% for the 1-step-ahead prediction to 13.6% for the 14-step-ahead prediction, in comparison to 44.8% for the 1-step-ahead prediction to 48.3% for the 14-step-ahead prediction prior to this date.

Similarly, regarding the UT and LANL forecasts, neither has observed coverage consistently matching the 95% nominal coverage, as shown in Figure 5 (first plot in the top panel). For the UT model, the fraction of actual deaths lying outside the 95% PIs ranges from 14.6% for the 1-step-ahead prediction to 67.5% for the 14-step-ahead. The corresponding figures for LANL are 40.0% for the 1-step-ahead prediction to 9.1% for the 14-step-ahead.
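As an illustration of how such per-horizon coverage figures can be tabulated, consider the following minimal sketch. It assumes a table with one row per (forecast date, target date) pair; the column names are illustrative, not those of any model's published output format.

    import pandas as pd

    def coverage_by_horizon(df: pd.DataFrame) -> pd.DataFrame:
        # df has datetime columns "forecast_date" and "target_date", and
        # numeric columns "lower95", "upper95" and "actual".
        df = df.copy()
        # The k-step-ahead horizon is the gap between target and forecast date.
        df["k"] = (df["target_date"] - df["forecast_date"]).dt.days
        df["within"] = df["actual"].between(df["lower95"], df["upper95"])
        df["above"] = df["actual"] > df["upper95"]
        df["below"] = df["actual"] < df["lower95"]
        # Average the indicators within each horizon; convert to percentages.
        return df.groupby("k")[["within", "above", "below"]].mean() * 100.0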
The second, third and fourth plots in the top panel (one for each version of ground truth) in Figure 5 present this information from a slightly different perspective. In particular, from this point of view, for the 5- to 14-step-ahead predictions, LANL had the best observed coverage compared to the nominal 95% level, with short-term predictions tending to overestimate the ground truth. The PIs for the UT model are seen to deteriorate for predictions further into the future, with a tendency of the model to underestimate the daily number of deaths. The remaining two models, YYG and IHME, tended to provide daily death prediction PIs that systematically miss the nominal 95% coverage level, irrespective of distance into the future.

Examining the last panel in Figure 5, with the exceptions of IHME evaluated on JHURD and LANL evaluated on the NYT ground truth, no model had more than 30% of daily death predictions falling within 10% of the ground truth, with UT having virtually no predictions within a 10% bound of the ground truth, out into the future. When evaluated on their training ground truth, only 10.2% of the predictions fall within 10% of their training ground truth, irrespective of distance into the future (a code sketch of this tolerance criterion is given at the end of this section).

Figure 5: Percentage of the number of daily deaths within, above and below the k-step-ahead 95% PIs. The last panel shows the percentage of k-step-ahead predictions which fall within 10% of the ground truth.

We now turn our attention to ICU bed utilisation in New York State. The only model that provides early daily predictions and PIs for NY ICU bed usage is the IHME model, and we limit our attention to this model. The IHME model was very influential in early decision-making at the highest levels of the United States government in regard to the allocation of resources for ICU bed usage, having been mentioned at White House press conferences, including on March 31, 2020 [1].

Figure 6 presents the IHME estimates (black) and 95% PIs (grey) for ICU bed usage in NY, together with the ground truth (red) and the maximum ICU capacity (blue; inclusive of non-COVID-19 ICU beds) obtained from THE CITY [23]. Each subplot in the figure corresponds to a day for which a prediction was made. For example, the first subplot is for the prediction made by IHME on March 25, the second for the prediction made on March 29, and so on, until the last prediction made on June 5. The prediction intervals start at the date for which the prediction was made, and thus the grey shaded area, which represents the PIs, shifts to the right for subsequent subplots.

Figure 6 shows that the early forecasts of ICU bed utilisation were highly inaccurate: the prediction intervals for ICU beds made on March 25 through March 31 for the ICU bed usage on April 1 did not contain the actual value, despite the width of these PIs being in the order of 5 000 to 15 000. Over this period, the model seriously over-predicted the ICU bed usage. However, from the third week in April through early June, the point predictions of the IHME model systematically underestimated ICU bed utilisation. In fact, in late April, the model predicts zero bed usage by mid-May.

Figure 6: Predicted ICU bed usage (black) and its 95% PIs (grey shaded area) in NY for each reporting date, along with the ground truth (red) and the maximum ICU capacity inclusive of non-COVID-19 ICU beds (blue) obtained from THE CITY.
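Returning briefly to the accuracy summaries in the last panel of Figure 5, the within-10% criterion used there (and for the headline 10.2% figure) amounts to a simple tolerance check. A minimal sketch follows, under the same assumed table layout as before; the function name is ours.

    import numpy as np
    import pandas as pd

    def pct_within_tolerance(df: pd.DataFrame, tol: float = 0.10) -> float:
        # Percentage of point predictions within +/- tol of the ground truth;
        # df has numeric columns "actual" and "predicted", one row per prediction.
        rel_err = np.abs(df["predicted"] - df["actual"]) / df["actual"]
        return 100.0 * (rel_err <= tol).mean()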
Conclusion
In a major crisis like COVID-19, policy makers and public health officials need to operate on "facts, data and numbers", but this can be difficult when these facts, data and numbers are highly error-prone. In the case of daily COVID-19 deaths in New York, there was serious disagreement even between sources regarding the ground truth for the number of deaths. A key take-away from our analysis is that serious thought and investment must be made in quality data collection when it comes to COVID-19 daily death data, as well as COVID-19 resource utilisation.
Clinical trial methodology ([11]) for data quality control must be brought to bear, especially when the consequences of policy decisions can so dramatically impact the lives of millions of people. Early on, Dr. Anthony Fauci, NIAID Director, stated that [4]: "As I've told you on the show, models are really only as good as the assumptions that you put into the model. But when you start to see real data, you can modify that model..." An open question raised by this thoughtful comment is how one can expect quality predictions when the data are suspect. How does one modify the model in light of the data, if the data are faulty? Would the course of action of policy leaders have differed, had it been known that there would not be clear agreement on what represented ground truth for even a hard endpoint such as death? Clearly, if the data are suspect, projections may also be sub-optimal.

However, putting the issue of data quality aside, our analysis shows that the models tended to have very poor performance, both in terms of accuracy and in terms of capturing uncertainty. To be fair, pandemics are, thankfully, rare events, and predicting outcomes in the early stages is very difficult, as so much is unknown. Rosen [21] quotes Dr. Alain Labrique:

"With a new virus, and any type of infectious disease that we have never encountered before, there are many unknowns," said Dr. Alain Labrique, an associate professor at The Johns Hopkins Bloomberg School of Public Health and one of the nation's most renowned epidemiologists. "And so the big challenge for us is to focus on telling the public the truth about what we know, and to explain the uncertainties around what we don't know."
In this regard, the LANL model was the only model that was found to approach the 95% nominal coverage, but unfortunately this model was unavailable at the time Governor Cuomo needed to make major policy decisions in late March 2020. Model predictions for daily deaths tended to have smaller errors over time, but this is not reassuring, because predictions are extremely critical in the early phase of an epidemic wave.

The importance of accurate early predictions applies even more to predictions for bed utilisation, where wrong expectations can lead to wrong decisions. For example, a major mistake in New York was the decision to send COVID-19 patients to nursing homes. Based on a March 25 directive, over 4 500 COVID-19 patients were discharged from hospitals to nursing homes ([9]), specifically because it was anticipated that regular hospital beds would be urgently needed and hospitals would be overrun by COVID-19 patients. Nursing homes are full of highly vulnerable people, and outbreaks in nursing homes ([2]) resulted in high fatalities. In New York alone, over 5 800 deaths occurred in nursing homes ([9]). Eventually this was a sizeable fraction of the COVID-19 death burden, and importantly, it might have been avoidable to a large extent. Overestimates of anticipated bed requirements could also have affected hospital utilisation for other serious conditions, with adverse consequences for the outcomes of patients suffering from these conditions ([10], [22]). While preparedness is important and beneficial, making preparations with vastly erroneous expectations can create major harm.

Our second key take-away from this evaluation is the need for real time evaluation of prediction models. Going forward, there need to be industry standards as to how models are to be evaluated and calibrated in real time, especially in the rapidly evolving setting of a pandemic. Quoting Dr. B. Jewell: "This appearance of certainty is seductive when the world is desperate to know what lies ahead" [3]. Unfortunately, in retrospect, COVID-19 anxiety can turn to COVID-19 disillusionment when the decisions made by policy makers are dictated by suspect, low-quality data and consequently by poorly performing models. One solution would be to compare predictions of models against emerging reality on a daily basis, using prespecified metrics such as those analysed here. Models that are consistently poorly performing should carry less weight in shaping policy considerations. Models may be revised in the process, in an attempt to improve performance. However, improvement of performance against retrospective data offers no guarantee of continued improvement in future predictions. Failed and recast models should not be given much weight in decision making until they have achieved a prospective track record that can instil some trust in their accuracy. Even then, real time evaluation should continue, since a model that performed well for a given period of time may fail to keep up under new circumstances.
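To illustrate what such a prespecified daily evaluation loop might look like, the following is a minimal sketch combining the metrics defined earlier. The function name, column names and summary choices are illustrative assumptions on our part, not a proposed industry standard.

    import pandas as pd

    def daily_evaluation(forecasts: pd.DataFrame, truth: pd.Series) -> pd.DataFrame:
        # forecasts: columns "forecast_date", "target_date", "predicted",
        # "lower95", "upper95"; truth: daily deaths indexed by date.
        scored = forecasts.merge(
            truth.rename("actual"), left_on="target_date", right_index=True
        )
        scored["abs_pct_err"] = (
            (scored["actual"] - scored["predicted"]).abs() / scored["actual"] * 100
        )
        scored["covered"] = scored["actual"].between(
            scored["lower95"], scored["upper95"]
        )
        # Prespecified daily report card: mean/max APE and empirical coverage,
        # one row per forecast release date.
        return scored.groupby("forecast_date").agg(
            mean_ape=("abs_pct_err", "mean"),
            max_ape=("abs_pct_err", "max"),
            coverage_pct=("covered", lambda c: 100.0 * c.mean()),
        )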
References

[1] White House press briefing on U.S. coronavirus response, March 31, 2020. Last accessed: June 7, 2020.

[2] H. R. Abrams, L. Loomer, A. Gandhi, and D. C. Grabowski. Characteristics of US Nursing Homes with COVID-19 Cases. Journal of the American Geriatrics Society, 2020. doi: https://doi.org/10.1111/jgs.16661.

[3] S. Begley. Influential Covid-19 model uses flawed methods and shouldn't guide U.S. policies, critics say, April 17, 2020. Last accessed: June 7, 2020.

[4] P. Bump. The lesson of revised death toll estimates shouldn't be that distancing was an overreaction. The Washington Post, April 9, 2020. Last accessed: June 7, 2020.

[5] Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. Retrieved from: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us. Last accessed: June 7, 2020.

[6] Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. Retrieved from: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv. Last accessed: June 7, 2020.

[7] Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. Retrieved from: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/README.md. Last accessed: June 7, 2020.

[8] S. Cohen. "We All Failed" - The Real Reason Behind NY Governor Andrew Cuomo's Surprising Confession. Forbes, May 26, 2020. Last accessed: June 7, 2020.

[9] B. Condon, J. Peltz, and J. Mustian. AP count: Over 4,500 virus patients sent to NY nursing homes, May 22, 2020. Retrieved from: https://apnews.com/5ebc0ad45b73a899efa81f098330204c. Last accessed: June 7, 2020.

[10] O. De Filippo, F. D'Ascenzo, F. Angelini, P. P. Bocchino, F. Conrotto, A. Saglietto, G. G. Secco, G. Campo, G. Gallone, R. Verardi, et al. Reduced rate of hospital admissions for ACS during Covid-19 outbreak in Northern Italy. New England Journal of Medicine, 2020. doi: https://doi.org/10.1056/NEJMc2009166.

[11] L. Friedman, C. Furberg, D. DeMets, D. Reboussin, and C. Granger. Fundamentals of Clinical Trials, 5th edition. Springer, 2015.

[12] Y. Gu. Retrieved from: https://covid19-projections.com. Last accessed: June 7, 2020.

[13] G. Herbert. Cuomo refutes Trump, insists NY needs up to 40,000 ventilators: 'I operate on facts'. Post-Standard, March 27, 2020. Last accessed: June 7, 2020.

[14] I. Holmdahl and C. Buckee. Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us. New England Journal of Medicine, 2020. doi: http://doi.org/10.1056/NEJMp2016822.

[15] IHME COVID-19 health service utilization forecasting team and C. J. Murray. Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries. medRxiv, 2020. doi: https://doi.org/10.1101/2020.04.21.20074732.

[16] J. P. Ioannidis. Coronavirus disease 2019: The harms of exaggerated information and non-evidence-based measures. European Journal of Clinical Investigation, 50(4):e13222, 2020. doi: https://doi.org/10.1111/eci.13222.

[17] N. P. Jewell, J. A. Lewnard, and B. L. Jewell. Predictive mathematical models of the COVID-19 pandemic: Underlying principles and value of projections. JAMA, 323(19):1893-1894, 2020. doi: https://doi.org/10.1001/jama.2020.6585.

[18] LANL COVID-19 Team. Retrieved from: https://covid-19.bsvgateway.org. Last accessed: June 7, 2020.

[19] R. Marchant, N. I. Samia, O. Rosen, M. A. Tanner, and S. Cripps. Learning as We Go - An Examination of the Statistical Accuracy of COVID-19 Daily Death Count Predictions. arXiv preprint arXiv:2004.04734, 2020.

[20] S. Pappas. How COVID-19 Deaths Are Counted. Scientific American, May 19, 2020. Last accessed: June 7, 2020.

[21] J. Rosen. Explaining Cuomo's lament: Why coronavirus data models were 'all wrong'. Retrieved from: https://wjla.com/news/nation-world/explaining-cuomos-lament-why-coronavirus-data-models-were-all-wrong. Last accessed: June 7, 2020.

[22] A. Sud, M. E. Jones, J. Broggio, C. Loveday, B. Torr, A. Garrett, D. L. Nicol, S. Jhanji, S. A. Boyce, P. Ward, et al. Collateral damage: The impact on cancer outcomes of the COVID-19 pandemic. 2020. doi: https://doi.org/10.2139/ssrn.3582775.

[23] THE CITY. Retrieved from: https://thecity.nyc/. Last accessed: June 7, 2020.

[24] The COVID Tracking Project. Retrieved from: https://github.com/COVID19Tracking/covid-tracking-data/raw/master/data/states_daily_4pm_et.csv. Last accessed: June 7, 2020.

[25] The New York Times. Retrieved from: https://github.com/nytimes/covid-19-data. Last accessed: June 7, 2020.

[26] The New York Times. Retrieved from: https://github.com/nytimes/covid-19-data/blob/master/PROBABLE-CASES-NOTE.md. Last accessed: June 7, 2020.

[27] USAFacts. Retrieved from: https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv. Last accessed: June 7, 2020.

[28] UT Austin COVID-19 Modeling Consortium. Retrieved from: https://covid19-projections.com. Last accessed: June 7, 2020.