Analyzing the spatial distribution of acute coronary syndrome cases using synthesized data on arterial hypertension prevalence
Vasiliy N. Leonenko
ITMO University, 49 Kronverksky Pr., St. Petersburg, 197101, Russia
Abstract.
In the current study, the authors demonstrate a method for analyzing the distribution of acute coronary syndrome (ACS) cases in Saint Petersburg using the synthetic population approach and a statistical model for arterial hypertension prevalence. The cumulative number of emergency services calls associated with ACS in a separate geographical area (a grid cell of a map) is matched with the assessed number of dwellers and individuals with arterial hypertension, which makes it possible to find locations with excessive ACS incidence. The proposed method is implemented in the Python programming language, and the visualization results are shown using the open-source software QGIS. Three categories of locations are proposed based on the analysis results. The demonstrated method might be applied to use statistical assessments of hidden health conditions in the population to categorize spatial distributions of their visible consequences.
Keywords:
Acute coronary syndrome · Arterial hypertension · Synthetic populations · Statistical modeling · Python.
Acute coronary syndrome (ACS) is a range of health conditions associated with a sudden reduction of blood flow to the heart. This condition is treatable if diagnosed quickly, but since fast diagnostics is not always possible, the death toll of ACS in the world population is dramatic [5]. A modeling approach for forecasting the distribution of ACS cases would allow healthcare specialists to be better prepared for them, both in emergency services and in stationary healthcare facilities [2]. The simplest forecast could be obtained by applying statistical analysis to retrospective data on emergency medical service (EMS) calls associated with acute heart conditions. However, if the corresponding time series is not long, accurate prediction is impossible without additional data on possible prerequisites for acute coronary syndrome calls, such as health conditions that increase the risk of ACS.
This research is financially supported by The Russian Science Foundation, Agreement
One of the factors in the population which might be associated with acute coronary syndrome is arterial hypertension (AH), a medical condition associated with elevated blood pressure [11]. Arterial hypertension is one of the main factors leading to atherogenesis and the development of vulnerable plaques whose instability or rupture is responsible for the development of acute coronary syndromes [8]. Thus, we might assume that a neighborhood populated predominantly by individuals with AH might demonstrate higher vulnerability to ACS. Based on that assumption, it might be possible to use spatially explicit AH data as an additional predictor of prospective ACS cases. Unfortunately, data on AH prevalence with geographical matching are rarely found, and for Russian settings they are virtually non-existent.

In this paper, we describe methods and algorithms to analyze the distribution of ACS-associated emergency medical service calls (EMS calls) using synthesized data on arterial hypertension prevalence. Using Saint Petersburg as a case study, we address the following question: can the synthesized AH data, combined with the EMS calls dataset, provide additional information about the ACS distribution in the population, compared to absolute and relative data on EMS calls alone?
Fig. 1.
The daily dynamics of emergency service calls connected with acute coronary syndrome (Jan – Nov, 2015)
The EMS data used in this research contain 5125 EMS calls connected with acute coronary syndrome, made from January to November 2015 in Saint Petersburg [6]. A back-of-the-envelope analysis of the time series of the daily number of calls (Fig. 1) and of the weekly EMS calls distribution (Fig. 2) did not reveal any statistically significant patterns in the distribution of calls over time, although it is clear from the data that the number of EMS calls declines on weekends. Thus, there is no straightforward method to forecast fluctuations of the cumulative number of daily EMS calls connected with ACS.

Fig. 2.
The cumulative number of ACS emergency service calls on different weekdays.

Fig. 3.
The spatial distribution of ACS emergency service calls in Saint Petersburg

The spatial distribution of calls for the whole time period, based on the addresses from the database, is shown in Fig. 3. The histogram of the cumulative distribution was built by calculating the total number of EMS calls in a given spatial cell of size 250 m x 250 m, with empty cells (0 EMS calls) excluded from the distribution. It was established that the form of the histogram does not change significantly if the cell size varies (up to 2 km x 2 km). It can be seen that the vast majority of the spatial cells had 1 to 5 EMS calls, and only in isolated cells does this number exceed 8. Based on general knowledge, we assumed that an increased concentration of EMS calls within particular cells may be caused by one of the following reasons:
– The cell has a higher population density compared to the other cells;
– The cell has a higher concentration of people with arterial hypertension, which might cause a higher ACS probability;
– The cell includes people more prone to acute coronary syndrome due to unknown reasons.

To distinguish these cases, and thus to be able to perform a more meaningful analysis of the EMS calls distribution, we assess the spatial distribution of city dwellers and people with high blood pressure using the synthetic population approach.
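The cell-count histogram described above can be illustrated with a minimal sketch. It assumes that call locations are already expressed in meters; the function names and the sample points are hypothetical, and the standard library is used here instead of the numpy and pandas stack only to keep the example dependency-free.

```python
import math
from collections import Counter


def cell_index(x_m, y_m, cell_size_m=250.0):
    """Return the (column, row) index of the grid cell containing a point in meters."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def calls_per_cell(points_m, cell_size_m=250.0):
    """Count calls in each non-empty cell; empty cells never appear in the result."""
    return Counter(cell_index(x, y, cell_size_m) for x, y in points_m)


def histogram_of_counts(counts):
    """Map 'number of calls in a cell' to 'number of cells with that many calls'."""
    return Counter(counts.values())


# Hypothetical call locations in projected meters (illustration only, not real data).
calls = [(10.0, 20.0), (100.0, 240.0), (260.0, 10.0), (900.0, 900.0), (249.9, 0.0)]

counts = calls_per_cell(calls)
hist = histogram_of_counts(counts)

# Re-running with a coarser grid shows how the cell size can be varied.
coarse_counts = calls_per_cell(calls, cell_size_m=2000.0)
```

Because only cells that receive at least one call are stored, the exclusion of empty cells from the histogram follows directly from the data structure.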
A “synthetic population” is a synthesized, spatially explicit human agent database (essentially, a simulated census) representing the population of a city, region or country. By its cumulative characteristics, this database is equivalent to the real population, but its records do not correspond to real people. Statistical and mechanistic models built on top of synthetic populations have helped tackle a variety of research problems, including those connected with public health. In this study, we have used a synthetic population generated according to the standard of RTI International [10].

According to the standard of RTI International, the principal data for any given synthetic population are stored in four files: people.txt (each record contains id, age, gender, household id, workplace id, school id), households.txt (contains id and coordinates), workplaces.txt (contains id, coordinates and capacity of the workplaces), and schools.txt (contains id, coordinates, capacity). Our synthetic population is based on 2010 data from “Edinaya sistema ucheta naseleniya Sankt Peterburga” (“Unified population accounting system of Saint Petersburg”) [3], which was checked for errors and complemented with the coordinates of the given locations. The school records were based on the school list from the official web-site of the Government of Saint Petersburg [4]. The distribution of workplaces for adults and their coordinates were derived from the data obtained with the help of the Yandex.Auditorii API [12]. A detailed description of the population generation can be found in [7].
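To make the file layout more concrete, the following sketch joins person records to household coordinates. The field lists follow the description above, but the delimiter, the column names and the sample rows are assumptions made for illustration; the actual files may be formatted differently.

```python
import csv
import io

# Hypothetical excerpts; the real delimiter and column names may differ.
PEOPLE_TXT = """id,age,gender,household_id,workplace_id,school_id
1,67,F,10,,
2,34,M,10,500,
3,8,M,11,,700
"""

HOUSEHOLDS_TXT = """id,latitude,longitude
10,59.9386,30.3141
11,59.8830,29.9086
"""


def read_records(text):
    """Parse delimited text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_home_coordinates(people, households):
    """Give every person the coordinates of the household he or she belongs to."""
    coords = {h["id"]: (float(h["latitude"]), float(h["longitude"])) for h in households}
    located = []
    for person in people:
        lat, lon = coords[person["household_id"]]
        located.append({
            "id": int(person["id"]),
            "age": int(person["age"]),
            "gender": person["gender"],
            "lat": lat,
            "lon": lon,
        })
    return located


residents = attach_home_coordinates(read_records(PEOPLE_TXT), read_records(HOUSEHOLDS_TXT))
```

The resulting records carry the attributes (age, gender, home location) that are needed to place each synthetic dweller on a map.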
Fig. 4.
The cumulative distribution function used to define the AH status of an individual, based on data from [9].
When the synthetic population is created, we assess the health conditions of individuals associated with arterial hypertension. There are two types of corresponding data that we generate and add to the individual records of the synthetic population:
– The AH risk (the probability of having arterial hypertension). Based on [9], we assumed that this probability depends on the age and gender of an individual. The corresponding cumulative distribution function was found using the data of 4521 patients during 2010–2015 and is shown in Fig. 4.
– The actual AH status (positive or negative). The corresponding value (0 or 1) is generated by the Monte Carlo algorithm according to the AH risk calculated in the previous step. The AH status might be used in simulation models which include demographic processes and population-wide simulation of the onset and development of AH.

The proportion of the synthetic population affected by arterial hypertension is found to be 26.6 %, which is roughly consistent with the AH prevalence in the USA according to the American Heart Association Statistical Fact Sheet 2013 Update (1 out of every 3) and is lower than the estimate for Russia given by the chief cardiologist of the Ministry of Health of the Russian Federation (43%). The cumulative and spatial distributions of AH+ individuals in Saint Petersburg are shown in Fig. 5. It can be seen that the age and gender heterogeneity in the population is enough to create an uneven distribution of individuals exposed to arterial hypertension.

Further in the paper, we match the number of AH+ dwellers of every cell with the number of EMS calls within the same cell and propose an indicator to analyze the relation between them.
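The two-step assignment of AH risk and AH status described above can be sketched as follows. The risk table is purely hypothetical and only stands in for the age- and gender-dependent function of Fig. 4; the function names, the age bands and the synthetic sample are likewise assumptions.

```python
import random

# Hypothetical AH probabilities by age band and gender (placeholders for Fig. 4).
AH_RISK = [
    (0, 30, {"M": 0.05, "F": 0.03}),
    (30, 50, {"M": 0.25, "F": 0.20}),
    (50, 70, {"M": 0.55, "F": 0.60}),
    (70, 200, {"M": 0.75, "F": 0.80}),
]


def ah_risk(age, gender):
    """Look up the assumed probability of arterial hypertension for an individual."""
    for low, high, by_gender in AH_RISK:
        if low <= age < high:
            return by_gender[gender]
    raise ValueError(f"age {age} is outside the risk table")


def assign_ah_status(people, rng):
    """Add 'ah_risk' and a Monte Carlo draw 'ah_status' (0 or 1) to every record."""
    labelled = []
    for person in people:
        risk = ah_risk(person["age"], person["gender"])
        status = 1 if rng.random() < risk else 0
        labelled.append({**person, "ah_risk": risk, "ah_status": status})
    return labelled


rng = random.Random(42)  # fixed seed so that the sketch is reproducible
people = [{"id": i, "age": 20 + (i % 60), "gender": "M" if i % 2 else "F"}
          for i in range(10000)]
labelled = assign_ah_status(people, rng)

# Simulated prevalence and its expectation under the assumed risk table.
prevalence = sum(p["ah_status"] for p in labelled) / len(labelled)
expected = sum(p["ah_risk"] for p in labelled) / len(labelled)
```

For a large population the simulated prevalence stays close to the mean individual risk, which is why a population-level figure such as the reported 26.6 % can be compared with external prevalence estimates.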
We convert the coordinates of EMS call locations from degrees to meters using the Mercator projection. After this, we form a grid with a fixed cell size (250 m × 250 m) and compute the statistics for each cell. The algorithm is implemented in Python using numpy, matplotlib, and pandas. The output of the algorithm is a txt-file with the coordinates of the cells and the cell statistics (overall number of individuals, number of AH+ individuals, overall number of EMS calls).

In order to understand the relationship between the number of AH+ individuals and the number of EMS calls, we follow our earlier research [1], where the ratio between the overdose-related EMS calls and the assessed number of opioid drug users was studied. In this paper, we compare $r_1$ with the alternative indicator $r_2$, which uses the overall number of people in the cell under study instead of the assessed number of AH+ individuals. The ratios are calculated as follows:

$$ r_1 = \frac{n_{ems} + 1}{n_{ah} + 1} \quad \text{and} \quad r_2 = \frac{n_{ems} + 1}{n_{p} + 1}, $$

where $n_{ems}$ is the number of registered EMS calls in a cell, $n_{ah}$ is the model-predicted number of AH+ individuals in a cell, and $n_{p}$ is the number of persons who dwell in a cell based on the synthetic population. These quantities represent ratios of calls per AH+ individual and calls per dweller, respectively. By adding ones to the numerator and denominator we avoid division by zero, and although this introduces a small skew in the data, its consistent application across all cells leaves the results and their interpretation unaffected. We use the ratio $r_1$ to understand which cells differ from other cells by orders of magnitude. The ratio $r_2$ is introduced to compare its distribution with that of $r_1$ and thus decide whether the statistical model for AH+ probability assessment helps detect the anomalies connected with the EMS calls distribution more accurately.

In Fig. 6, the aggregated distributions of the $r_1$ and $r_2$ values for our data are shown.
On the left graph, the distributions are given in their original form, and on the right one the standardized distributions are shown, i.e. with means equal to 0 and standard deviations equal to 1. Although the shapes of the histograms are similar, the difference between the corresponding distributions is statistically significant, which is supported by the results of the Chi-square test performed for the standardized samples. The crucial difference is in the histogram tails, i.e. in the extreme values of the indicators, which, as will be shown further in the paper, is also accompanied by their different spatial distribution.

Fig. 5.
The aggregated and geospatial distributions of AH+ individuals in Saint Petersburg

Fig. 6.
The distributions of $r_1$ and $r_2$ (original and standardized).

In Fig. 7, the distribution of the 20 cells with the highest values of $r_1$ and $r_2$ is shown (shades of blue and shades of green, respectively). The lighter shades correspond to the bigger cell side lengths (250, 500, 1000 and 2000 meters).

The results demonstrate that the locations of high $r_1$ values change less with the change of cell side length, compared to $r_2$ (this is represented in the map as several points with different shades of blue situated near one another). It is also notable that the high $r_2$ values were found in lined-up adjacent cells (see the left and right edges of the map). This peculiarity of the $r_2$ distribution requires further investigation, because it hampers the meaningful usage of the indicator.

The locations marked with three blue points represent a high concentration of EMS calls in isolated neighborhoods with a small assessed number of AH+ individuals. Most of these locations happen to be near places connected with tourism and entertainment (1 – Gazprom Arena football stadium, 2 – Peterhof historical park) or industrial facilities (3 – bus park, trolleybus park, train depot; 4 – Izhora factory, Kolpino bus park). Location 5 corresponds to Pulkovo airport, a major transport hub (although it is marked by only two blue points). Location 6 is the one which cannot be easily connected with excessive EMS calls: it is situated in a small suburb with plenty of housing.
A possible interpretation of why it was marked is the discrepancy between the actual number of dwellers in 2015 (the year of the EMS calls data) and the 2010 information (the year of the population data). This zone was a rapidly developing construction site, which caused a fast increase in the number of dwellers. Location 7 is also an exceptional one: it is the only one which is marked by three green points (high $r_2$). Additionally, this zone was not marked by high $r_1$, although it is easily interpreted as yet another industrial district (Lenpoligraphmash printing factory). Increasing the number of points in a distribution to 100 does not change the results significantly: blue points still mark isolated areas with a meaningful interpretation (except Lenpoligraphmash at location 7, which is still marked solely by green).

Fig. 7.
Points of high $r_1$ and $r_2$

Fig. 8.
Heatmap of EMS calls matched against high $r_1$ and $r_2$ locations

Whereas the exceptional values of $r_1$ indicate isolated non-residential areas (industrial objects and places of mass concentration of people) which might be connected with an increased risk of ACS and thus require attention from healthcare services, the extreme values of the $r_2$ indicator might come in handy when we need to assess the excess of EMS calls in densely populated residential areas. In Fig. 8, where $r_1$ and $r_2$ values are plotted against a heatmap of EMS call numbers, we see that there are two types of peak concentrations of EMS calls (bright red color). Some are not marked with green dots (the $r_2$ values are not high) and thus might be explained by a high concentration of dwellers in general. Others, marked with green dots, show the locations with a high number of EMS calls relative to the population. When there is no corresponding high $r_1$ value, these spots might correspond to the category of neighborhoods with ACS risk factors not associated with arterial hypertension (to be more precise, not associated with the old age of dwellers, since it is the main parameter of the statistical model for AH prevalence used in this study).
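The gridding step and the two indicators used throughout this analysis can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the study: the spherical variant of the Mercator projection is assumed, all function names are hypothetical, the sample coordinates are invented, and the standard library replaces numpy and pandas to keep the example dependency-free.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6378137.0  # sphere radius assumed for the spherical Mercator variant


def mercator(lat_deg, lon_deg):
    """Project latitude and longitude in degrees to Mercator x, y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def cell_of(lat_deg, lon_deg, cell_size_m=250.0):
    """Return the grid cell index of a location for the given cell side length."""
    x, y = mercator(lat_deg, lon_deg)
    return (math.floor(x / cell_size_m), math.floor(y / cell_size_m))


def cell_indicators(ems_calls, residents, cell_size_m=250.0):
    """Compute n_ems, n_p, n_ah, r1 and r2 for every cell with a call or a dweller."""
    n_ems = Counter(cell_of(lat, lon, cell_size_m) for lat, lon in ems_calls)
    n_p, n_ah = Counter(), Counter()
    for lat, lon, ah_status in residents:
        cell = cell_of(lat, lon, cell_size_m)
        n_p[cell] += 1
        n_ah[cell] += ah_status
    stats = {}
    for cell in set(n_ems) | set(n_p):
        stats[cell] = {
            "n_ems": n_ems[cell],
            "n_p": n_p[cell],
            "n_ah": n_ah[cell],
            "r1": (n_ems[cell] + 1) / (n_ah[cell] + 1),  # calls per AH+ individual
            "r2": (n_ems[cell] + 1) / (n_p[cell] + 1),   # calls per dweller
        }
    return stats


def top_cells(stats, key, k=20):
    """Return the k cells with the highest value of the chosen indicator."""
    return sorted(stats, key=lambda cell: stats[cell][key], reverse=True)[:k]


# Invented data: one residential cell with two calls and one uninhabited cell with one call.
home = (59.9386, 30.3141)
venue = (59.9730, 30.2200)
residents = [(*home, 1), (*home, 1), (*home, 0), (*home, 0)]
ems_calls = [home, home, venue]

stats = cell_indicators(ems_calls, residents)
home_cell, venue_cell = cell_of(*home), cell_of(*venue)
```

In this toy example the uninhabited cell with a single call receives the highest $r_1$, which mirrors the way isolated non-residential locations rise to the top of the ranking.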
In this paper, we have demonstrated a statistical approach which uses synthetic populations and statistical models of arterial hypertension prevalence to distinguish several cases of ACS-associated EMS call concentration in urban areas:
– High $r_1$ values for any corresponding number of EMS calls (Fig. 7) might indicate areas where acute coronary syndrome cases happen despite the low AH+ population density and thus require attention from the healthcare authorities.
– Average to low $r_2$ values for a high number of EMS calls (Fig. 8, red spots without green points) correspond to areas with high population density.
– High $r_2$ values and low $r_1$ values for a high number of EMS calls (Fig. 8, red spots with green points) might indicate areas where the excessive number of ACS cases can be explained neither by high population density nor by AH prevalence; thus, they might indicate neighborhoods with unknown negative factors.

It is worth noting that, due to the properties of our EMS dataset (see Section 2.1 and Fig. 3), most of the locations with extremely high $r_1$ and $r_2$ correspond to a number of EMS calls in a grid cell equal to 1. Ascribing EMS calls to one or another property of the area based on such a small number of observations is definitely premature, and thus our interpretations given earlier in the text should be continuously tested using new data on EMS calls. Although we cannot draw any definite and final conclusions, in the author's opinion the study successfully introduces the concept of using synthesized data related to mostly unobserved health conditions in the population (arterial hypertension) to categorize the spatial distribution of their visible consequences that require immediate medical treatment (acute coronary syndrome).
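The three cases listed above could be encoded as a simple rule, sketched below. Fixed numeric thresholds are an assumption made only for this illustration, since the analysis in the paper relies on the highest-ranked cells rather than on cut-off values; the function name, the labels and the order in which the rules are checked are also hypothetical.

```python
def categorize_cell(n_ems, r1, r2, high_calls, high_r1, high_r2):
    """Assign a cell to one of the three cases, or return None if none applies.

    All thresholds are hypothetical parameters chosen by the analyst.
    """
    if r1 >= high_r1:
        # Calls occur despite few assessed AH+ dwellers, for any number of calls.
        return "high_r1"
    if n_ems >= high_calls:
        if r2 >= high_r2:
            # Many calls, high r2 and low r1: explained neither by density nor by AH.
            return "unexplained_excess"
        # Many calls but average to low r2: explained by population density.
        return "dense_population"
    return None


# Hypothetical thresholds and cells, for illustration only.
thresholds = {"high_calls": 5, "high_r1": 1.5, "high_r2": 0.5}
isolated = categorize_cell(n_ems=1, r1=2.0, r2=2.0, **thresholds)
dense = categorize_cell(n_ems=8, r1=0.3, r2=0.05, **thresholds)
unexplained = categorize_cell(n_ems=8, r1=0.9, r2=0.7, **thresholds)
ordinary = categorize_cell(n_ems=1, r1=0.2, r2=0.1, **thresholds)
```

Checking $r_1$ first reflects the fact that the first case applies to any number of EMS calls, whereas the other two are defined only for cells with many calls.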
As was demonstrated by the authors before [1], the same approach can be successfully used in the case of opioid drug usage, and we expect to broaden the scope of its application by applying it in other domains.

As to the current research, we plan the following directions of its further development:
– Currently, the time periods of the EMS calls information and the synthetic population data do not match, which might cause bias in the estimated values of the indicators. We plan to update both datasets and to establish whether the results demonstrated in this study will be reproduced.
– An enhanced statistical model for AH is required to make the calculation of the number of AH+ individuals more accurate.
– The values of $r_1$ are almost the same for the cases of (a) 1 EMS call in the presence of 0 AH+ individuals, and (b) $2n$ calls in the presence of $n$ AH+ individuals, so those cases cannot be distinguished by using indicators such as $r_1$, although they are essentially different. We want to explore the possibility of using yet another indicator which will take into account the absolute number of dwellers in the neighborhood and will have a meaningful interpretation.
– We have access to a number of health records of people hospitalized with ACS in a human-readable format, which contain information about their AH status. Using natural language processing tools, we plan to obtain a digital version of this dataset and consequently to assess numerically the connection between AH and ACS cases in Saint Petersburg. This result will help reduce uncertainty in the results of the current study connected with analyzing the distribution of $r_1$.

References
1. Bates, S., Leonenko, V., Rineer, J., Bobashev, G.: Using synthetic populations to understand geospatial patterns in opioid related overdose and predicted opioid misuse. Computational and Mathematical Organization Theory (1), 36–47 (2019)
2. Derevitskiy, I., Krotov, E., Voloshin, D., Yakovlev, A., Kovalchuk, S.V., Karbovskii, V.: Simulation of emergency care for patients with ACS in Saint Petersburg for ambulance decision making. Procedia Computer Science, 2210–2219 (2017)
3. Government of Saint Petersburg: Labor and employment committee. Information on economical and social progress. [online], http://rspb.ru/analiticheskaya-informaciya/razvitie-ekonomiki-i-socialnoj-sfery-sankt-peterburga/ (In Russian)
4. Government of Saint Petersburg: Official web-site. [online],
5. Jan, S., Lee, S.W., Sawhney, J.P., Ong, T.K., Chin, C.T., Kim, H.S., Krittayaphong, R., Nhan, V.T., Itoh, Y., Huo, Y.: Catastrophic health expenditure on acute coronary events in Asia: a prospective study. Bulletin of the World Health Organization (3), 193 (2016)
6. Kovalchuk, S.V., Moskalenko, M.A., Yakovlev, A.N.: Towards model-based policy elaboration on city scale using game theory: application to ambulance dispatching. In: International Conference on Computational Science. pp. 404–417. Springer (2018)
7. Leonenko, V., Lobachev, A., Bobashev, G.: Spatial modeling of influenza outbreaks in Saint Petersburg using synthetic populations. In: International Conference on Computational Science. pp. 492–505. Springer (2019)
8. Picariello, C., Lazzeri, C., Attana, P., Chiostri, M., Gensini, G.F., Valente, S.: The impact of hypertension on patients with acute coronary syndromes. International Journal of Hypertension (2011)
9. Semakova, A., Zvartau, N.: Data-driven identification of hypertensive patient profiles for patient population simulation. Procedia Computer Science, 433–442 (2018)
10. Wheaton, W.D., Cajka, J.C., Chasteen, B.M., Wagener, D.K., Cooley, P.C., Ganapathi, L., Roberts, D.J., Allpress, J.L.: Synthesized population databases: A US geospatial database for agent-based models. Methods report (RTI Press) (10), 905 (2009)
11. WHO: Hypertension. Fact sheet. [online],
12. Yandex: Auditorii. [online],