Analyzing the Impact of Foursquare and Streetlight Data with Human Demographics on Future Crime Prediction
Fateha Khanam Bappee, Lucas May Petry, Amilcar Soares, Stan Matwin
Dalhousie University, Halifax, NS, Canada B3H 4R2, [email protected]; Universidade Federal de Santa Catarina, Florianópolis, Brazil; Memorial University of Newfoundland, St. John's, NL, Canada; Polish Academy of Sciences, Warsaw, Poland
Abstract.
Finding the factors contributing to criminal activities and their consequences is essential to improve quantitative crime research. To respond to this concern, we examine an extensive set of features from different perspectives and explanations. Our study aims to build data-driven models for predicting future crime occurrences. In this paper, we propose the use of streetlight infrastructure and Foursquare data along with demographic characteristics for improving future crime incident prediction. We evaluate the classification performance based on various feature combinations as well as with the baseline model. Our proposed model was tested on each smallest geographic region in Halifax, Canada. Our findings demonstrate the effectiveness of integrating diverse sources of data to gain satisfactory classification performance.
Keywords:
Crime Prediction, Data Analytics, Urban Computing, Demographics, Foursquare
NOTICE:
This is a pre-print of a paper with the same name to appear in the 16th International Conference on Data Science (ICDATA20) proceedings and will be published in Springer Nature - Book Series: Transactions on Computational Science & Computational Intelligence.
Crime is a well-known social problem that affects the quality of life and slows down a country's economy. Nowadays, with the advancement of big data analytics, exploring diverse sources of data has gained increasing interest and attention, offering better crime analysis and prediction for crime researchers. Besides, identifying crime patterns and trends is of great importance for police and law enforcement agencies. Crime patterns tell us the story about the environment, demography, temporality, and how criminals interact with those factors.
In this paper, we address the problem of predicting future crime incidents for small geographic areas (i.e., dissemination areas as defined by Statistics Canada) in Halifax, Canada. Traditionally, criminology researchers study and analyze the historical crime data by focusing on sociological and psychological theories to obtain crime and criminal behavioral patterns. However, such strategies may introduce bias from the theory-ladenness of observation [6]. The literature states that in the real world, crime has a mutual relationship with time, space, and population that complicates the researcher's study more [10]. Several works have explored the relationships between criminal activities and socio-economic factors, for instance, educational facilities, ethnicity, income level, unemployment, etc., as well as human behavioral factors [16,15,8]. Crime rate or crime occurrence prediction has received considerable attention in many studies, including [35,29,30]. Several studies tried to predict specific types of crime for a specific region or time by detecting the patterns of that crime [33]. Spatio-temporal patterns play a vital role in advanced research in crime analysis and prediction [20]. Nowadays, advanced techniques are applied to detect different crime patterns such as spatio-temporal, demographic, meteorological, and human behavioral patterns for crime prediction. However, it is challenging to make accurate estimations from diverse data sources due to nonlinear relationships and data dependencies.

Most of the studies that presented data-driven crime pattern detection and prediction approaches have focused on mega-cities like Chicago, New York, Greater London, etc. However, the physical characteristics, human impact characteristics, and their interactions are totally different for different regions and cities. Therefore, applying those models for predicting crime in a smaller city is challenging and may lead to different outcomes.
Our study aims to build data-driven models for future crime incident prediction for smaller cities. The main hypothesis is that the relative scarcity of data, compared to mega-cities, can be compensated by using non-traditional datasets that can be derived from social media and the Internet-of-Things (IoT) infrastructure of a modern city. We extract five different categories of features from six different data sources. We propose to explore traditional demographics data with commuting features (e.g., commuting mode and time), IoT-like streetlight pole position data, as well as human mobility data with dynamic features from location-based social networks. To the best of our knowledge, this is the first attempt to employ demographics data with human mobility features for future crime prediction in a small city such as Halifax, Canada. For model building, we use ensemble learning methods such as Random Forest and Gradient Boosting. We conduct a performance comparison for all five categories of features. We also compare the prediction results generated from ensemble learning methods with a baseline method called DNN-based feature level data fusion [18].

In summary, the contributions of this paper are: (i) we propose the use of streetlight infrastructure data with demographic characteristics for improving future crime prediction. Its effectiveness is demonstrated in our experimental evaluation results; (ii) we propose data-driven models to predict future crime occurrences in smaller cities. This implies that fewer data points are available for training the models; and (iii) we experimentally show the effect of each feature group proposed in previous works and this paper on crime prediction, evaluating the classification performance of different feature combinations.

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 provides the details of feature engineering approaches to improve the prediction accuracy in Halifax.
Then, in Section 4, the data sources, data preparation activities, and experimental results are presented. Finally, Section 5 presents some concluding remarks with future research directions.
The relationship between crime and various factors has been studied in many scientific and criminology works. Nowadays, researchers can use spatial information from the real world using Geographic Information Systems (GIS). Likewise, demographic information is easily accessible from different statistical sources. The use of historical facts and temporal dynamics between neighborhoods and crimes has also been broadly noted in criminology. Researchers have emphasized feasible computational solutions for urban crime after analyzing the factors related to different categories of crimes and their consequences. We have categorized the existing work on crime prediction from four aspects: temporal and historical, geographic, demographic and streetlight, and human behavioral aspects.

Temporal patterns of crime are learned from sequential crime data by analyzing the structure (various intervals) of temporal resources. Crime rates can be examined for hours of the day, different days of the week, months, seasons, years, and others. Many researchers have studied how to identify temporal patterns among criminal incidents [18,12]. Several works also focus on historical information to predict future crime incidents [34]. In [27,26], the authors presented a periodic temporal pattern with hourly crime intensity and holiday information for crime forecasting. A study [33] shows that drunk driving incidents and other criminal incidents occur during Saturday nights, bar game nights close to the bar, and sports season close to the stadium. This implies that the temporal influence of crime may change over geographic regions.

Existing works also examined the geographic influence for future crime prediction or crime rate estimation [29]. Wang et al. [29] presented a crime rate inference problem for Chicago community areas by utilizing Point-of-Interest (POI) data as well as geographical influence features.
Geospatial Discriminative Patterns (GDPatterns) were introduced in [28] to capture the spatial properties of crime. Furthermore, spatial autocorrelation is considered in [34], where the average number of neighbors is calculated for each grid. Besides, the authors in [4,3] found spatial patterns (hotspots) for crime prediction using the Apriori algorithm and Localized Kernel Density Estimation (LKDE), respectively. Recently, another study [5] focused on the creation of spatial features to predict crime using geocoding and crime hotspots techniques. As the distributions of crime vary in time and space, several studies have identified spatio-temporal patterns for
crime prediction [13,35]. In [13], the authors investigated a spatio-temporal dynamic for Break and Entries (BNEs) crime incidents. However, considering the geographic influence may add little help for crime prediction, as neighboring communities share similar demographics.

Traditional demographic features have been extensively used in many studies for crime prediction [8,18,6]. In [10], the authors applied population density, mean people per household, people in the urban area, people under the poverty level, and people in dense housing with some other features to detect community crime patterns. A study discovered the association of construction permits, foreclosures, etc. with crime tendency [21]. Researchers also explored residential stability, number of vacant houses, number of people who are married or separated, and education [29,6]. However, using only traditional demographic features is insufficient to understand the implicit characteristics of crime and criminals. Few works reported the impact of streetlight distributions on criminal behavioral patterns and crime prediction. In 2018, researchers [31] found an inverse relationship between streetlight density and crime rates based on the census block groups in Detroit. In our study, we also consider extracting streetlight features, but for crime incident prediction. However, due to human mobility, a region's demographic characteristics may change for a short or long period of time.

Human behavioral pattern analysis aims to gain understanding from human behavior, mobility, and networks. In [8], the authors investigated the predictive power of aggregated and anonymized human behavioral data derived from a multimodal combination of mobile network activity and demographic information.
Specifically, footfall, or the estimated number of people within each cell, is derived from the mobile network by aggregating every hour the total number of unique phone calls in each cell tower and mapping the cell tower coverage areas to the Smartsteps cells. Similar works have been done by Bogomolov et al. [7] and Traunmueller et al. [25] for crime hotspot classification and to find a correlation between crime and metrics derived from population diversity. In [29], the authors profile the crime rate by applying taxi flow data to understand the reflection of city dynamics. The authors considered taxi flows as 'hyperlinks' in the city to connect the locations. Each taxi trip recorded pick-up/drop-off time and location, operation time, and the total amount paid. The taxi flow features indicate how much neighboring areas contribute to crime in the target area through social interaction. A data-driven approach is presented in [6] for crime rate prediction that also considers road network, transportation nodes, and human mobility. Recently, crime event prediction for Brisbane and New York was studied in [17,24] using dynamic features extracted from Foursquare data. The authors measure the region's popularity by determining the total number of observed check-ins in that region for a specific time interval. Also, the number of unique users that checked in to a specific venue and the number of tips users have ever written about that venue are counted to measure the popularity, heterogeneity, and quality of the region.
In our study, we propose a data-driven approach for a smaller city, Halifax, by investigating an extensive set of features from all different aspects. We mainly focus on human behavioral aspects, streetlight features, and the traditional demographic features for future crime prediction.
Aiming at predicting future crime incidents, we extract features for each dissemination area (DA). According to Statistics Canada, a DA is the smallest standard geographic area in their data, which consists of one or more adjacent dissemination blocks [2]. This section is organized as follows. Section 3.1 details the temporal and historical features used in this work. The demographics and streetlight features are explained in Section 3.2, while Section 3.3 shows the POI features used in this work. Finally, Section 3.4 shows some human mobility dynamic features extracted from social networks.
According to criminology research, crime may change over a long period of time (e.g., season) as well as in a short period of time (e.g., day or week) [23]. Thus, the temporal features we extracted are month, day of the week, time interval in a day, and season. We arrange crime records in 8 three-hour time intervals and 4 seasons (winter, fall, summer, and spring) for each DA. On the other hand, some research analyzed the relation of future crime incidents with the past crime history [34]. Therefore, we calculate crime frequency and crime density for each region based on historical crime data. As the area and population sizes are different for different regions, we normalize the crime frequency using the area and population size to obtain the crime density ($D_{cr}$).

$$D_{cr}(r) = \frac{CR(r)}{P(r)}, \tag{1}$$

$$D_{cr}(r) = \frac{CR(r)}{A(r)}, \tag{2}$$

where $CR(r)$ denotes the total number of crimes in DA $r$, $P(r)$ is the total population of region $r$, and $A(r)$ is the area of that region. We also compute the crime distribution based on each season.

Researchers have widely used demographic and socioeconomic features for crime rate estimation [29] and crime occurrence prediction [8]. The main demographic features we consider for our study are population density, dwelling characteristics, income, mobility, the journey to work, aboriginals and visible minorities, age, and sex. The journey to work features measure two main things: (i) the time
people leave for work and (ii) the primary mode of commute for residents aged more than 15 years. We consider 6 different measures for the time people leave for work: between 5 a.m. and 5:59 a.m., 6 a.m. and 6:59 a.m., 7 a.m. and 7:59 a.m., 8 a.m. and 8:59 a.m., 9 a.m. and 11:59 a.m., and 12 p.m. and 4:59 a.m. For the mode of commute, public transit, walk, bicycle, and other methods are considered. Mobility indicates the geographic movement of a population over a period of time; for instance, it shows whether a person moved to the current place of residence or is living at the same place as 1 year or 5 years ago.

Besides demographic features, we examine the effect of streetlight distribution on future crime incident prediction, motivated by [31]. Given a dataset of streetlight locations, for each DA, we propose the use of 3 streetlight features: (i) the total number of streetlights, (ii) the streetlight density, and (iii) the average minimum distance between crime data points and streetlight poles. The streetlight density of region $r$ is computed as follows:

$$D_{st}(r) = \frac{St(r)}{A(r)}, \tag{3}$$

where $St(r)$ denotes the total number of streetlights in DA $r$. To calculate the average minimum distance from crime location to streetlight poles, we use the Haversine distance metric with scikit-learn [22].

Figure 1 visualizes the crime (year 2013), population, and streetlight densities for the most observable DAs in Halifax. Dark red indicates high density, and light red indicates low density. The bin sizes for population and streetlight densities are the same; on the other hand, we choose smaller bin sizes to get a clear picture of crime density. As shown in Figure 1, most of the criminal incidents happen in downtown Halifax.

Fig. 1. Crime, population density and streetlight density by most observable DAs in Halifax. Panels: (a) Crime density. (b) Population density. (c) Streetlight density.
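The three streetlight features can be illustrated with a short Python sketch. It assumes that "average minimum distance" means the distance from each crime point to its nearest pole, averaged over the crimes of a DA. The text states that scikit-learn's Haversine metric was used; the sketch swaps in a plain standard-library Haversine formula with a brute-force nearest-pole search, so it is only a stand-in for small inputs. The function names, the Earth radius constant, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def streetlight_features(crimes, lights, area_km2):
    """Return (pole count, pole density, average minimum crime-to-pole distance) for one DA."""
    count = len(lights)
    density = count / area_km2  # Eq. (3): St(r) / A(r)
    if not crimes or not lights:
        return count, density, None  # distance undefined without both point sets
    # Brute-force nearest pole for every crime point, then average.
    nearest = [min(haversine_km(c, s) for s in lights) for c in crimes]
    return count, density, sum(nearest) / len(nearest)


# Hypothetical coordinates for one DA (not taken from the datasets).
lights = [(44.65, -63.57), (44.66, -63.58)]
crimes = [(44.65, -63.57), (44.66, -63.58)]
count, density, avg_min_km = streetlight_features(crimes, lights, area_km2=2.0)
print(count, density, avg_min_km)
```

For tens of thousands of poles, a spatial index with a Haversine metric would be the more practical choice than the quadratic search used here.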
In this work, we propose the use of POI features that can be obtained from location-based social networks (e.g., Foursquare). Our extracted POI features include (i) the total number of POIs and (ii) the POI frequency and density for different POI categories. Foursquare identifies 10 major POI categories: food, arts and entertainment, college and university, nightlife spots, outdoors and recreation, professional and other places, residence, shop and service, event, and travel and transport. The density of each POI category is defined as follows:

$$D_{c}(r) = \frac{N_{c}(r)}{N(r)}, \tag{4}$$

$$D_{c}(r) = \frac{N_{c}(r)}{A(r)}, \tag{5}$$

where $N_{c}(r)$ is the total number of POIs of category $c$ in a DA $r$, $N(r)$ is the total number of POIs in region $r$, and $A(r)$ is the area of that region. Figure 2 shows the POI and check-in count distributions of the most observable dissemination areas (DAs) in Halifax.

Fig. 2. The total POI and check-in count distributions by most observable DAs in Halifax. Panels: (a) POI count distribution. (b) Check-in count distribution.
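As an illustration of Eqs. (4) and (5), the following Python sketch computes both category densities for a single DA. The function name, the input layout (one category label per POI), and the sample values are assumptions made for the example only.

```python
from collections import Counter


def poi_densities(categories, area_km2):
    """Return (total POIs, share per category, POIs per unit area per category) for one DA."""
    counts = Counter(categories)  # N_c(r) for every category c
    total = sum(counts.values())  # N(r)
    share = {c: n / total for c, n in counts.items()}        # Eq. (4): N_c(r) / N(r)
    per_area = {c: n / area_km2 for c, n in counts.items()}  # Eq. (5): N_c(r) / A(r)
    return total, share, per_area


# Hypothetical POI labels for one DA with an assumed area of 2.0 square km.
total, share, per_area = poi_densities(
    ["food", "food", "nightlife spot", "residence"], area_km2=2.0
)
print(total, share, per_area)
```

A DA without any POIs yields empty dictionaries, so no division by zero occurs for Eq. (4).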
Our study also explores dynamic human mobility data from location-based social networks to find whether there is any relation with the crime context. Social networks often have their users' location data, including their visits to different POIs in a city. We extract 10 features for each DA based on the total number of user check-ins and the check-in frequency for each POI category. Moreover, the check-in count for each DA at a time interval, the check-in density, region popularity, and visitor count are also computed. For DA $r$ at time interval $t$, the check-in density is defined as follows:

$$D_{ck}(r,t) = \frac{Ck(r,t)}{Ck(r)}, \tag{6}$$

$$D_{ck}(r,t) = \frac{Ck(r,t)}{A(r)}, \tag{7}$$

where $Ck(r,t)$ is the number of check-ins in DA $r$ at time interval $t$, and $Ck(r)$ is the total number of check-ins in that region. Visitor count refers to the number of unique users that visited a DA at time interval $t$. Region popularity relates the check-ins of a DA to all check-ins in the same time interval:

$$R_{rp}(r,t) = \frac{Ck(r,t)}{Ck(t)}, \tag{8}$$

where $Ck(t)$ is the total number of check-ins at time interval $t$ for all regions.

We extract a total of 153 features for each dissemination area. Among them, we select 65 features that are more relevant to the crime prediction problem. The details of the total features chosen for each category appear in Table 1.

Table 1. Details of the selected features

| Feature category | Extracted features | Selected features | Selected feature names |
|---|---|---|---|
| Temporal and historical | 12 | 8 | Month, weekday, time interval, season, crime frequency, crime density based on population, crime density based on area, crime density for season |
| Demographic | 101 | 32 | Population, population density, dwelling characteristics (11), mobility movers, mobility non movers, mobility migrants, mobility non migrants, aboriginals and visible minorities, primary mode of commute for residents (5), journey to work: the time people leave for work (5), low income (3), age and sex |
| Streetlight | 3 | 2 | Streetlight frequency, streetlight density |
| Foursquare POI | 21 | 19 | Total POI, food count, residence count, nightlife count, arts & entertainment count, college & university count, outdoors & recreation count, professional & other places count, shop & service count, travel & transport count, and the densities of all POI categories (9) |
| Foursquare dynamic | 16 | 4 | Total check-in for each time interval, check-in density, visitor count, region popularity |
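The dynamic Foursquare features of Eqs. (6) to (8), together with the visitor count, can be sketched in Python as follows. The record layout (user, DA, time interval), the function name, and the sample values are hypothetical and chosen only to make the example self-contained.

```python
from collections import defaultdict


def dynamic_features(checkins, area_km2):
    """Compute check-in features per (DA, time interval) from (user, DA, interval) records."""
    per_cell = defaultdict(int)      # Ck(r, t)
    per_region = defaultdict(int)    # Ck(r)
    per_interval = defaultdict(int)  # Ck(t)
    visitors = defaultdict(set)      # unique users seen in (r, t)
    for user, r, t in checkins:
        per_cell[(r, t)] += 1
        per_region[r] += 1
        per_interval[t] += 1
        visitors[(r, t)].add(user)
    features = {}
    for (r, t), n in per_cell.items():
        features[(r, t)] = {
            "checkins": n,
            "density_share": n / per_region[r],  # Eq. (6)
            "density_area": n / area_km2[r],     # Eq. (7)
            "popularity": n / per_interval[t],   # Eq. (8)
            "visitors": len(visitors[(r, t)]),
        }
    return features


# Hypothetical check-ins: two DAs ("A", "B") and two time intervals (0, 1).
checkins = [("u1", "A", 0), ("u1", "A", 0), ("u2", "A", 1), ("u3", "B", 0)]
features = dynamic_features(checkins, area_km2={"A": 2.0, "B": 1.0})
print(features[("A", 0)])
```

Only (DA, interval) pairs with at least one check-in appear in the result; how empty pairs should be encoded (for example as zeros) is left open here.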
We conducted experiments to evaluate the effectiveness of the different groups of features that can be aggregated to crime data for the task of crime prediction. In the following sections, we describe the datasets used (Section 4.1), the experimental setup (Section 4.2), and the achieved results (Sections 4.3 and 4.4).
We use crime data provided by the Halifax Regional Police (HRP) department, which includes records for all Dissemination Areas (DAs) in the Halifax Regional Municipality (HRM) in Nova Scotia, Canada. Our dataset was extracted from the Uniform Crime Reporting (UCR) survey, which was designed to measure the incidence of crime and its characteristics in Canadian society. For our experiments, we explore all crime occurrences from 2012 to 2014. The crime attributes extracted from the dataset include the geographic location, incident start time, month, weekday, and UCR descriptions (incident type). We have a total of 201,086 crime observations (excluding invalid and null information).
Table 2. Details of the datasets

| Dataset | Source | Total data |
|---|---|---|
| Historical crime data | Halifax Regional Police | 201,086 |
| Dissemination area data | Statistics Canada | 599 |
| Demographic data | Canadian Census Analyser | 599 |
| Streetlight data | Halifax Regional Municipality | 42,653 |
| Foursquare POI data | Foursquare | 2,301 |
| Foursquare check-in data | Foursquare | 12,171 |
We run experiments with well-known ensemble learning classifiers, Random Forest (RF) [9] and Gradient Boosting (GB) [14], with scikit-learn [22] in Python. We used randomized grid-search in preliminary experiments for the hyper-parameter optimization of each classifier evaluated. Besides evaluating the effect of each group of features, we compare our results to a DNN-based feature level data fusion baseline method [18]. Since the environmental context feature group used in [18] is unavailable for Halifax, we implement the DNN without those features. We use the same parameter settings reported in the corresponding paper for the baseline model, except for the activations of the DNN, which were replaced by sigmoid functions as they resulted in a better performance. We train the DNN for 300 epochs and select the best test scores. For evaluating the effectiveness of each feature group, we analyze the accuracy and F-score of the classifiers. At the same time, for the comparison with the baseline method, we also report precision, recall, and Area Under the ROC Curve (AUC) scores.

We run a 10-fold time-constrained cross validation, similar to what was proposed in [11]. This is more appropriate since we guarantee that the records in the training set happened before the ones used for testing, and so we are effectively using data from the past to predict the future. We consider a sliding time window of 2 years, where the first 12 months are taken for training the models, and the subsequent 12 months are used for testing. Thus, the models are still capable of capturing seasonality patterns as the training split always contains a full year of data.
As our dataset includes three consecutive years of crime records from 2012 to 2014, for the first fold, we take all records from January 2012 to December 2012 for training, and the test split goes from January 2013 to December 2013. Next, for the second fold, we slide the window one month forward so that the training set spans from February 2012 to January 2013, and the test set spans from February 2013 to January 2014. We repeat this process for 10 different folds.
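The sliding-window scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: each fold is represented by three first-of-month dates that define half-open intervals, and the helper names and the record layout (a date followed by other fields) are hypothetical.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date forward by a number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def sliding_folds(start, n_folds=10, train_months=12, test_months=12):
    """Return (train_start, test_start, test_end) per fold, sliding one month at a time."""
    folds = []
    for k in range(n_folds):
        train_start = add_months(start, k)
        test_start = add_months(train_start, train_months)
        test_end = add_months(test_start, test_months)
        folds.append((train_start, test_start, test_end))
    return folds


def split(records, fold):
    """Split (date, ...) records into train [train_start, test_start) and test [test_start, test_end)."""
    train_start, test_start, test_end = fold
    train = [r for r in records if train_start <= r[0] < test_start]
    test = [r for r in records if test_start <= r[0] < test_end]
    return train, test


folds = sliding_folds(date(2012, 1, 1))
for train_start, test_start, test_end in folds[:2]:
    print(train_start, test_start, test_end)
```

Because every training record predates every test record within a fold, no information from the test year leaks into training, and each training split still covers a full year of seasonality.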
In Table 3, we show the classification results with various feature combinations. We tested the addition of four different groups of features to the Raw dataset (temporal + historic crime) (R): Demographic (D), Streetlight (S), Foursquare dynamic (F), and Foursquare POI (P) features. We compare 12 different models by adding all feature categories one by one to the raw features. Our first model is implemented based on the raw features only, named model MR. We built models MD, MS, MF, and MP by adding demographic, streetlight, Foursquare check-in, and Foursquare POI data, respectively, to the raw data. Similarly, by combining two feature groups with the raw data, we built the models MDS, MDF, MDP, MSF, MSP, and MFP. Finally, model MA is implemented based on all of the feature groups. Both RF and GB classifiers share a similar trend for all models based on classification accuracy and F-score. As GB performs better than RF for all combinations, in our discussions, we only consider the GB method. Model MR, trained only with raw features, results in a low accuracy of 59.85% and an F-score of 64.65%. Such behavior is expected since criminal behavior is affected by many different variables other than simple spatial and temporal factors [36].

By analyzing the addition of each group of features individually (top part of Table 3), the inclusion of demographic features (model MD) exhibits the best results, for which GB shows an improvement of about 10 percentage points in accuracy (69.94%) and about 5 percentage points in F-score (69.45%) compared to only raw features. Similarly, streetlight features in model MS show an approximate 9% and 4% improvement for accuracy and F-score, respectively. Demographic variables reveal most of the characteristics of different regions, including social and economic factors, which are commonly correlated with criminality.
Likewise, the installation of streetlight poles, which the streetlight density feature reflects, also follows the demographic profile of each corresponding area. Interestingly, Foursquare POI and dynamic features achieve lower accuracy individually compared to demographic and streetlight features. One of the reasons for this may be that there is missing information for places and check-in data for some dissemination areas.
Table 3. Results for average accuracy and F-score. Columns: No., Model, feature groups (R, D, S, F, P), and Accuracy (%) and F-score (%) for Random Forest and Gradient Boosting.
Models 6 to 11 show the evaluation results for combinations of three feature categories (the raw features plus two other groups). The accuracy and F-scores are better and almost consistent for all models except model MFP. The reason behind this is that all of them contain either demographic or streetlight features except MFP. In model MA, we combine all five categories of features. It gives us similar results to Model 6 (MDS), where we added both the demographic and streetlight categories. As Foursquare features do not lead to performance loss when combined with the others, in our study, we used all feature categories for building a model.
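The 12 model configurations named in the text (MR, the four single-group additions, the six two-group additions, and MA) can be enumerated with a few lines of Python. The sketch only reproduces the naming scheme; the function name and the tuple layout are hypothetical, and no classifier is trained here.

```python
from itertools import combinations

# Feature-group codes used in the text; R (raw) is part of every model.
GROUPS = {
    "D": "demographic",
    "S": "streetlight",
    "F": "Foursquare dynamic",
    "P": "Foursquare POI",
}


def model_specs():
    """Return (model name, feature-group codes) for the 12 evaluated combinations."""
    specs = [("MR", ("R",))]
    for size in (1, 2):  # raw + one group, then raw + two groups
        for combo in combinations("DSFP", size):
            specs.append(("M" + "".join(combo), ("R",) + combo))
    specs.append(("MA", ("R", "D", "S", "F", "P")))  # all five groups
    return specs


specs = model_specs()
for name, groups in specs:
    print(name, groups)
```

Each entry could then drive the column selection for a classifier, for example by concatenating the feature columns of the listed groups before fitting.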
Table 4 reports the accuracy, precision, recall, and AUC score for one of the best performing ensemble-based models, Model MA with Gradient Boosting (GB-MA), and the baseline DNN model. Our proposed model performs significantly better than the baseline model based on precision, recall, and AUC scores. Though DNN can handle non-linear relationships and data dependencies among different sources, it is challenging for the model to perform accurately for smaller domains or domains that suffer from data scarcity. This is the most likely reason for the baseline model to degrade performance. On the other hand, our model performed, on average, about 2% worse considering accuracy compared to the baseline model. This is due to the existence of a label imbalance in some of the testing folds.
Table 4. Performance evaluation. Columns: Model, Accuracy (%), Precision (%), Recall (%), AUC. Listed model: DNN (baseline) [18].
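The remark that label imbalance can let a model with weaker precision and recall still reach a higher accuracy can be made concrete with a toy Python example. The labels and predictions below are invented for illustration and are not results from this study; the function name is hypothetical, and AUC is left out because it requires predicted scores rather than hard labels.

```python
def scores(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented test fold with 8 negative and 2 positive labels.
y_true = [0] * 8 + [1] * 2
always_negative = [0] * 10                      # ignores the minority class entirely
recall_oriented = [0] * 5 + [1] * 3 + [1] * 2   # finds both positives, with 3 false alarms

print(scores(y_true, always_negative))
print(scores(y_true, recall_oriented))
```

On such a fold, the classifier that never predicts the positive class obtains the higher accuracy while detecting no positive case at all, which is why precision, recall, and AUC are reported alongside accuracy.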
In this paper, we study the fundamental problem of crime incident prediction for a future time interval. We have presented a data-driven approach to see how prediction accuracy can be improved by integrating multiple sources of data. Specifically, we focus on exploring population-centric features with streetlight and Foursquare-based features for each dissemination area in Halifax. Our formulation also considers the temporal dimension of the crime profile in depth. We compare all 5 categories of features individually and jointly. The results show that demographic and streetlight features have strong correlations with crime. Both of them show significant performance improvement for crime prediction individually and jointly. Though Foursquare data does not outperform demographic or streetlight data, it presents a satisfactory performance for crime prediction. Additionally, we compare our best ensemble model (i.e., Model MA with Gradient Boosting in Table 3) with the DNN-based baseline model. Our results show that GB outperforms the DNN baseline for the same groups of features. Therefore, applying an ensemble-based method leads to better performance in predicting future crime for smaller cities, such as Halifax.

In the future, we plan to extend this work in multiple directions. We want to integrate real-time streetlight data (e.g., light temperature, lux level, outages determined by power supply failures, etc.) with the current dataset. Moreover, identifying specific types of crime that might happen in the near future is our immediate concern. As it is very challenging to get accurate results for future crime prediction when sufficient data is unavailable, performing domain adaptation and some form of transfer learning using available data from a big city would be advantageous. Furthermore, the subject of data discrimination is another crucial concern for studies that focus on real-world datasets.
Investigating discrimination in socially-sensitive decision records is state-of-the-art research to avoid biased classification learning.
References

(4), 12:1–12:31 (2018)
7. Bogomolov, A., Lepri, B., Staiano, J., Letouzé, E., Oliver, N., Pianesi, F., Pentland, A.: Moves on the Street: Classifying Crime Hotspots Using Aggregated Anonymized Data on People Dynamics. Big Data (2015)
8. Bogomolov, A., Lepri, B., Staiano, J., Oliver, N., Pianesi, F., Pentland, A.: Once upon a crime: Towards crime prediction from demographics and mobile data. CoRR abs/1409.2983 (2014). URL http://arxiv.org/abs/1409.2983
9. Breiman, L.: Random forests. Machine learning (1), 5–32 (2001)
10. Buczak, A.L., Gifford, C.M.: Fuzzy association rule mining for community crime pattern discovery. In: ACM SIGKDD Workshop on Intelligence and Security Informatics, ISI-KDD ’10, pp. 2:1–2:10. ACM, New York, NY, USA (2010)
11. Cerqueira, V., Torgo, L., Mozetic, I.: Evaluating time series forecasting models: An empirical study on performance estimation methods. CoRR abs/1905.11744 (2019). URL http://arxiv.org/abs/1905.11744
12. Duan, L., Hu, T., Cheng, E., Zhu, J., Gao, C.: Deep convolutional neural networks for spatiotemporal crime prediction. In: Proceedings of the 2017 International Conference on Information and Knowledge Engineering, IKE ’17, pp. 61–67 (2017)
13. Fitterer, J., Nelson, T.A., Nathoo, F.: Predictive crime mapping. Police Practice and Research (2015)
14. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Annals of statistics pp. 1189–1232 (2001)
15. Graif, C., Sampson, R.J.: Spatial heterogeneity in the effects of immigration and diversity on neighborhood homicide rates. Homicide Studies (2009). DOI 10.1177/1088767909336728
16. Hojman, D.E.: Inequality, unemployment and crime in Latin American cities. Crime, Law and Social Change (2004)
17. Kadar, C., Iria, J., Pletikosa, I.: Exploring Foursquare-derived features for crime prediction in New York City. In: KDD - Urban Computing WS ’16 (2016). DOI 10.1145/1235
18. Kang, H.W., Kang, H.B.: Prediction of crime occurrence from multi-modal data using deep learning. PLOS ONE (2017)
19. Lemaître, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research (17), 1–5 (2017)
20. Leong, K., Sung, A.: A review of spatio-temporal pattern analysis approaches on crime analysis. International E-Journal of Criminal Sciences (2015)
21. Mu, Y., Ding, W., Morabito, M., Tao, D.: Empirical discriminative tensor analysis for crime forecasting. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011)
22. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2825–2830 (2011)
23. Ratcliffe, J.: The hotspot matrix: A framework for the spatio-temporal targeting of crime reduction. Police Practice and Research (1), 5–23 (2004)
24. Rumi, S.K., Deng, K., Salim, F.D.: Crime event prediction with dynamic features. EPJ Data Science (2018). DOI 10.1140/epjds/s13688-018-0171-7
25. Traunmueller, M., Quattrone, G., Capra, L.: Mining mobile phone data to investigate urban crime theories at scale. In: SocInfo, Lecture Notes in Computer Science, vol. 8851, pp. 396–411. Springer (2014)
26. Wang, B., Yin, P., Bertozzi, A.L., Brantingham, P.J., Osher, S.J., Xin, J.: Deep learning for real-time crime forecasting and its ternarization. CoRR abs/1711.08833 (2017). URL http://arxiv.org/abs/1711.08833
27. Wang, B., Zhang, D., Zhang, D., Brantingham, P.J., Bertozzi, A.L.: Deep learning for real time crime forecasting (2017)
28. Wang, D., Ding, W., Lo, H., Morabito, M., Chen, P., Salazar, J., Stepinski, T.: Understanding the spatial distribution of crime based on its related variables using geospatial discriminative patterns. Computers, Environment and Urban Systems (2013). DOI 10.1016/j.compenvurbsys.2013.01.008
29. Wang, H., Kifer, D., Graif, C., Li, Z.: Crime rate inference with big data. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 635–644 (2016)
30. Wang, P., Mathieu, R., Ke, J., Cai, H.J.: Predicting criminal recidivism with support vector machine. In: International Conference on Management and Service Science, MASS 2010 International Conference (2010)
31. Xu, Y., Fu, C., Kennedy, E., Jiang, S., Owusu-Agyemang, S.: The impact of street lights on spatial-temporal patterns of crime in Detroit, Michigan. Cities (2018)
32. Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Transactions on Intelligent Systems and Technology (TIST) (3), 30 (2016)
33. Yu, C.H., Ding, W., Morabito, M., Chen, P.: Hierarchical Spatio-Temporal Pattern Discovery and Predictive Modeling. IEEE Transactions on Knowledge and Data Engineering (2016)
34. Yu, C.H., Ward, M.W., Morabito, M., Ding, W.: Crime Forecasting Using Data Mining Techniques. In: 2011 IEEE 11th International Conference on Data Mining Workshops (2011)
35. Zhao, X., Tang, J.: Modeling temporal-spatial correlations for crime prediction. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM ’17, pp. 497–506. ACM, New York, NY, USA (2017)
36. Zhao, X., Tang, J.: Crime in urban areas: A data mining perspective. CoRR abs/1804.08159