A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists

Abstract

This paper focuses on reporting of Internet malicious activity (or mal-activity in short) by public blacklists with the objective of providing a systematic characterization of what has been reported over the years, and more importantly, the evolution of reported activities. Using an initial seed of 22 blacklists, covering the period from January 2007 to June 2017, we collect more than 51 million mal-activity reports involving 662K unique IP addresses worldwide. Leveraging the Wayback Machine, antivirus (AV) tool reports and several additional public datasets (e.g., BGP Route Views and Internet registries) we enrich the data with historical meta-information including geo-locations (countries), autonomous system (AS) numbers and types of mal-activity. Furthermore, we use the initially labelled dataset of approx 1.57 million mal-activities (obtained from public blacklists) to train a machine learning classifier to classify the remaining unlabeled dataset of approx 44 million mal-activities obtained through additional sources. We make our unique collected dataset (and scripts used) publicly available for further research. The main contributions of the paper are a novel means of report collection, with a machine learning approach to classify reported activities, characterization of the dataset and, most importantly, temporal analysis of mal-activity reporting behavior. Inspired by P2P behavior modeling, our analysis shows that some classes of mal-activities (e.g., phishing) and a small number of mal-activity sources are persistent, suggesting that either blacklist-based prevention systems are ineffective or have unreasonably long update periods. Our analysis also indicates that resources can be better utilized by focusing on heavy mal-activity contributors, which constitute the bulk of mal-activities.


Benjamin Zi Hao Zhao [email protected] of New South Wales; Data61, CSIRO

Muhammad Ikram [email protected] University; University of Michigan

Hassan Jameel Asghar [email protected] University; Data61, CSIRO

Mohamed Ali Kaafar [email protected] University; Data61, CSIRO

Abdelberi Chaabane [email protected] Bell Labs

Kanchana Thilakarathna [email protected] of Sydney


ACM Reference Format:

Benjamin Zi Hao Zhao, Muhammad Ikram, Hassan Jameel Asghar, Mohamed Ali Kaafar, Abdelberi Chaabane, and Kanchana Thilakarathna. 2019. A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists. In AsiaCCS ’19: ASIA Conference on Computer and Communications Security, July 07–12, 2019, Auckland, New Zealand. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/1234567.1234567

Public reports of malicious online activity are commonly used in the form of blacklists by intrusion detection systems, spam filters and the like to determine if a host is known for suspicious activity. However, very little is known about the dynamics of the reporting of malicious activities. Understanding what has been reported and how the reported activity evolves over time can be of paramount importance to help assess the efficacy of blacklist-based threat prevention systems. We conduct a longitudinal measurement study of reporting of malicious online activities (abridged to mal-activities), over a ten-year period (from January 2007 to June 2017). We define a mal-activity as any activity reported by one or more public data sources (in particular, within blacklists). The actor or entity behind each mal-activity can be reduced to a combination of IP address, autonomous system (AS) in which the reported IP address resides or the country in which the IP address is located, which we call malicious hosts. We collect 51.6M mal-activity reports involving 662K unique IP addresses worldwide. We use the Internet Wayback Machine [13], reports from antivirus (AV) tools, and several additional datasets to obtain historical meta-information about the data such as geo-location (countries) and AS numbers. We categorize the combined mal-activities from different sources into six classes: Malware, Phishing, Fraudulent Services (FS), Spamming, Exploits, and Potentially Unwanted Programs (PUPs). The collected dataset encompasses attributes and historical knowledge of numerous malicious hosts from these six classes, providing a wide range of possible mal-activities. To foster further research, we release the dataset and scripts used in this paper to the research community: https://internetmaliciousactivity.github.io/

We acknowledge that hosts may be infected or victimized to perform mal-activities rather than being intentionally involved. This paper does not differentiate between the two.

The main contributions of our work are as follows:

• We use a machine learning approach to label the entire dataset (51.6M mal-activities) by training a classifier on 1.57M labelled reports (obtained from public blacklists). More specifically, we train an ensemble of Random Forest classifiers on basic report information, such as the IP, day, month, year, autonomous system number, country code, and organization names to correctly label the type of mal-activity that may have been committed. Over a training/testing split of 40%/60% on the dataset that contained labels from the source, we are able to achieve an accuracy of 93.5% in re-identifying the type of mal-activity committed. The trained model is then leveraged to predict the mal-activity type of reports that were deficient in this information.
• We determine that mal-activities of the Malware class have IP addresses that are hosted in a diverse set of hosting infrastructures as well as geo-locations. We observe that on average the Malware class, at 88%, is 54% more prevalent in hosting infrastructures, and 29% more prevalent in geo-locations (countries) than any other class of mal-activities. By comparison, our analysis reveals that the reports of PUP and Spammer classes of mal-activity are more concentrated, with 2,200 and 561 unique ASes, respectively (§ 2.5).
• We observe that the hosting infrastructure of mal-activities is primarily concentrated in the US and China. A normalized view of a country’s IP space revealed the British Virgin Islands and Anguilla, with large proportions of malicious IPs. However, deeper investigation revealed that a country’s IP space can be dominated by singular ASes, cautioning against the use of a country’s proportion of malicious IP addresses within the IP space (§ 3.2).
• We analyze the volume of reported mal-activity over time (§ 4.1). We observe that while malware has historically been, and continues to be, the dominant class, starting from 2012, reports of phishing activities have steadily risen, recently becoming the second largest class in volume (29% as compared to 59% reports of malware in the year 2017).
• We study the periods of “activity” and “inactivity” of hosts (at the IP, AS and country-level) as proxied by their presence or absence from the reports, modeled by an alternating renewal process to capture the churn rates of the reports. We consider lifetime (resp. deathtime) distributions for active (resp. inactive) periods. A high average lifetime reflects reporting of persistent threats, while a low average deathtime would indicate resiliency to reporting (§ ??). Based on this, we analyze the behavior of the different classes of mal-activity and note that phishing activities are the most resilient, with the lowest average deathtimes indicating quick recovery from potential shutdowns (§ ??).
• We analyze mal-activity recurrence as the rate at which a particular mal-activity re-emerges in the reports and observe that countries such as Colombia, Panama, Bahamas, Norway and Mexico have the highest rates, owing perhaps to weak cyber defense infrastructure or relaxed regulations (§ ??).
• We measure the magnitude of reported mal-activities as the average volume of occurrences during active periods. Our results show that while 86% of the IP addresses and 27% of the ASes are reported to be involved in a unique mal-activity per week when they are active, a mere 200 IP addresses are reported in a massive 10K+ malicious reported activities per week (§ 4.2).

Our results reveal some surprising observations which indicate that blacklist-based online prevention systems are either powerless against some persistent threats originating from a small number of sources, or at best suffer from quite unreasonable updating periods (§ ??).
Our findings also suggest that phishing is a highly resilient activity that very likely will not be defeated by blacklist-based approaches only (§ ??). Finally, we believe that tracking heavy mal-activity contributors should be a priority for law enforcement agencies, major network providers and cloud operators as they clearly constitute the largest share of the malicious activity threat vector (§ 4.2). Adopting approaches to detect the emergence of such heavy mal-activity contributors at an early stage is arguably key to significantly reducing their impact.

The rest of the paper is organized as follows. In Section 2, we summarize our dataset along with our methodology to augment and annotate the initial dataset. Section 4 presents our analysis of the behavior of the reporting of malicious activity. We discuss related work in Section 5 and conclude in Section 6.

We initially identified a sizable set of publicly-available blacklist sources (22, from static URLs), which we augment in two iterative ways. First, we looked up the historical versions of these blacklists by using the Internet Wayback Machine. The blacklists and their historical versions (ranging from January 2007 to June 2017) give us a set of IP addresses and malicious domains that have been involved in some malicious activity within that time frame along with the corresponding timestamp and activity tag. We call this list Blacklist-07-17.

As a second step, we submit all IP addresses present in Blacklist-07-17 to an AV tool aggregator service, VirusTotal (VT) [30]. We query VT for all additional reports (again covering the period 2007-2017) it has on those IP addresses and malicious domains that we collected from the blacklists. In addition, from VT reports, we extract the list of malicious files associated with the malicious activities. Some of these files are software binary executables containing hard-coded IP addresses, often called referrers. (For instance, Zeus, a Trojan toolkit used for credit card fraud and stealing users’ banking details, has a set of hard-coded IP addresses, presumably C&C servers.) We collected all referrers that correspond to IP addresses in Blacklist-07-17 as well as all other associated IP addresses and reports found to be carried out by these referrers.

The total reports generated by VT via additional queries, referrers’ IP extraction and Blacklist-07-17 constitute our set of malicious activities, which we call VTBlacklist. Figure 1a summarizes our data collection process. We further augment VTBlacklist with historical metadata consisting of the relevant Autonomous System (AS) and corresponding listed country in which the IP address resided at the time of reporting. The resulting augmented data is called the FinalBlacklist. The metadata augmentation process is displayed in Figure 1b.

Next, we describe the above-mentioned data collection methodology in more detail. A more extensive description and detailed statistics are available at: https://internetmaliciousactivity.github.io/

Figure 1: Dataset Collection and Augmentation. (a) Seed Blacklist and VirusTotal Data Collection. (b) Meta-Data.

The collection of this seed dataset was initiated by (manually) identifying a set of publicly available blacklists commonly used as sources that report a wide range of mal-activities. Data sources that do not contain timestamps, require a subscription (reporting as a service), or have a robots.txt policy restricting automated access (i.e., crawlers) have been discarded. Historical versions of these blacklists available through the Internet Wayback Machine were obtained, spanning the period from January 2007 to June 2017, consisting of over 2 million “timestamped” reports of 297,095 unique IP addresses. There are data discrepancies between the blacklists, but they minimally contain the timestamp, IP address, and type of mal-activity in each report. The exact differences and basic statistics of Blacklist-07-17 are summarized in Appendix A.1 and Table 5.

Blacklist-07-17 is used here as an initial seed for more extensive monitoring of malicious activity reports (see next section VTBlacklist). The extension seeks to address problems of dynamic report generation (not captured by the Internet Wayback Machine), niche threat biases, and archive periodicity of the seed blacklists. These limitations are detailed in Section 2.6.

VirusTotal (VT) is the largest publicly available aggregator of antivirus (AV) products, providing scan results from 67 different AV products (as of January 21, 2019). We use the report API [30] to query VT for domains and IP addresses. This returned the associated aggregated reports from different AV products. As malicious domains are likely to change their hosting infrastructure over time or to be simultaneously associated with several hosting infrastructures [36][35], we queried VT to receive all reports associated with every malicious domain in Blacklist-07-17. The returned reports include current and past IP addresses belonging to the domain with timestamps. Through this process, we identified additional IP addresses. Similarly, an IP address may host several domains involved in mal-activities [35]. We queried VT with each IP address in Blacklist-07-17 as input to obtain a list of (timestamped) reports on malicious domains that are (or were) hosted on the given IP address.

While it is possible to re-query VT using the obtained list of IP addresses and domains to extract further reports in a recursive manner, we did not do so due to rate limits imposed on the API calls and the sheer size of obtained reports. Since we queried VT for every reported activity confirmed by AV products about the list of IP addresses, the generation of additional historical reports compensates for the sparsity in time coverage of Blacklist-07-17 (see Section 2.6). In total, we gathered a list of 662,289 unique IP addresses corresponding to 51,645,995 reported malicious activities, collectively called VTBlacklist. Now, we describe how we enriched the reported mal-activity datasets with historical metadata.

We enrich our data to extend the list of attributes in the dataset by linking additional attributes (metadata) including AS numbers (ASNs) and geolocation information. The key is to extract relevant historical metadata, consistent with the timestamp of the report.

AS Mapping.

To map IP addresses to the corresponding ASes with historical accuracy, we used the BGP Route Views dataset [23]. This dataset consists of daily snapshots of the BGP routing table collected between 2007 and 2017.

Country Mapping.

We further use MaxMind GeoCity [10] and Potaroo [21] datasets to map an IP address to its respective country (i.e., territories under sovereign rule or autonomous entities, e.g., BV) and country code, and used the Wayback Machine to obtain their archived versions for historical mappings. Since these archived versions have “gaps,” we consider the closest available IP-geolocation mapping to the reported mal-activity timestamp. This approximation is further discussed in Section 2.6.
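The "closest available mapping" rule can be illustrated with a small lookup over archive dates. This is a minimal sketch, not the authors' released script: the function name `closest_snapshot`, the example dates, and the tie-breaking rule (prefer the earlier snapshot) are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import date


def closest_snapshot(snapshot_dates, report_date):
    """Return the archived snapshot date nearest to report_date.

    snapshot_dates must be sorted ascending and non-empty.
    Ties go to the earlier snapshot (an arbitrary choice for this sketch).
    """
    i = bisect_left(snapshot_dates, report_date)
    if i == 0:
        return snapshot_dates[0]          # report predates every snapshot
    if i == len(snapshot_dates):
        return snapshot_dates[-1]         # report postdates every snapshot
    before, after = snapshot_dates[i - 1], snapshot_dates[i]
    return before if (report_date - before) <= (after - report_date) else after


# Hypothetical archive dates with a gap between January and June 2012.
snapshots = [date(2012, 1, 15), date(2012, 6, 1), date(2013, 2, 10)]
nearest = closest_snapshot(snapshots, date(2012, 5, 20))
```

Here a report dated 20 May 2012 is matched to the 1 June 2012 snapshot, since that archive is 12 days away while the January one is 126 days away. The same lookup would then select which archived geolocation table to consult for the report's IP address.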

Our augmented FinalBlacklist is composed of a myriad of mal-activities with 15% (7.6M) originally labeled by their respective data sources, and the remaining 85% (44M) unlabeled. To classify all mal-activities, we employ manual classification of the labeled mal-activities, and leverage machine learning to extend the known labels onto the unlabeled dataset. We detail these approaches in the following sections.

Each labeled mal-activity in our dataset is classified into one of 4,918 unique mal-activity labels by their respective data sources. Careful analysis of these labels shows that the disparity between labels can be reduced by only considering the end-goal or motivation of the adversary. Based on this observation, each author re-classified each activity into one of only six classes of labels. The co-authors disagreed on 1.07% of the cases, which was resolved using majority voting. If consensus was not reached, the activity was marked as unlabeled and discarded from the labeled dataset. The classes of reported mal-activities are Exploits, Malware, Fraudulent Services (FS), Spammers, Phishing, and Potentially Unwanted Programs (PUP). We define these mal-activities in Appendix B.
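The majority-voting step for reconciling the authors' labels can be sketched as below. This is an illustrative sketch under assumptions: the helper name `resolve_label` is hypothetical, and "consensus" is interpreted here as a strict majority of the votes, which the text does not spell out.

```python
from collections import Counter


def resolve_label(votes):
    """Return the label chosen by a strict majority of annotators, else None.

    votes is a non-empty list with one class name per annotator.
    None stands for "no consensus", i.e. the report is treated as unlabeled.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical votes from six annotators for three reports.
agreed = resolve_label(["Malware"] * 6)
disputed = resolve_label(["Phishing", "Phishing", "Phishing", "Phishing", "FS", "Malware"])
no_consensus = resolve_label(["Malware", "Malware", "Malware", "PUP", "PUP", "PUP"])
```

In this example the disputed report is still labeled Phishing (four of six votes), while the three-against-three split yields `None` and would be dropped from the labeled set.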

Classification of a large number (44M, 85%) of unlabeled mal-activities is a non-trivial task. One way is to leverage the VirusTotal request API to retrieve labels. However, due to rate limits imposed by VirusTotal, classifying this volume of mal-activities would require an unreasonable amount of time. Therefore, we decided to use our labeled dataset (7.6M, 15%) to determine if there is sufficient information available that can be used to predict class labels for the unlabeled mal-activities.

Motivation.

To motivate the plausibility of this approach, we highlight one aspect of the labelled dataset called “specialization.” More precisely, we found that a large proportion of hosts participate in a single class of mal-activity, i.e., specialize in one class of activity, indicating that past involvement in a particular mal-activity class is a good indicator of a future class label. To demonstrate this, for a given host h (IP address) in the labeled dataset, we first compute $p(h, a) = \frac{\#\,h \text{ with activity } a}{\text{Total } \#\,h}$, i.e., the fraction of the reports of h that belong to activity a, where a is one of the six mal-activity classes. We then define a probabilistic metric, host specialization, which is based on the distribution of mal-activities by hosts in the labeled dataset. Formally, it is defined as the normalized Shannon entropy per host h given by $S(h) = \left(-\sum_{a} p(h, a) \log p(h, a)\right) / \log k$, where k ≤ h and a ranges over the 6 classes of activities. A host highly specializes in a single class of mal-activity if it has a lower value of S(h).

Figure 2: IPs, ASes, and Countries Specialization in Blacklist-07-17 dataset (CDF of specialization; series: IP, ASN, and CC entropies). Most IP addresses specialize in a single class of mal-activity.

From Figure 2, we observe that 80% of the reported IP addresses exclusively participate in one class of mal-activity. When we expand the definition of a host to include an AS or a country, we observe a more uniform distribution across the hosts, with 55% of ASes and only 20% of countries (CC) participating in one class of mal-activity. Furthermore, only 0.04% (311) of IP addresses, 2.12% (275) of ASes, and 27.4% (54) of countries participate in all six classes. On closer look, we found that 96.8% of IP addresses, 87.7% of ASes and 74.4% of countries have a relative entropy value of less than 0.50. This suggests that a substantial number of hosts (IPs, ASes, and Countries) will be biased towards one class of mal-activities.

Following this intuition, we suspect there is sufficient information within the mal-activity reports for the training of a classifier to predict the report’s mal-activity label. Specifically, if this trained classifier has good testing accuracy on our labelled dataset, we can leverage the classifier to predict the mal-activity label of our unlabelled reports.

Machine Learning Approach to Label Mal-Activities.

As each report can be labelled one of the 6 mal-activity labels (§2.4), we establish the task of predicting the mal-activity label as a multi-class classification problem. We leverage a Random Forest classifier, with our original labelled dataset divided into training (40%) and testing (60%) sets. The labelled Blacklist-07-17 dataset contains 1,006,171 samples of malware, 164,149 of phishing, 60,146 of exploit, 297,652 of fraudulent services, 43,582 of unwanted programs, and 2,691 samples of spammers. The split of the dataset (into training and testing sets) is stratified, with a consistent proportion of training and testing samples for each mal-activity label. The large number of reports in the labelled dataset prevented us from using more reports in the training dataset, as the random forest implementation from scikit-learn [51] would encounter memory issues, despite more than 96 GB of RAM provisioned for the task. As the training set is dwarfed by the testing set, we repeated the model training and testing process 5 times, each on different training/testing splits of the data. This repetition ensures our results are not an artifact of a biased split in the data.

Table 1 lists the features used for labelling mal-activities. We note that One-Hot encoding is a common approach to encoding categorical features, whereby the encoding maps a categorical feature with k categories into k binary vectors. An alternate method is to encode the categorical data as numerals; however, this would also produce a misleading numerical relationship between the categories depending on their order. We have chosen the features of day, month, year, and IP address (decomposed into octets) as this is the most basic information available in a mal-activity report. On the other hand, AS, country and organization information can be easily found from the given IP address via a whois lookup.

We decompose the IP address into its octets to allow the model to learn possible /8, /16, /24, /32 relationships that would otherwise not be possible with a full 32-bit IP encoding. It is also acknowledged that IPs are dynamic in nature, and it has been observed that malicious domains hosting malware are transferred to other IPs within an IP block under a single controlling entity (e.g., a hosting provider such as Amazon) [47]. Therefore, in the interest of producing a sufficiently generalized model to handle possible IP changes, we use octets.

Feature | Data Type
Day | integer
Month | integer
Year | integer
IP bits (0-7) | integer
IP bits (8-15) | integer
IP bits (16-23) | integer
IP bits (24-31) | integer
AS | integer
Country | One-Hot encoding
Organization | One-Hot encoding

Table 1: Features used in Classification Task
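To make the feature layout of Table 1 concrete, the following sketch builds one feature vector from a single report. It is an illustration only: the helper names (`ip_octets`, `one_hot`, `encode_report`), the dictionary keys, and the example report are hypothetical, and the paper's own pipeline relies on scikit-learn rather than these hand-written helpers.

```python
def ip_octets(ip):
    """Split a dotted-quad IPv4 string into its four integer octets."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return parts


def one_hot(value, categories):
    """Encode value as a 0/1 list with one position per known category."""
    return [1 if value == c else 0 for c in categories]


def encode_report(report, countries, orgs):
    """Build a Table 1 style feature vector for one report (a dict)."""
    return (
        [report["day"], report["month"], report["year"]]
        + ip_octets(report["ip"])            # four integer features
        + [report["asn"]]
        + one_hot(report["country"], countries)
        + one_hot(report["org"], orgs)
    )


# Hypothetical report; 192.0.2.0/24 is reserved for documentation.
countries = ["CA", "DE", "US"]
orgs = ["ExampleHost", "ExampleNet"]
report = {"day": 3, "month": 6, "year": 2015, "ip": "192.0.2.17",
          "asn": 64500, "country": "US", "org": "ExampleNet"}
vector = encode_report(report, countries, orgs)
```

Because each octet is a separate integer, a tree-based model can split on the first octet alone (a /8 relationship) or on combinations of octets, whereas the one-hot columns avoid implying any ordering among countries or organizations.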

Performance and Prediction of Unlabelled Data.

On 6 cores of an Intel Xeon E5-2660 V3 clocked at 2.6 GHz and 96 GB of memory, the whole classification process took approximately 15 minutes. This includes loading/splitting the data, training, testing and writing results to disk. As we have trained 5 models on different splits of the original training data, rather than discarding 4 models to only use one, we construct an ensemble of all 5 models (a classifier ensemble). Each of the 5 models provides a prediction, consisting of a label and associated probability (confidence). From this, the label with the highest average probability is assigned to the mal-activity report. This method of majority voting is known as soft-voting. The class-specific accuracies of Malware, Phishing, Exploits, Fraudulent Services, PUP, and Spammers, averaged over all 5 models, are 93.04%, 93.85%, 79.04%, 91.70%, 96.29%, and 82.57%, respectively. Since the number of samples for each label is uneven, we performed a weighted average over the label-specific accuracies to produce an overall accuracy, which turned out to be 92.49%.
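Soft voting and the sample-weighted accuracy can be sketched in a few lines. This is a minimal illustration with made-up probabilities and counts: the function names `soft_vote` and `weighted_accuracy` are hypothetical, and the exact weights used for the reported overall accuracy are not stated in the text, so the second helper only shows the general form of such an average.

```python
def soft_vote(model_probs, labels):
    """Average class probabilities across models and return the top label.

    model_probs holds one probability list per model, ordered like labels.
    Returns (label, average probability of that label).
    """
    n_models = len(model_probs)
    avg = [sum(p[i] for p in model_probs) / n_models for i in range(len(labels))]
    best = max(range(len(labels)), key=avg.__getitem__)
    return labels[best], avg[best]


def weighted_accuracy(per_class_acc, per_class_count):
    """Average per-class accuracies weighted by per-class sample counts."""
    total = sum(per_class_count.values())
    return sum(per_class_acc[c] * per_class_count[c] for c in per_class_acc) / total


# Hypothetical outputs of five models for one report.
labels = ["Malware", "Phishing", "Exploits"]
probs = [
    [0.90, 0.10, 0.00],
    [0.90, 0.10, 0.00],
    [0.45, 0.55, 0.00],
    [0.45, 0.55, 0.00],
    [0.45, 0.55, 0.00],
]
label, confidence = soft_vote(probs, labels)

# Hypothetical two-class example of the weighted average.
overall = weighted_accuracy({"A": 0.9, "B": 0.5}, {"A": 900, "B": 100})
```

In this toy case three of the five models individually prefer Phishing, yet soft voting assigns Malware because its average probability (0.63) is higher. That difference from a plain count of votes is the reason for averaging confidences.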

In Table 2, we report the total number of mal-activities corresponding to the six classes, along with the collected metadata. Overall, we collected a total of 51,645,995 mal-activity reports from all data sources (cf. Table 5 in Appendix A.1). With manual labeling and the use of our random forest machine learning classifier, we categorized 44,003,768 (85%) unlabelled reports into six different classes. The result produces malware as the largest mal-activity class (90.9%), and spammers as the smallest (0.01%).

Given that 136,941 (20.7%) of IP addresses in the reports are reported to be involved in more than one class of mal-activities, the percentage breakdown under each metadata attribute such as IP address, ASes or geolocation (country) does not add up to 100%. We found that the labelled IPs (662,409) host 8.42M, 8.79M, and 948K unique domains, URLs, and malicious files (i.e., executables), respectively. We also find that IP addresses that correspond to mal-activities are referenced in 18K malicious files (i.e., referrers).

Note that, as an IP-endpoint (such as a Web server) could host more than one domain and could have multiple resources (i.e., URLs), once again, the percentage of number of domains and URLs does not add up to 100%.

Table 2: Summary of the FinalBlacklist dataset. “U” denotes unique and “FS” represents Fraudulent Services.

Class | Mal-activities | U. IPs | U. ASes | U. Countries
Malware | 46,932,466 (90.9%) | 427,745 (65%) | 11,435 (88%) | 196 (99%)
Phishing | 2,450,247 (4.74%) | 133,072 (20%) | 4,402 (34%) | 139 (70%)
FS | 1,141,377 (2.21%) | 87,508 (13%) | 3,264 (25%) | 118 (60%)
PUP | 895,494 (1.73%) | 165,465 (25%) | 2,200 (17%) | 81 (41%)
Exploits | 218,791 (0.42%) | 39,854 (6%) | 2,966 (23%) | 112 (57%)
Spammers | 7,620 (0.01%) | 2,209 (0.3%) | 561 (4%) | 60 (30%)
Total | 51,645,995 (100%) | 662,409 (100%) | 12,950 (100%) | 198 (100%)

Despite our best efforts to collect the most comprehensive set of data sources to perform our study, there are still some limitations worth mentioning.

First, a limitation of Blacklist-07-17 is that we did not use some popular blacklists that we are aware of (e.g., the Spamhaus Project [26] and PhishTank [20]), as the lists in those reporting services were dynamically generated and hence it is very difficult to extract their historical versions (the Wayback Machine does not archive dynamically generated content). Second, Blacklist-07-17 might be biased towards specific or niche threats, e.g., the specific focus of the Zeus, SpyEye or OpenPhish blacklists (cf. Table 5). Also, Wayback Machine snapshots are sporadic and as a result Blacklist-07-17 is subject to sparsity in time coverage. This was one of the motivations to feed the initial lists to the VirusTotal service to extract more comprehensive reports across the whole 2007-2017 period. Finally, the IP-Country mappings described in Section 2.3 are obtained from Wayback Machine archives of MaxMind and Potaroo. Here, we could not recover the exact mapping due to the sporadic nature of Wayback Machine records (as we did for the historical versions of blacklists using VT Score reporting). Instead, we consider the closest available IP-geolocation mapping to the reported mal-activity timestamp. We acknowledge that the accuracy of IP address to location databases may impact our analysis. However, note that database accuracies are questioned at the city and region levels, but previous research has shown that geolocation databases can effectively locate IP addresses at the country level [53].

CHARACTERIZATION OF MAL-ACTIVITIES

In this section, we analyze whether a few hosts (IP addresses, countries and ASes) are more biased towards specific classes of mal-activities or if the spread is more uniformly distributed. We also provide further insights where a particular mal-activity is skewed towards a few hosts.
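One way to quantify how strongly a host is biased towards a class is the host specialization metric S(h) introduced earlier, the normalized Shannon entropy of the host's reports over the mal-activity classes. The sketch below is an assumption-laden illustration: the function name `specialization` is hypothetical, and normalizing by the logarithm of the six classes is an assumed reading of the normalization constant k.

```python
import math
from collections import Counter

NUM_CLASSES = 6  # assumption: normalize by the log of the six mal-activity classes


def specialization(class_labels, k=NUM_CLASSES):
    """Normalized Shannon entropy of one host's reported classes.

    class_labels has one class name per report of the host.
    0.0 means every report falls in one class; 1.0 means an even spread over k classes.
    """
    counts = Counter(class_labels)
    total = sum(counts.values())
    entropy = sum(-(n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(k)


# Hypothetical hosts.
single_class = specialization(["Malware"] * 10)
two_classes = specialization(["Malware"] * 5 + ["Phishing"] * 5)
all_classes = specialization(
    ["Malware", "Phishing", "FS", "PUP", "Exploits", "Spammers"]
)
```

A host seen only in malware reports scores 0, an even split between two classes scores log 2 / log 6 (about 0.39), and an even spread over all six classes scores 1. Lower values therefore indicate stronger bias towards a single class.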

We first study the distribution of IPs over the categories of mal-activities, then analyze the geolocation distribution at both country and AS levels.

The majority of IP addresses (63.0%) are repeat offenders, with participation in mal-activities reported more than once as shown in Figure 3a. Among the different classes of mal-activities, IP addresses corresponding to Fraudulent Services (81.6%) and Malware (65.0%) were the most involved in more than one corresponding mal-activity. Spammers, on the other hand, are the least repeated by an IP address (only 36.4%). Overall, about 18.0% of all IP addresses were involved in at least 10 reports of mal-activity, with an average of 78.0 reports per IP.
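Statistics of this kind reduce to counting reports per IP address. The following is a small sketch on hypothetical data (the function name `repeat_offender_stats` and the example addresses, taken from documentation ranges, are illustrative and not part of the released scripts).

```python
from collections import Counter


def repeat_offender_stats(report_ips):
    """Summarize how often each IP address appears in a list of reports.

    report_ips has one IP string per report and must be non-empty.
    """
    per_ip = Counter(report_ips)
    n_ips = len(per_ip)
    return {
        # share of IPs that appear in more than one report
        "repeat_fraction": sum(1 for c in per_ip.values() if c > 1) / n_ips,
        # total reports divided by the number of distinct IPs
        "mean_reports_per_ip": len(report_ips) / n_ips,
    }


# Hypothetical report stream: 7 reports over 4 distinct IPs.
ips = (["192.0.2.1"] * 3 + ["192.0.2.2"]
       + ["198.51.100.7"] * 2 + ["203.0.113.9"])
stats = repeat_offender_stats(ips)
```

Applied to the dataset totals quoted earlier, 51,645,995 reports over 662,289 unique IP addresses gives roughly 78 reports per IP, in line with the stated average.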

Insights.

We observe that the most reported IP address is managed by AS16509 (AMAZON-02 - Amazon.com, Inc.) in the US, and is dominated by the malware class with 43,753 reports. This is consistent with reports [41][44] on cybercriminals often using free Amazon Web Services (AWS) to host a large volume of SpyEye Trojans and exploit kits for mal-activities. Similarly, we found that the third most reported IP address is managed by AS7415 (Integral Ad Science, a Web ad and analytics service) in Canada, primarily due to suspicious ad campaigns comprising 35,885 unique PUPs. We were unable to determine whether this IP address is infected with malware. However, our study confirms previous findings [48][47] on cybercriminals using leading ad networks to propagate mal-activities (in this case, Integral Ad Science). Previous research [55] showed that spammers often quarantine bots for a period, waiting for them to be whitelisted again.

Our dataset shows that there is at least one malicious IP address hosted in almost every country (avg. 4170 IP addresses per country). However, Figure 3b indicates that the mal-activities are not evenly distributed among countries. The figure shows that mal-activities are a prevalent cybersecurity threat worldwide, with 20.2% of countries having more than 10K malicious reports, although the distribution varies from one class of mal-activity to another. Malware is distributed relatively evenly, whilst spammers are concentrated in a few selected countries such as the United States, Russia, British Virgin Islands, Ukraine, and Germany, with proportions of the spamming activity at 35%, 22%, 9%, 5%, and 5%, respectively.

Insights.

Our results agree with the expectation that countries with rich IT infrastructure such as the US, Germany, China, France, and the Netherlands are dominant in terms of mal-activities (42M, 1.47M, 1.32M, 1.24M and 0.41M, respectively). Interestingly, British Virgin Islands (VG) is ranked 8th with 243K mal-activities. Out of these, 80.1% are malware, trailed by FS with 9.6%.

Figure 3c shows the distribution of mal-activities per AS. The majority (82.4%) of the ASes are involved in more than one mal-activity, with 59.2% of all labelled ASes contributing to at least 5 mal-activities. Among the different classes of mal-activities, malware is seen in the highest proportion of ASes, specifically in 88.3% of all reported ASes. In contrast, spammers are distributed over the smallest proportion of ASes, only 4.33%.

Insights.

We note that AS16509 (managed by AMAZON-02 and located in the US) is the most aggressive, with 25.8M of all mal-activities (52.0% of all labeled mal-activities in our dataset). We also observe that it has contributed to all classes of mal-activities, predominantly malware (24.5M) and phishing (463K). This indicates that cloud service providers are often preferred by cybercriminals to inflict harm on online services at scale.

To minimize the bias of massive Internet service infrastructures in countries such as the US, we investigate the normalized distribution of mal-activities. To this end, we use the CIPB [4] and AS-Rank [3] datasets to obtain the total number of allocated IP addresses per country and per AS, respectively, and measure the ratio of the number of malicious IP addresses to the total allocated IP addresses of a given country or AS.
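The normalization described above can be illustrated with a short Python sketch. The counts are the ones listed in Table 3(a); the function and variable names are illustrative assumptions, not taken from the released scripts.

```python
# (malicious IPs, total allocated IPs) per country code, as listed in Table 3(a).
table_3a = {
    "VG": (1443, 135_030),
    "AI": (91, 10_260),
    "LT": (4928, 2_690_680),
    "BZ": (323, 178_472),
    "LU": (14, 8_448),
}


def malicious_ratio(malicious_ips: int, allocated_ips: int) -> float:
    """Percentage of allocated IP addresses that were reported as malicious."""
    if allocated_ips <= 0:
        raise ValueError("allocated_ips must be positive")
    return 100.0 * malicious_ips / allocated_ips


# Normalized ratio per country, then countries ranked from highest to lowest ratio.
ratios = {cc: malicious_ratio(mal, total) for cc, (mal, total) in table_3a.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

Ranking by this ratio rather than by raw volume is what brings small address spaces such as VG and AI to the top of the list.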

Figure 4a depicts the CDFs of the number of mal-activities per country. Overall, the ratio of malicious IP addresses to the total number of addresses is low. However, the long tail of the distributions (the top right) reveals a few countries with a relatively high proportion of IP addresses participating in mal-activities. In Table 3(a), we list the top 5 countries that correspond to the tail of the distributions in Figure 4a. The table shows that the British Virgin Islands (VG) has the highest proportion of malicious IP addresses, followed by Anguilla, Lithuania, Belize, and Luxembourg.

Insights.

The biggest contributor of mal-activities within the British Virgin Islands is AS40034 (with 205K reports), under the control of Confluence-Networks, which is a large hosting service provider. The next biggest contributing AS is AS44571, netVillage, a social networking platform provider, which has 2.4K reports. Anguilla’s high proportion of mal-activities is predominantly the result of HostiServer, controlling AS32338, another content AS. This shows that only a few ASes might drive up the normalized distribution of mal-activities in countries with smaller Internet infrastructures.

Similarly, we plot the normalized distribution of malicious IP addresses per AS in Figure 4b, which shows a similar trend to the country-level analysis, although all distributions appear to stretch to both lower and higher percentages. This suggests that there are ASes that do not host or rarely host malicious IP addresses, in addition to ASes in which a large ratio of their allocated IP space has been observed to partake in mal-activities. Table 3(b) lists the top 5 ASes with the highest ratio. We observe that AS31624, located in the Netherlands (NL) and belonging to VFMNL-AS, has 57.6% of its IPs participating in 12.3K reports of mal-activities corresponding to all six categories of mal-activity. AS44901 (BELCLOUD, Bulgaria) is the second and AS54761 (SAMBREEL-SVCS, United States) is the third in the list, with 44.5% and 33.7% of their IPs, respectively, participating in mal-activities.

Figure 3: Number of mal-activities per IP, country and ASN. Panels: (a) IPs, (b) Countries, (c) ASes. Each panel plots the CDF against volume for Total, Malware, Phishing, Fraud. Serv., PUP, Exploits and Spammers.

Insights.

AS31624 is a now-defunct Trading and Service Deposit Company. BELCLOUD’s AS44901 is a data center, which had previously routed malicious requests, as detected by BGP Route Views [23]. Sambreel is a software services company which developed adware plugins that were later abused by advertisers [24], contributing to a larger space of maliciously marked IP addresses. A shared trait between these ASes is that they have comparatively small IP space, with none of the three exceeding 5,000 allocated IP addresses. The reader may argue that content ASes, in particular hosting services, are expected to have a large proportion of their IP space constantly abused. However, we observe that among all the registered content ASes, only 5% have more than 1% of their IP space marked as malicious. Note further that the proportion of IP space marked as malicious does not give the complete picture, as the biggest offenders in terms of volume of mal-activities are AS20940 (Akamai International B.V.) and AS14618 (Amazon.com, Inc.), with a proportion of malicious IP space of only 0.49% and 1.36%, respectively.

Table 3: Top 5 (a) countries and (b) ASes, with the largest ratio of allocated IP space reported for participating in mal-activities.

(a) Countries
Country Code (CC)   Mal. IPs   Total IPs    Ratio   Vol.
VG                  1,443      135,030      1.07%   207,125
AI                  91         10,260       0.89%   222
LT                  4,928      2,690,680    0.18%   36,802
BZ                  323        178,472      0.18%   1,895
LU                  14         8,448        0.17%   26,273

(b) ASes
Columns: AS, Organization, Mal. IPs, Tot. IPs, Ratio, Total Vol.

Figure 4: The ratio of the number of IPs per country (resp. AS) involved in mal-activities to the total number of allocated IPs per country (resp. AS). Panels: (a) Countries, (b) ASes. Each panel plots the CDF against the percentage of malicious IPs for Total, Malware, Phishing, Fraud. Serv., PUP, Exploits and Spammers.

In this section, we aim to find whether (classes of) mal-activities are evenly spread across hosts (IP addresses, ASes and countries) or are concentrated around a particular hosting infrastructure. We do this by assessing the “geographical entropy” of mal-activities with a diversity (or homogeneity) metric named affinity, based on Shannon entropy.

Affinity.

We define affinity as the normalized entropy per malicious activity $a$:

$$A(a) = \frac{-\sum_{h} q(h,a)\,\log q(h,a)}{\log l},$$

where $l$ is the number of hosts hosting activity $a$, and $q(h,a)$ is the number of reports of host $h$ with activity $a$ divided by the total number of reports of activity $a$. Here $A(a) = 1$ means that the reports of $a$ are uniformly distributed among all hosts and, conversely, $A(a) = 0$ means that they are concentrated on a single host.

Insights.

We observe that at the IP host level, mal-activities are relatively evenly distributed, with spammers having the highest affinity (0.820), closely followed by PUP (0.815), and malware having the lowest (0.691). However, at the AS host level, some classes of mal-activities are concentrated around a few ASes. Malware has the lowest affinity (0.260), followed by PUPs (0.342), phishing (0.458), FS (0.556), exploits (0.564), and spammers (0.689). Digging further, we observe that 80.8% of all mal-activities are covered by 10 ASes. Likewise, 83.8% of malware is carried out by 10 ASes, and just 10 ASes contribute 84.2% of PUPs. The high affinity of PUPs over IP addresses and low affinity over ASes confirms the observation in [56] that PUPs are more stable or hosted over bulletproof infrastructure. Thus the different IP addresses contributing to the PUPs belong to only a few ASes. One of the reasons for the stability of PUPs is that they generally sit in a grey area that makes them semi-legitimate, and hence difficult to detect or take down.

At the country level, PUPs exhibit the lowest affinity (0.085) and spammers the highest (0.551). We observe that the US alone contributes 94.2% of the PUP activity, in contrast to its contribution of 35.4% to spamming. The change in affinity of PUPs from IP addresses to ASes and now to the country level can be explained by the fact that most PUPs (and malware) often rely on pay-per-install (PPI) services that in turn use cloud providers, often located in the US, to distribute unwanted programs.
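The affinity metric defined in this section can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the input is one host identifier per report of a given mal-activity class. The function name and the convention of returning 0 for a single host are assumptions, not the authors' code.

```python
import math
from collections import Counter


def affinity(report_hosts):
    """Normalized Shannon entropy of one mal-activity class over its hosts.

    report_hosts: iterable with one host identifier (IP, AS or country) per report.
    Returns a value in [0, 1]: 1 for a uniform spread, 0 for a single host.
    """
    counts = Counter(report_hosts)
    total = sum(counts.values())
    num_hosts = len(counts)  # l in the definition
    if num_hosts <= 1:
        # log(1) = 0 would divide by zero; a single host is fully concentrated.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_hosts)
```

Because the entropy is divided by log l, classes hosted on very different numbers of hosts can be compared on the same 0 to 1 scale.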
Such abuse of legitimate providers to distribute unwanted programs has previously been noted, for instance, for Amazon [41, 44], Integral Ad Science, and DoubleClick [48]. We argue that mal-activity detection techniques that only vet malicious infrastructures would fail to detect and prevent the distribution of such mal-activities.

The next contribution of this paper is the temporal analysis of mal-activity reporting behavior. We start by observing the volume of each class of reported mal-activity in our dataset over time. The seed dataset Blacklist-07-17 corresponds to blacklists with different time ranges, and therefore might be biased towards specific periods of time where a particular mal-activity class would be more aggressively reported than others (cf. § 2.6). However, our use of VirusTotal across the whole 2007-2017 period is intended to overcome this limitation, as we believe the extensive number of AV products and their reports provide a comprehensive scan of the whole reporting period.

Note that we avoid drawing conclusions from IPs reported globally, as these are subject to dynamic IP allocation issues (e.g., via DHCP).

We analyze the daily volume of different classes of reported mal-activities in our dataset over time in Figure 5a (note the log scale of the y-axis). Perhaps not surprisingly, we observe that reported mal-activities have been steadily increasing in volume over the last decade, with an interesting spike around 2008-2009 driven by the inception of high-profile FS and exploit kits. One of the earliest kits was MPack [50], a very popular “user-friendly” exploit kit introduced in 2006. Typically, MPack included a collection of PHP scripts aiming at exploiting browsers’ security holes and commonly used programs (e.g., QuickTime).

Footnote (bulletproof infrastructure): i.e., hosts that guarantee service even after being detected malicious.

Footnote (PPI): PPI services are also used for benign software.

Figure 5: Evolution and proportion of mal-activities in the dataset. Panels: (a) Evolution, the daily number of mal-activities (log scale) over time, per class and in total; (b) Proportion, the percentage of mal-activities over time, per class.

Phishing has seen two distinct periods of reporting: first during 2009, and then in 2013, with an increase in the total volume of reports by two orders of magnitude. This is consistent with a report from Kaspersky Lab in 2013 [9], which points to the growing popularity of digital payment systems attracting unwanted attention from cybercriminals, translating into a dramatic increase in the number of finance-related attacks. We present the relative volume of the reported mal-activity classes over time in Figure 5b. We see that malware continues to dominate the proportion of mal-activities. However, phishing has recently undergone an increase in volume, reaching 29% of all mal-activities in the year 2017. In comparison, malware stands at 59% of all mal-activities for the same year.

Lessons Learned.

Malware is consistently the dominant class over the years. However, interestingly, starting from around 2016-2017, phishing is emerging as one of the major classes of mal-activities, currently amounting to half the volume of malware. Notwithstanding that data sources may not have immediately reported on novel classes of mal-activities (lower volume of mal-activities, other than malware, in earlier years), the relative volume serves as a reasonable proxy of the evolution of mal-activity reporting behavior.

Figure 6: Churn Model. $K_{i,c}$ is the total number of reports in the $c$-th period of activity of host $i$.

Previous research [55] showed that spammers often quarantine bots for a period of time, waiting for them to be “whitelisted” again. Motivated by this, we study the periods of presence of IP addresses, ASes and countries (all denoted as hosts for simplicity) in the public reports.

The Host Churn Model.

Consider a malicious ecosystem with $n$ participating hosts, where each host $i$ is either alive (i.e., present in the system) or dead (i.e., logged off/clean/not reported) at any given time $t$. An active host can be reported one or multiple times as being malicious (denoted $m$). This behavior can be modeled by an alternating renewal process $Z_i(t)$ for each host $i$, similar to the peer churn model in peer-to-peer networks (e.g., Yao et al. [60]):

$$Z_i(t) = \begin{cases} 1 & \text{if host } i \text{ has received at least one report at time } t, \\ 0 & \text{otherwise,} \end{cases}$$

where $1 \le i \le n$ and $t$ is in weeks. Our traces are created by binning the reports into weeks per reported host (recall that host refers to an IP, AS or CC).

The model is illustrated in Figure 6, where $c$ stands for the cycle number, and the durations of host $i$’s ON (life) and OFF (death) periods are given by the variables $L_{i,c} > 0$ and $D_{i,c} > 0$, respectively. Unlike the model in [60], we empirically evaluate (through our data) all lifetime (i.e., $\{L_{i,c}\}_{c=1}^{\infty}$) and off-time (i.e., $\{D_{i,c}\}_{c=1}^{\infty}$) durations by averaging over all cycles in our dataset. We denote the average lifetime as $L_i$ and the average deathtime as $D_i$.

A high average lifetime reflects persistent threats (or infections), generally referred to as bulletproof entities, since their involvement in mal-activities is not interrupted for extended durations (even after being reported). A low average deathtime indicates resiliency of the reported host, as the mal-activity quickly recovers from a potential shutdown. The reciprocal of the mean cycle duration is representative of the rate of arrival of a particular host. It indicates the frequency with which a host participates in, or leaves, a class of mal-activity and is defined as $\lambda_i = \frac{1}{L_i + D_i}$. Consider a scenario where a malicious host is frequently joining and leaving a group of reported botnets (i.e., in bursts): both the average lifetime and the average deathtime would be small, and hence $\lambda_i$ would be relatively large.

Figure 7 displays the CDFs of mean lifetime, mean deathtime and reciprocal of mean duration per IP address, ASN and country in the blacklists. Figure 7a shows that 86.4% of the IPs are short-lived offenders with an average duration of just a week. As mentioned earlier, we refrain from drawing conclusions on the time-based behavior observed at the IP level due to the very likely dynamic IP allocation over time. At the AS level, we found that 56.5% of the ASes are short-lived, with an average of one week of presence in the blacklists. This number is drastically reduced to 17.4% for countries, many of which are small African nations or island states. The long tails observed in the CDF of mean lifetime in Figure 7a indicate that there are only a few hosts with an extended lifetime. We report the IP addresses, ASes, and countries with the highest lifetimes in Table 4(a).
We observe that the US has the longest mean lifetime of 511 weeks by a large margin (China is ranked second at 55.8 weeks), showing a much higher persistence of reported mal-activity in the US than in any other country. Brazil, Canada and the UK are the next most persistent countries, with average lifetimes of 54.8, 37.8 and 37.7 weeks, respectively. At the AS level, the most persistent reported AS is “China Telecom Backbone” with 147.0 weeks.

Figure 7b and Table 4(b) suggest that while most IP addresses have a mean deathtime longer than 100 weeks, indicating a low participation, the “long head” indicates that only a few IPs are recurring participants. Again with a focus on the AS and country level, we observed that most ASes and countries are repeat offenders from the perspective of blacklist reporting. At the country level, in terms of resiliency (low deathtime), the US is ranked first with no deathtime, followed by Germany (1.50) and the British Virgin Islands (1.60). For the rate of arrival, we calculate the reciprocal of mean duration and rank the countries accordingly. Table 4(c) shows that the top 5 countries in terms of arrival rates are Colombia, Panama, Bahamas, Norway, and Mexico, which constitute the most recurrent countries to be reported in mal-activity involvement.

We also analyze the churn with respect to mal-activity classes. From Figure 8a, we can observe that exploits tend to have reports with the lowest mean lifetime (one week), while the rest of the mal-activity classes are similar to each other, with a heavier concentration at longer weekly durations. In terms of resiliency, phishing has the lowest deathtime (highest resiliency), as shown in Figure 8b. Due to its lower mean deathtime, phishing also has the highest mean rate of arrival, as indicated in Figure 8c, implying highly frequent on-off reporting cycles, i.e., reported (in)active behavior.

Lessons Learned.

The analysis shows that a small number of hosts exhibit high renewal of mal-activities, indicating that their presence on a blacklist has not deterred their activities. The most recurrent IP has an average report activity cycle of 5.5 weeks. Had this host been blocked by blacklists, it would have been removed from said lists in less than 5.5 weeks from the first reports. Thus blacklists can consider longer durations prior to delisting a malicious host. Phishing has been observed with the highest resiliency to periods of no reporting (on average 54 weeks less than all mal-activities combined), again suggesting delisting or an ability to circumvent blacklist-based blocking. An overwhelming majority (97.7%) of reported IPs cease activities within 2 weeks, with average cycles of 185 weeks. The blacklist provider must therefore trade off potential false positives for hosts that had only been momentarily infected against curbing the minority of recurrent hosts.
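The churn quantities used in this analysis (mean lifetime, mean deathtime and rate of arrival) can be sketched as follows. This is an illustrative Python sketch, not the authors' released code. It assumes that reports are already binned into week indices, and that every OFF run inside the observation window counts as a death period, including runs before the first report and after the last one.

```python
from itertools import groupby


def churn_stats(reported_weeks, n_weeks):
    """Return (mean lifetime, mean deathtime, rate of arrival) for one host.

    reported_weeks: set of week indices with at least one report (Z_i(t) = 1).
    n_weeks: length of the observation window in weeks (assumed known).
    """
    # Z_i(t) for t = 0 .. n_weeks - 1
    active = [week in reported_weeks for week in range(n_weeks)]
    # Lengths of maximal ON and OFF runs (the cycles in Figure 6)
    runs = [(state, sum(1 for _ in grp)) for state, grp in groupby(active)]
    on_runs = [length for state, length in runs if state]
    off_runs = [length for state, length in runs if not state]
    mean_life = sum(on_runs) / len(on_runs) if on_runs else 0.0
    mean_death = sum(off_runs) / len(off_runs) if off_runs else 0.0
    cycle = mean_life + mean_death
    rate = 1.0 / cycle if cycle > 0 else 0.0  # lambda_i = 1 / (L_i + D_i)
    return mean_life, mean_death, rate
```

For example, a host reported in weeks 0, 1 and 4 of a six-week window has two ON runs (2 and 1 weeks) and two OFF runs (2 and 1 weeks), giving a mean lifetime and deathtime of 1.5 weeks each and a rate of arrival of 1/3 per week.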

Table 4: Churn Analysis: Top 5 IPs, ASes, and Countries (CC) by Lifetime, Deathtime, and Rate of Arrival. Panels: (a) Average Lifetime - LT (Most Persistent); (b) Average Deathtime - DT (Most Resilient); (c) Rate of Arrival - RoA (Most Frequently Active). Columns in each panel: IP and value; ASN, Organization and value; CC and value.

Figure 7: Churn Analysis: CDFs of IPs, ASes, and Countries (CC) of Lifetime, Deathtime, and Rate of Arrival. Panels: (a) life mean duration (weeks), (b) death mean duration (weeks), (c) mean rate of arrival.

Figure 8: Churn Analysis: CDFs of Rate of Arrival (reciprocal of mean duration), mean Lifetime, and mean Deathtime for mal-activities. Panels: (a) life mean duration (weeks), (b) death mean duration (weeks), (c) mean rate of arrival. Curves: Total, Spammers, Fraud. Serv., Malware, Phishing, Exploits, PUP.

We define a “severity” metric to quantify the magnitude of the reported activity during active periods of malicious hosts in the blacklists. Formally, severity is defined as the average number of reports of mal-activities per active cycle, as per Figure 6. For host $i$, let $K_{i,c}$ denote the total number of reports within the $c$-th period of activity and, as before, let $L_{i,c}$ denote the active period (in weeks). Then the severity of host $i$ is defined as the average of $K_{i,c} / L_{i,c}$ over all cycles of host $i$ in the dataset. A high severity value indicates that whenever a host is active (reported in the blacklists), it is accompanied by a large volume of reported mal-activities.

Care has been taken to remove duplicate reports, i.e., the same (time, IP, URL) tuple, from Blacklist-07-17. In any case, potential duplicates in the 2M reports from Blacklist-07-17 are dwarfed by the 49M unique reports obtained from VT.
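The severity definition above can be made concrete with a small Python sketch. It assumes, for illustration only, that a host's reports are given as a list of weekly report counts, so that each maximal run of non-zero weeks is one active cycle. The function name is hypothetical.

```python
from itertools import groupby


def severity(weekly_reports):
    """Average of K_{i,c} / L_{i,c} over the active cycles of one host.

    weekly_reports: list of report counts per week over the observation window.
    """
    per_cycle = []
    for is_active, grp in groupby(weekly_reports, key=lambda n: n > 0):
        if is_active:
            cycle = list(grp)
            # K_{i,c} = reports in the cycle, L_{i,c} = cycle length in weeks
            per_cycle.append(sum(cycle) / len(cycle))
    return sum(per_cycle) / len(per_cycle) if per_cycle else 0.0
```

A host with weekly counts 3, 1, 0, 0, 10, 0 has two cycles with 2 and 10 reports per week, giving a severity of 6 reports per week. A host that is only ever reported once per active week has a severity of exactly one.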

We report the results of the magnitude analysis in Figure 9. Observe that 27.4% of ASes and 9.45% of countries have a severity value equal to one, indicating a unique malicious report per week. The CDF in Figure 9b indicates that only a few hosts are participating in a plethora of mal-activities, with as few as 200 IP addresses reported to be involved in more than 10K malicious activities per week. Figure 9c shows the CDF of the mean severity values for each of the mal-activity classes. We observe that fraudulent services are reported in the “low severity” range when compared to the rest of the categories.

Figure 9: Magnitude analysis of top 5 IPs, ASes, and Countries. Panels: (a) Top AS and Countries (CC) magnitude offenders (columns: ASN, Organization, Mag.; CC, Mag.); (b) Hosts, CDF of mean severity (reports/week) for IP, ASN and CC; (c) Mal-Activities, CDF of mean severity (reports/week) for Total, PUP, Fraud. Serv., Malware, Exploits, Phishing and Spammers.

Table (a) of Figure 9 lists IP addresses, ASes, and countries with high values of severity. The US has the highest severity of 82,558 reports per week. Distant followers are China, Germany, France, and Ukraine, with severity values of 377, 212, 149, and 80, respectively. This is likely due to the fact that the majority of hosting services and Internet users originate from the US. Interestingly, we observe AS7276 (UNIVERSITY-OF-HOUSTON), with 2,206 mal-activities per week, as the AS with the highest severity. It was reported to have participated in all categories of mal-activities except Spammers, with 59,785 reports in the dataset.

We observe that a large portion of the reported mal-activities likely originates from the misuse of cloud provider services (e.g., Amazon Cloud), as these providers are unlikely to be intentionally propagating their own mal-activity. This inference is consistent with observations made in previous work [56].

Lessons Learned.

Our analysis shows that malware has been the largest component of reports (90.9%; see Table 2). When considering the severity of reports, malware produces on average 30.8 reports per week, and phishing has the next largest severity with 9.3 reports per week, despite constituting only 4.74% of our dataset. On average, malware is approximately 3 times as severe as phishing, despite there being 19 times more malware reports than phishing reports. It would be advisable for enforcement agencies to focus on the primary attack vector, malware, as disabling a malware source would yield the largest reduction of reports per week. This is not to discount the impact of shutting down a phishing host, which receives about a third as many reports per week as the most severe mal-activity.
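The two ratios quoted in this paragraph follow directly from the reported figures. The short Python check below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures quoted in the text: severity in reports per week, share in % of the dataset.
malware_severity, phishing_severity = 30.8, 9.3
malware_share, phishing_share = 90.9, 4.74

# How much more severe malware is than phishing, per active week.
severity_ratio = malware_severity / phishing_severity

# How many times more malware reports there are than phishing reports.
volume_ratio = malware_share / phishing_share
```

The severity ratio is about 3.3 and the volume ratio is about 19.2, which matches the "approximately 3 times" and "19 times" stated in the text.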

A number of studies have characterized and measured mal-activities, in addition to proposing detection and/or prevention techniques. Researchers have also proposed general approaches that rely either on fundamental characteristics of botnet traffic or on correlating meta-datasets. For example, several works detect botnet-based mal-activities by investigating their traffic [34] or typical behavior [40, 49, 58, 62]. Others have investigated multiple datasets including web resources from suspicious domains [42], host and network information [54, 61], honeypots [52] or DNS traffic [59]. Kuhrer et al. [45] analyze the performance of blacklists with a forwards-facing collection of data, archiving it for a duration of two years. In this paper, we revisit the blacklists utilized by them and, with additional sources, collect a backwards-facing dataset, which is collected post-factum, covering the 10 years prior. Our analysis of the resulting dataset diverges, as we perform a retrospective characterization and measurement of (different classes of) mal-activities.

Using a regional Internet registry (RIR) dataset spanning a period of 12 years, Dhamdhere et al. [38] define two metrics (attractiveness and repulsiveness) to describe the relationship among ASes. Compared to our work, Dhamdhere et al. do not focus on mal-activity reporting, instead focusing on the AS ecosystem as a whole. Antonakakis et al. [33] study the behavior of Mirai botnet activity with a dataset collected in 2016 by industrial partners, to observe the resilience of the Mirai botnet against reverse engineering and takedowns. Unfortunately this dataset, owned by Symantec, is not available for further research, and only focuses on a specific type of mal-activity, whilst our analysis covers six different classes. Leita et al. [46] propose “HARMUR”, a system that leverages historical archives of malicious URLs collected by Symantec to detect mal-activities. The malicious URLs are collected using publicly available blacklists, DNS reports of malicious domains, and Symantec’s proprietary malware scanning service to resolve false positives. Their proposal leaves a large-scale analysis of the collected dataset as future work; however, there has been no mention of this dataset to date. It should be noted that HARMUR leverages the historical information for the purpose of classifying newly observed URLs, and thus is considered forwards-facing.

By analyzing logs generated by dynamic analysis of malware samples spanning a period of 5 years, Lever et al. [47] investigate the evolution and behavior of the malware and PUP categories of mal-activities. In contrast, our study retrieves static data sources that span over 10 years and consist of broader categories of mal-activities in six classes, with an analysis of infrastructure, geo-location and behavior over time.

Researchers and the industry alike find themselves in a continual arms race to fight against major instances of malicious activity on the Internet. Although longitudinal datasets like ours do exist, they are mostly proprietary, since industries are unable to share them for reasons of privacy and to maintain a competitive advantage. In this paper, we addressed this gap with a novel methodology that combines imperfect historical records with machine learning to produce a decade-long mal-activity dataset. To assist the research community, we have released our dataset into the public domain for further research: https://internetmaliciousactivity.github.io/

With our unique dataset, we reflected on the behavior of mal-activity reporting over the last decade in order to gain insights into the continuing presence of malicious activity. Our analysis characterized, among other aspects, host behavior, recurrent periods, and severity of mal-activity reporting in a P2P-inspired churn model. Our analysis suggests that tracking the heavy mal-activity contributors should be an absolute priority for law-enforcement agencies, major network providers and cloud operators. We found a consistent minority of heavy offenders (i.e., IPs, ASes, and countries) that contribute a majority of mal-activity reports, posing a severe threat to the status quo of our online ecosystem. We observed a number of hosts with a short renewal cycle of “(in)activity”. Their presence on a blacklist has not deterred their activities. If these hosts had been effectively blocked by blacklists, the renewal of their activity indicates that they were removed from said blacklists, suggesting a need to consider longer durations prior to delisting a malicious host. Detecting and quickly reacting to the emergence of such heavy mal-activity contributors would arguably significantly reduce the damage inflicted by them.

REFERENCES

[33] [...] USENIX Security (2017).
[34] AsSadhan, B., Moura, J. M. F., Lapsley, D., Jones, C., and Strayer, W. T. Detecting botnets using command and control traffic. In NCA (2009).
[35] Bilge, L., Kirda, E., Kruegel, C., and Balduzzi, M. Exposure: Finding malicious domains using passive dns analysis. In NDSS (2011).
[36] Cui, Q., Jourdan, G.-V., Bochmann, G. V., Couturier, R., and Onut, I.-V. Tracking phishing attacks over time. In WWW (2017).
[37] De Cristofaro, E., Friedman, A., Jourjon, G., Kaafar, M. A., and Shafiq, M. Z. Paying for likes?: Understanding facebook like fraud using honeypots. In ACM IMC (2014).
[38] Dhamdhere, A., and Dovrolis, C. Twelve Years in the Evolution of the Internet Ecosystem. IEEE/ACM ToN (2011).
[39] Farooqi, S., Ikram, M., De Cristofaro, E., Friedman, A., Jourjon, G., Kaafar, M. A., Shafiq, Z., and Zaffar, F. Characterizing key stakeholders in an online black-hat marketplace. In Anti-Phishing Working Group (APWG) Symposium on Electronic Crime (eCrime) (2017).
[40] Gu, G., Perdisci, R., Zhang, J., and Lee, W. Botminer: Clustering analysis of network traffic for protocol- and structure-independent botnet detection. In CSS [...]
[42] [...] Web Conference (2019).
[43] Ikram, M., Onwuzurike, L., Farooqi, S., Cristofaro, E. D., Friedman, A., Jourjon, G., Kaafar, M. A., and Shafiq, M. Z. Measuring, characterizing, and detecting facebook like farms. ACM Trans. Priv. Secur. 20 [...]
[45] [...] RAID (2014).
[46] Leita, C., and Cova, M. Harmur: Storing and analyzing historic data on malicious domains. In BADGER (2011), pp. 46–53.
[47] Lever, C., Kotzias, P., Balzarotti, D., Caballero, J., and Antonakakis, M. A lustrum of malware network communication: Evolution and insights. In S&P (2017).
[48] Li, Z., Zhang, K., Xie, Y., Yu, F., and Wang, X. Knowing your enemy: Understanding and detecting malicious web advertising. In Proceedings of the 2012 ACM Conference on Computer and Communications Security (New York, NY, USA, 2012), CCS ’12, ACM, pp. 674–686.
[49] Lu, W., Tavallaee, M., and Ghorbani, A. A. Automatic discovery of botnet communities on large-scale communication networks. In ASIACCS [...]
[51] [...] Journal of Machine Learning Research 12 (2011), 2825–2830.
[52] Pham, V.-H., and Dacier, M. Honeypot trace forensics: The observation viewpoint matters. Future Generation Computer Systems 27, 5 (2011), 539–546.
[53] Poese, I., Uhlig, S., Kaafar, M. A., Donnet, B., and Gueye, B. Ip geolocation databases: Unreliable? SIGCOMM Comput. Commun. Rev. 41, 2 (Apr. 2011), 53–56.
[54] Shin, S., Xu, Z., and Gu, G. Effort: Efficient and effective bot malware detection. In INFOCOM (2012).
[55] Stone-Gross, B., Holz, T., Stringhini, G., and Vigna, G. The underground economy of spam: A botmaster’s perspective of coordinating large-scale spam campaigns. LEET 11 (2011), 4–4.
[56] Stone-Gross, B., Kruegel, C., Almeroth, K., Moser, A., and Kirda, E. Fire: Finding rogue networks. In Annual Computer Security Applications Conference (ACSAC) (2009), IEEE, pp. 231–240.
[57] Wang, G., Wilson, C., Zhao, X., Zhu, Y., Mohanlal, M., Zheng, H., and Zhao, B. Y. Serf and turf: crowdturfing for fun and profit. In WWW (2012).
[58] Wurzinger, P., Bilge, L., Holz, T., Goebel, J., Kruegel, C., and Kirda, E. Automatically generating models for botnet detection. In ESORICS (2009).
[59] Yadav, S., Reddy, A. K. K., Reddy, A. L. N., and Ranjan, S. Detecting algorithmically generated domain-flux attacks with dns traffic analysis. IEEE/ACM Trans. Netw. (2012).
[60] Yao, Z., Leonard, D., Wang, X., and Loguinov, D. Modeling heterogeneous user churn and local resilience of unstructured p2p networks. In ICNP (2006).
[61] Zeng, Y., Hu, X., and Shin, K. G. Detection of botnets using combined host- and network-level information. In DSN (2010).
[62] Zhao, D., Traore, I., Sayed, B., Lu, W., Saad, S., Ghorbani, A., and Garant, D. Botnet detection based on traffic behavior analysis and flow intervals. Comput. Secur. (2013).

Table 5: Basic Statistics of the initial seed Blacklist-07-17 dataset (cf. Section 2.1).
Total 2,272,362 297,095 2007-17

A BLACKLIST-07-17

This section contains supplementary material about Blacklist-07-17 (cf. Section 2.1).

A.1 Blacklist Summary

Our seed blacklists are summarized in Table 5. The majority of the seed lists (15 out of 22) provided sufficiently rich information, including timestamps of the reported mal-activities, URLs and domains considered to be malicious, the corresponding IP address, and a free-form label of the mal-activity being reported (e.g., “PayPal Phishing” and “Cryptowall Ransomware C&C”) describing the type of mal-activity (phishing, exploit, botnet, etc.). Many of these lists included additional information about the autonomous system number (ASN) and partial geolocation information (e.g., MDL [15] and Malc0de [14]). The remaining 7 sources (e.g., 360Mirai [1]) reported timestamps and IP addresses only.
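The fields described above suggest a simple record layout for a seed report. The Python sketch below is a hypothetical illustration: the class name, field names and sample values are assumptions, not the schema of the released dataset. It also shows duplicate removal on the (time, IP, URL) tuple mentioned in the main text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Report:
    """One blacklist report; only timestamp and IP are present in every source."""
    timestamp: str
    ip: str
    url: Optional[str] = None      # malicious URL or domain, if provided
    label: Optional[str] = None    # free-form label, e.g. "PayPal Phishing"
    asn: Optional[int] = None      # autonomous system number, if provided
    country: Optional[str] = None  # partial geolocation, if provided


def deduplicate(reports):
    """Keep the first report for each (time, IP, URL) tuple, preserving order."""
    seen = set()
    unique = []
    for report in reports:
        key = (report.timestamp, report.ip, report.url)
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique
```

Optional fields default to None so that the 7 sources reporting only timestamps and IP addresses fit the same structure as the richer lists.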

B MAL-ACTIVITY CLASSIFICATION

In Section 2.4, we introduced six types of mal-activities; here we provide their extended definitions.

Exploits.

Exploits take advantage of vulnerabilities in software, whether privately or publicly known, to (remotely) execute code on the victim’s system. Exploit kits are usually used as a first-stage “dropper” to facilitate the installation of the final payload (i.e., malware).

Malware.

This includes domains and IP addresses that have been reported to distribute malicious payloads such as Trojans, viruses, worms, and ransomware.

Fraudulent Services (FS).

Domains and IP addresses engaged in the distribution or provisioning of bogus or fraudulent services or applications, such as the promotion of comments, likes, ratings, votes or any variations thereof [37, 39, 43].

Spammers.

This class contains domains and IP addresses that are reported to host spam-bots to perform astroturfing/grassroots marketing [57] or to send large-scale, unsolicited emails or instant messages.

Phishing.

This class is composed of domains/IPs reported to host content aimed at obtaining sensitive information by masquerading as trustworthy online services.

Potentially Unwanted Programs (PUP).