Exploratory Data Analysis of a Network Telescope Traffic and Prediction of Port Probing Rates

Abstract

Understanding the properties exhibited by large-scale network probing traffic would improve cyber threat intelligence. In addition, the prediction of probing rates is key for security practitioners seeking to make better operational decisions and to enhance their defense strategies. In this work, we study different aspects of the traffic captured by a /20 network telescope. First, we perform an exploratory data analysis of the collected probing activities. The investigation includes probing rates at the port level, the services that interest top network probers and the distribution of probing rates by geolocation. Second, we extract the network probers' exploration patterns. We model these behaviors using transition graphs decorated with the probabilities of switching from one port to another. Finally, we assess the capacity of Non-stationary Autoregressive and Vector Autoregressive models to predict port probing rates, as a first step towards using more robust models for better forecasting performance.


Mehdi Zakroum∗, Abdellah Houmz∗‡, Mounir Ghogho∗, Ghita Mezzour∗, Abdelkader Lahmadi§, Jérôme François§, and Mohammed El Koutbi‡
∗ International University of Rabat, TICLab, Morocco
‡ Mohammed V University of Rabat, ENSIAS, Morocco
§ Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
{mehdi.zakroum, abdellah.houmz, mounir.ghogho, ghita.mezzour}@uir.ac.ma
{abdelkader.lahmadi, jerome.francois}@[email protected]

Index Terms: Cyber Intelligence, Cyber Security, Network Telescope, Darknet, Probing Patterns, Transition Graphs, Prediction of Probing Rates, Non-stationary Autoregressive Model, Non-stationary Vector Autoregressive Model, Machine Learning

I. INTRODUCTION

New cyber threat vectors and vulnerabilities are constantly emerging with the evolution of technology. Attackers commonly scan networks to find vulnerable devices which can be used for malicious intents. One of the major attacks of 2016 was the Dyn DDoS attack. The attackers used botnets of vulnerable devices as the primary source of their DDoS traffic, making leading internet platforms unavailable for a large number of users.

Improving our knowledge of scan activities will help to prevent cyber attacks through early detection and, in general, to enhance security policies. Many causes can trigger scan campaigns, such as vulnerability disclosures, worm spread and zero-days. Generally, such malicious traffic is hidden by a large amount of legitimate traffic, making it difficult for internet service providers and network security operators to identify it and protect target users.

A passive approach for identifying network probing activities is the use of network telescopes, also known as darknets. A network telescope is a sensor logging the traffic received by a set of passive unallocated network addresses. Therefore, the traffic received by the network telescope is considered suspicious and needs to be examined.

To collect such traffic, we use a network telescope hosted at INRIA Nancy-Grand Est consisting of nearly 4096 IPv4 addresses. By analyzing the collected data, we aim to answer the following questions:

• What are the most targeted services? What are the services targeted by the top network probers?
• How are network probers exploring the target network? How can these probing activities be modeled?
• Can we predict probing rates of the targeted services?

The remainder of this paper is structured as follows. The next section reviews the work related to this study. Section III presents an exploratory analysis of the darknet traffic. In Section IV, we identify the attackers' probing patterns. Finally, in Section V, we explore the capacity of Non-stationary Autoregressive and Vector Autoregressive models to forecast the probing rates at the port level.

II. RELATED WORK

Reconnaissance is the first phase in the cyber kill chain, where the attacker scans the target infrastructure looking for vulnerabilities. A more generic approach for finding vulnerable devices consists of scanning the whole IPv4 address space, including network telescopes. Many studies leverage the traffic captured by the latter to study different aspects of probing activities. Durumeric et al. [1] studied the traffic acquired by a large network telescope consisting of 5.5M IP addresses. The study includes the origin of scans, the services targeted by network probers and the effect of vulnerability publication on probing activities. Bou-Harb et al. [2] used a probabilistic and statistical approach to identify the origin of the probing activities: whether they are generated by scanning tools or by worms and botnets. They also studied whether probing activities are random or exhibit specific patterns. Eto et al. [3] proposed a method to extract the features of scanning malware based on the oscillation of destination IP addresses in the captured scan packets. Li et al. [4] proposed a general framework that identifies scanning events and analyzes the methods used by botnets in probing campaigns. They applied their framework to extract the scanning characteristics of a set of 6 botnets. Dainotti et al. [5] analyzed a 12-day world-wide cyber scanning campaign targeting VoIP (SIP) servers caught by a /8 darknet. They found that the origin is the Sality botnet, which generated about 20 million packets from roughly 3 million IP addresses.

Few studies explored the dependencies between targeted ports. McNutt and Markus [6] presented a method for detecting the start of anomalous port-specific activity by recognizing deviation from correlated activities. They found a high correlation between time series of flow counts on unassigned or obsolete ports that do not have active services, which allows them to detect ports receiving anomalous activities. In contrast to our work, they used the traffic of an organization network (not darknet traffic), where the large amount of benign traffic hides malicious traffic. Lagraa and François [7] inferred the dependency between services using graph analysis. They proposed a graph-based model to discover port scanning behavior patterns and applied methods used for community structure discovery in large graphs in order to identify clusters of ports. Our work generalizes their approach: instead of constructing graphs for each pair of source and destination IP addresses, our graphs aggregate the probing activity by source IP address. This approach shows the general exploration pattern followed by a network prober, regardless of the target host.

III. EXPLORATORY DATA ANALYSIS

A. Data Set

The data we use was collected by a /20 network telescope. The traffic was recorded from November 2014 until October 2017 and has a size of 2 TB. The collected traffic consists of timestamped packet headers. For each packet, we record the source and destination IP addresses, the source and destination ports and the packet's flags. Our study focuses on stateful connections initiated by the source. Hence, we consider only packets with the TCP SYN flag set, which account for approximately 4.5 billion packets.

B. Traffic by Port

We begin by extracting the traffic received by each port. That is, we aggregate the received traffic by destination port and count the number of TCP SYN packets for each destination port. Figure 1 shows the 30 most targeted ports. We observe that the most targeted services are remote access services, web servers, database management systems and some Microsoft services. Network probers tend to use alternative ports in addition to the official ones. Port 23 (telnet) accounts for more than 50% of the traffic. Figure 2 shows that, among 65535 ports, only 35 account for 80% of the traffic and 550 account for 90% of the traffic.

Fig. 1: The 30 most targeted ports and their corresponding traffic percentage (the lengths of the bars are in log scale and the labels above the bars are the actual values). Axes: Port Number vs. Traffic Percentage. Ports in bar order: 23, 22, 2323, 445, 3389, 80, 7547, 1433, 5358, 8080, 3306, 443, 23231, 5632, 81, 21, 6789, 2222, 25, 5900, 9200, 6379, 27017, 9000, 8888, 4028, 1723, 110, 10000, 8545.

Fig. 2: Traffic cumulative percentage by number of ports.
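The per-port aggregation and the cumulative coverage figures discussed in this subsection can be illustrated with a small sketch. It is only an illustration on toy data: the function names `traffic_share_by_port` and `ports_needed_for` are hypothetical, and the input is assumed to be a plain list of destination ports, one entry per TCP SYN packet.

```python
from collections import Counter

def traffic_share_by_port(dst_ports):
    """Return (port, percentage) pairs sorted by decreasing share of SYN packets."""
    counts = Counter(dst_ports)
    total = sum(counts.values())
    return [(port, 100.0 * n / total) for port, n in counts.most_common()]

def ports_needed_for(shares, target_pct):
    """Smallest number of top-ranked ports whose cumulative share reaches target_pct."""
    cumulative = 0.0
    for rank, (_, pct) in enumerate(shares, start=1):
        cumulative += pct
        if cumulative >= target_pct:
            return rank
    return len(shares)

# Toy input (assumption): destination ports of ten hypothetical TCP SYN packets.
syn_dst_ports = [23, 23, 23, 23, 23, 23, 22, 22, 2323, 445]
shares = traffic_share_by_port(syn_dst_ports)
print(shares)
print(ports_needed_for(shares, 80.0))
```

On the real data set, the same two steps would yield the ranking of Figure 1 and the cumulative curve of Figure 2.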

C. Top Network Probers’ Interests

The intent of a network prober might be manifested in the services they target. Knowing the ports that interest the top network probers identifies the services requiring particular security efforts. We consider as a top network prober a source maintaining an average probing rate higher than 150 TCP SYN packets per day. It is noteworthy that our definition of top network probers does not include probers performing distributed probing activities.

First, we count the TCP SYN packets sent by each of the more than 64 million source IP addresses included in our data set. Then, for each top network prober (there are nearly 1500), we extract the probing rates received by each port and aggregate the counts by port. Figure 3 shows the 30 ports most targeted by top network probers. In contrast to the results in Figure 1, top network probers focus their probing activities on port 22 (SSH) rather than port 23 (telnet).
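A possible way to select top network probers and aggregate their traffic by port is sketched below. The averaging period is an assumption (here, the whole observation period in days), the function names are hypothetical, and the IP addresses are documentation-range placeholders.

```python
from collections import Counter

def top_probers(packets, observation_days, min_daily_rate=150):
    """Sources whose average number of SYN packets per day exceeds the threshold."""
    per_source = Counter(src for src, _ in packets)
    return {src for src, n in per_source.items()
            if n / observation_days > min_daily_rate}

def port_share(packets, sources):
    """Percentage of the selected sources' traffic received by each port."""
    counts = Counter(port for src, port in packets if src in sources)
    total = sum(counts.values())
    return {port: 100.0 * n / total for port, n in counts.items()}

# Toy input (assumption): (source IP, destination port) pairs seen over 2 days.
packets = ([("198.51.100.7", 22)] * 300 + [("198.51.100.7", 23)] * 100
           + [("203.0.113.9", 23)] * 10)
top = top_probers(packets, observation_days=2)
share = port_share(packets, top)
print(top, share)
```

Note that this per-source threshold, like the definition in the text, cannot capture distributed probing in which each source stays below the rate limit.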

D. Traffic by Country

The distribution of the received traffic by geolocation helps determine how likely a probing campaign is to originate from a specific country. We extract the total traffic caught by the network telescope by country. We infer the country code from the source IP address using the DB-IP database (https://db-ip.com/). Figure 4 summarizes the received traffic by country code.

Fig. 3: The 30 most targeted ports by top network probers (the numbers above the bars are the percentages of the traffic received by ports with respect to the total traffic generated by the top network probers). Axes: Port Number vs. Traffic Percentage.

Fig. 4: Top 30 countries and their corresponding traffic percentages. Axes: Country Code vs. Traffic Percentage. Country codes in bar order: CN, US, TW, BR, VN, RU, KR, IN, TR, NL, FR, DE, UA, GB, MX, AR, RO, CO, IR, IT, PL, HK, TH, CA, ES, PH, JP, AU, IL, SE.

IV. PROBING PATTERNS

In this section, we model the behavior of network probers using transition graphs. We assess the relationship between targeted services by determining the probability of transition from one port to another; then, we identify different network exploration patterns.

A. Graph Modeling

A network communication can be identified by a 5-tuple: source and destination IP addresses, source and destination ports, and the transport protocol. Our aim is to analyze, for each network prober, how it explores the whole darknet. Hence, we take into account only two features: the source IP address and the destination port.

To extract the graphs, we begin by aggregating the traffic generated by each network prober. Then, we count the number of transitions from one destination port to another by sequentially browsing the extracted traffic. These counts are finally normalized in order to get the transition probabilities. Note that the time dimension is omitted during this process.

Formally, we extract for each source IP address $i$ a transition graph $G_i(V_i, E_i)$, in which $V_i$ is the set of destination ports targeted by the network prober $i$ and $E_i$ represents the transition probabilities between destination ports that are elements of $V_i$. The association between two elements $p_a$ and $p_b$ of $V_i$ represents the probability that the network prober $i$ switches from $p_a$ to $p_b$.

B. Extracted Graphs

Figure 5 shows a sample of transition graphs corresponding to 3 network probers. Figure 5a represents a network prober sequentially targeting services typically deployed on web servers: SSH (22), RDP (3389), MySQL (3306) and FTP (21), while focusing on HTTP (80). Figure 5b shows a network prober targeting only the MySQL server port in addition to two of its alternatives. Figure 5c corresponds to a network prober targeting remote access services such as SSH (22) and its alternative (2222), and telnet (23) and its alternative (2323). Many other probing patterns were identified but they are too large to fit in this paper.

The extracted transition graphs differ from each other in two main components: the number of vertices, which corresponds to the number of destination ports, and the number of edges describing the exploration behavior of a network prober. Figure 6 represents the cumulative distribution function of the number of ports targeted by individual network probers. We observe that more than 80% of network probers target fewer than five ports in the whole darknet space. This means that most attackers focus their probing activities only on services of interest, which might be related to a vulnerability disclosure, for example.
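The graph construction of Section IV-A can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `transition_graphs` is hypothetical, consecutive packets to the same port are counted as self-transitions, and counts are normalized per origin port so that outgoing probabilities sum to 1. The text does not state either choice, so both are assumptions.

```python
from collections import defaultdict

def transition_graphs(packets):
    """Build one transition graph per source.

    packets: iterable of (src_ip, dst_port) in capture order.
    Returns {src_ip: {(port_a, port_b): probability of switching a -> b}}.
    """
    sequences = defaultdict(list)
    for src, port in packets:
        sequences[src].append(port)

    graphs = {}
    for src, ports in sequences.items():
        counts = defaultdict(int)    # transitions a -> b
        outgoing = defaultdict(int)  # all transitions leaving a
        for a, b in zip(ports, ports[1:]):
            counts[(a, b)] += 1
            outgoing[a] += 1
        graphs[src] = {edge: n / outgoing[edge[0]] for edge, n in counts.items()}
    return graphs

# Toy input (assumption): one prober alternating between remote access ports.
packets = [("192.0.2.1", p) for p in [22, 2222, 22, 23, 22, 2222]]
graphs = transition_graphs(packets)
print(graphs["192.0.2.1"])
```

The number of keys in each inner dictionary is the edge count of the graph, and the set of ports appearing in those keys is its vertex set.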

C. Relationship Between Ports

In this section, we aim to identify the relationship between ports in terms of transition probabilities. We begin by aggregating the number of transitions from one port to another across network probers. Then, we normalize these counts by the total number of transitions. We repeat this process for all network probers combined and for top network probers (as defined in Section III-C).

Figure 7 represents the transition matrix of the 30 most targeted ports in the whole darknet (see Section III-B). Unsurprisingly, the figure shows a high association between ports and their alternatives: 23 and 2323, 80 and 8080, and 22 and 2222. The figure also emphasizes a strong relationship between services of the same type such as MS-SQL Server (1433) and MySQL (3306).

Similarly, Figure 8 shows the transition probabilities of the 30 ports most targeted by top network probers. We observe fewer relationships compared to the previous transition matrix. Nevertheless, the SSH and telnet services as well as their alternatives are still strongly related.
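One reading of this aggregation is sketched below: transitions are counted per source over an interleaved packet stream, summed over all sources, and divided by the grand total of transitions. The function name `aggregated_transition_matrix` is hypothetical, and the normalization by the grand total (rather than per row) is an interpretation of "the total number of transitions".

```python
from collections import Counter

def aggregated_transition_matrix(packets, ports):
    """Transition matrix over `ports`, aggregated across all sources.

    packets: iterable of (src_ip, dst_port) in capture order, sources interleaved.
    Entry [a][b] is the share of all observed transitions that go from a to b.
    """
    last_port = {}      # last destination port seen for each source
    counts = Counter()  # (from_port, to_port) -> number of transitions
    for src, port in packets:
        if src in last_port:
            counts[(last_port[src], port)] += 1
        last_port[src] = port

    total = sum(counts.values())
    index = {p: k for k, p in enumerate(ports)}
    matrix = [[0.0] * len(ports) for _ in ports]
    for (a, b), n in counts.items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = n / total
    return matrix

# Toy input (assumption): two probers, each alternating between a port and its alternative.
packets = [("a", 23), ("b", 80), ("a", 2323), ("b", 8080), ("a", 23), ("b", 80)]
matrix = aggregated_transition_matrix(packets, ports=[23, 2323, 80, 8080])
for row in matrix:
    print(row)
```

Restricting `packets` to the sources selected as top network probers would give the second matrix described in the text.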

Fig. 5: A sample of transition graphs of 3 network probers: (a) sequential probing pattern (web services), (b) targeted probing pattern (MySQL service), (c) targeted probing pattern (remote access services). The size of the vertices corresponds to the number of TCP SYN packets received by the targeted port.

Fig. 6: Cumulative distribution of the number of ports targeted by network probers. Axes: Number of Ports by Network Prober vs. Percentage of Network Probers.

V. PREDICTION OF PORT PROBING RATES

Predicting probing activities at the service level is a key feature for making better operational security decisions. Observing a significant disparity between the predicted probing rate and the actual value may help detect an imminent threat. In this section, we forecast the probing rate of a target port from its previous probing rates as well as the traffic received by the other ports. The predictions are performed one step ahead of time using the non-stationary autoregressive model (AR) and the non-stationary vector autoregressive model (VAR), for each of the 550 most targeted ports (see Section III-B), for different time resolutions.

Fig. 7: Transition matrix of all network probers. Axes: Transition from Port vs. Transition to Port, over ports 21, 22, 23, 25, 80, 81, 110, 443, 445, 1433, 1723, 2222, 2323, 3306, 3389, 4028, 5358, 5632, 5900, 6379, 6789, 7547, 8080, 8545, 8888, 9000, 9200, 10000, 23231, 27017.

Fig. 8: Transition matrix of top network probers. Axes: Transition from Port vs. Transition to Port, over ports 21, 22, 23, 25, 53, 80, 443, 445, 587, 1433, 1604, 2222, 2323, 3306, 3389, 5632, 5800, 5900, 5901, 6379, 8000, 8009, 8080, 8081, 8090, 8545, 8888, 9200, 11211, 27017.

A. Data Set

The probing rate is inferred for different time resolutions: 1 hour, 3 hours, 6 hours, 12 hours and 1 day. For each time resolution, we extract a data set $D = \{X^{(i)}\}_i$ consisting of the probing rate time series of the most targeted ports, in which each record is a vector of probing rates occurring in the same time interval.

B. Non-stationarity of Probing Rate Time Series and Design Parameters

The analysis of the port probing rate time series showed non-stationarity of the first- and second-order statistics over the period of 3 years. However, we observed that when considering shorter time windows, the non-stationarity tends to be alleviated, at least in terms of the average. Therefore, we introduce a "rolling window" over the probing rate time series. We train the estimators using the data falling in the rolling window, then the prediction is performed one step ahead of time. The size of the rolling window is a design parameter that can be interpreted as follows: a short rolling window allows tracking the trend in time, while a larger rolling window has the effect of averaging the trend. Another design parameter to consider is the autoregressive order, which allows the model to capture the linear short-term dependencies.

C. Non-stationary AR and VAR Models

To forecast probing rates, we use the non-stationary autoregressive model of order $p$ given by:

$$x^{(i)}(t; p) = w_0^{(i)}(t) + \sum_{h=1}^{p} w_h^{(i)}(t)\, x^{(i)}(t-h) + \epsilon^{(i)}(t) \tag{1}$$

where $x^{(i)}(t)$ is the probing rate received by the $i$-th port at time $t$, $\epsilon^{(i)}(t)$ is the white noise at time $t$, and $W_t^{(i)} = (w_0^{(i)}(t), w_1^{(i)}(t), \ldots, w_p^{(i)}(t))$ is the vector of the model parameters. These parameters are estimated using the data falling in the rolling window (see Section V-B) and they vary in time.

The non-stationary vector autoregressive model of order $p$ is given by:

$$x^{(i)}(t; p) = w_0^{(i)}(t) + \sum_{j \in \mathcal{I}_k} \sum_{h=1}^{p} w_h^{(ij)}(t)\, x^{(j)}(t-h) + \epsilon^{(i)}(t) \tag{2}$$

where $x^{(i)}(t)$ is the probing rate received by the $i$-th port (the target port) at time $t$, $\mathcal{I}_k$ is the set of indexes of the $k$ retained features (see Section V-F), $x^{(j)}(t)$ is the probing rate on the $j$-th selected port at time $t$, and $\epsilon^{(i)}(t)$ is the white noise at time $t$. $w_0^{(i)}(t)$ and $\{w_h^{(ij)}(t)\}_{j \in \mathcal{I}_k, h \in [1, p]}$ are the parameters of the model, which vary in time.

D. Training Algorithm
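The rolling-window training and one-step-ahead forecasting of the AR model in Eq. (1) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the helper names `fit_ar` and `rolling_ar_forecast` are hypothetical, and `np.linalg.lstsq` is used in place of an explicit normal-equation inverse. It solves the same unregularized least-squares problem without forming $(X^T X)^{-1}$.

```python
import numpy as np

def fit_ar(window, p):
    """Least-squares AR(p) weights (intercept first) estimated on one window."""
    n = len(window)
    # Column h holds x(t-h) for every response x(t) with t = p .. n-1.
    lags = [window[p - h:n - h] for h in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    w, *_ = np.linalg.lstsq(X, window[p:], rcond=None)
    return w

def rolling_ar_forecast(series, p, n):
    """One-step-ahead forecasts; weights are refit on the n samples before each step."""
    x = np.asarray(series, dtype=float)
    preds = []
    for t in range(n, len(x)):
        w = fit_ar(x[t - n:t], p)
        past = x[t - p:t][::-1]  # x(t-1), ..., x(t-p)
        preds.append(w[0] + w[1:] @ past)
    return np.array(preds)

# Toy series (assumption) obeying x(t) = 2 + 0.5 x(t-1) exactly,
# so an AR(1) fit on any window should reproduce the next value.
series = [0.0]
for _ in range(11):
    series.append(2.0 + 0.5 * series[-1])
forecasts = rolling_ar_forecast(series, p=1, n=6)
print(np.round(forecasts, 4))
```

Refitting at every step is what makes the weights time-varying, and it is also the source of the computational cost discussed in Section V-G.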

After transforming the time series into a supervised learning problem, we train the non-stationary AR and VAR regressors using the normal equation:

$$W^{(i)} = (X^T X)^{-1} X^T X^{(i)} \tag{3}$$

where $W^{(i)}$ is the vector of the trainable weights, $X = (X^{(j)})_{j \in \mathcal{I}_k}$ is the probing rates feature matrix and $X^{(i)}$ is the probing rates response vector. To reduce the computational complexity, no regularization is used.

E. Design Parameter Selection
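The grid search over the autoregressive order $p$ and the rolling window size $N$ might look like the sketch below. The candidate orders `p_values` and the multiplier `n_start_factor` are illustrative placeholders, not values taken from the paper, and scoring each pair by the $R^2$ of all its one-step-ahead forecasts is an assumption about the evaluation protocol. All function names are hypothetical.

```python
import numpy as np

def rolling_ar_forecast(x, p, n):
    """One-step-ahead AR(p) forecasts, refit by least squares on the previous n samples."""
    preds = []
    for t in range(n, len(x)):
        win = x[t - n:t]
        X = np.column_stack([np.ones(n - p)]
                            + [win[p - h:n - h] for h in range(1, p + 1)])
        w, *_ = np.linalg.lstsq(X, win[p:], rcond=None)
        preds.append(w[0] + w[1:] @ x[t - p:t][::-1])
    return np.array(preds)

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def grid_search(x, p_values, n_start_factor, n_step=10, max_fraction=0.75):
    """Return (best R^2, p, N) over the grid of orders and window sizes."""
    x = np.asarray(x, dtype=float)
    n_max = int(max_fraction * len(x))  # keep at least 25% of the data for scoring
    best = (-np.inf, None, None)
    for p in p_values:
        for n in range(n_start_factor * p, n_max + 1, n_step):
            score = r2_score(x[n:], rolling_ar_forecast(x, p, n))
            if score > best[0]:
                best = (score, p, n)
    return best

# Toy series (assumption): a sinusoid, which satisfies an exact order-2 recursion.
x = np.sin(0.3 * np.arange(80))
best = grid_search(x, p_values=(1, 2), n_start_factor=3)
print(best[1], best[2], round(best[0], 3))
```

The nested loops make the cost grow with the product of the number of orders, the number of window sizes and the series length, which is why the search is done once per port and time resolution.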

The design parameters, namely the size of the rolling window $N$ and the autoregressive order $p$, are determined using a grid search strategy. We varied $p$ over a range starting at 1 for the 5 considered time resolutions. Then, we tried an exhaustive set of rolling window sizes for each autoregressive order $p$. The range of $N$ starts at a multiple of $p$ time units and ends at 75% of the time series length (thus leaving at least 25% of the data for validation), with an increment of 10 time units. The optimal design parameter values $p^\star$ and $N^\star$ are those of the estimator providing the best coefficient of determination $R^2$.

F. Feature Selection for the Non-stationary VAR Model
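The correlation-based feature selection for the VAR model can be sketched as follows. Several simplifications are assumptions made for brevity: only lag-1 features are used, features are ranked by the absolute Pearson correlation, feature scaling is omitted, and the function name `select_features` is hypothetical.

```python
import numpy as np

def select_features(rates, target, split=0.75):
    """Pick the k most correlated lag-1 features that maximize validation R^2.

    rates: array of shape (T, m), one probing-rate column per port.
    target: column index of the port to predict.
    Returns (best R^2 on the validation part, sorted list of selected columns).
    """
    rates = np.asarray(rates, dtype=float)
    X, y = rates[:-1], rates[1:, target]  # features at t-1, response at t
    cut = int(split * len(y))
    Xf, yf, Xv, yv = X[:cut], y[:cut], X[cut:], y[cut:]

    # Univariate Pearson correlation of each candidate feature with the response.
    corr = np.array([abs(np.corrcoef(Xf[:, j], yf)[0, 1]) for j in range(X.shape[1])])
    order = np.argsort(-corr)

    best = (-np.inf, [])
    for k in range(1, len(order) + 1):
        cols = order[:k]
        A = np.column_stack([np.ones(cut), Xf[:, cols]])
        w, *_ = np.linalg.lstsq(A, yf, rcond=None)
        pred = w[0] + Xv[:, cols] @ w[1:]
        r2 = 1.0 - np.sum((yv - pred) ** 2) / np.sum((yv - yv.mean()) ** 2)
        if r2 > best[0]:
            best = (r2, sorted(cols.tolist()))
    return best

# Toy data (assumption): port 0 follows port 1 with a one-step delay plus noise.
rng = np.random.default_rng(0)
T = 200
rates = rng.normal(size=(T, 3))
rates[1:, 0] = 0.8 * rates[:-1, 1] + 0.1 * rng.normal(size=T - 1)
best_r2, selected = select_features(rates, target=0)
print(selected, round(best_r2, 3))
```

Because the procedure is run on the data inside the rolling window, the selected columns can change as the window moves along the series.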

To improve the performance of the non-stationary VAR estimators, we select features according to their individual effect on the response variable using the Pearson correlation coefficient. This process has the effect of reducing the noise introduced by uncorrelated features. First, we split the data falling in the rolling window (the one giving the best non-stationary AR performance) into two subsets: a feature selection set $F$ including 75% of the data and a validation set $V$. Second, we compute on $F$ the univariate correlations, in terms of probing rates, between the target time series and the time series serving as features to the non-stationary VAR model, including the autoregressive features. Third, we iteratively select the $k$ most correlated features, which we use to train the non-stationary VAR model on $F$, and we calculate the coefficient of determination $R^2$ on the validation set $V$. The optimal set of features given by our feature selection strategy is the one providing the best coefficient of determination. It is worth mentioning that the selected features may change over time based on the location of the rolling window in the time series. Finally, the selected features are scaled to zero mean and unit standard deviation.

G. Results and Discussion

Table I summarizes the performance of the non-stationary AR and VAR models for 5 different time resolutions for a set of popular services. It also includes the optimal design parameters for the non-stationary AR estimators. We used the same design parameters for the non-stationary VAR estimators.

The performance of the regressors tends to increase for larger time resolutions, for all the ports except the telnet service. This is due to the lower stochasticity of probing rates when considering larger time resolutions. Also, we observe that the probing rates of remote access services are the most predictable. The reason is that such services are highly targeted by network probers and their probing rate time series are stationary when considering short time resolutions.

We also observe that the non-stationary AR model produces satisfying results for services exhibiting low short-term probing rate variability such as telnet (ports 23 and 2323). Figure 9 shows that the non-stationary VAR model consistently produces better results for services exhibiting high probing rate variability such as the web services (ports 80 and 443) and the database management systems (ports 1433 and 3306).

It is noteworthy that the non-stationary autoregressive model fails to predict abrupt probing rate changes because of its persistence property. More powerful and stable models such as FARIMA+GARCH could be used to predict these extreme values if the probing rate time series exhibit the long-range dependence phenomenon [8], [9]. Also, the non-stationary AR and VAR models, as defined in our paper, require constant updates of their parameters (the trainable weights) and their hyperparameters (the selected features) due to the non-stationarity of the probing rate time series, which is computationally expensive.

VI. CONCLUSION

This work presented an exploratory data analysis performed on 2 TB of traffic collected by a network telescope over a period of 3 years. The investigation of the network telescope traffic showed that 90% of probing activities target only 550 ports of the port space. These include remote access services, which are the most sought by network probers, followed by database management systems, web services and miscellaneous services. This provides insight into the services requiring particular security efforts.

The second task was about inferring network probers' reconnaissance patterns. We modeled these exploration behaviors using transition graphs showing the probabilities of switching from one port to another. This could be exploited in different applications such as port clustering or classifying network probers based on their exploration behaviors.

Finally, we assessed to which extent the non-stationary autoregressive and vector autoregressive models could produce reliable short-term probing rate predictions at the port level. Due to their short memory property, such models could be used to learn non-stationary probing rate processes with short-term persistence. However, when probing rate processes exhibit long-range dependencies, more robust models could be utilized such as GRU and LSTM recurrent neural networks.

| Network service | p⋆ (1 h) | N⋆ (1 h) | R² AR (1 h) | R² VAR (1 h) | p⋆ (3 h) | N⋆ (3 h) | R² AR (3 h) | R² VAR (3 h) | p⋆ (6 h) | N⋆ (6 h) | R² AR (6 h) | R² VAR (6 h) | p⋆ (12 h) | N⋆ (12 h) | R² AR (12 h) | R² VAR (12 h) | p⋆ (1 d) | N⋆ (1 d) | R² AR (1 d) | R² VAR (1 d) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 (telnet) | 1 | 17530 | 0.99 | 0.99 | 8 | 5930 | 0.98 | 0.98 | 4 | 2880 | 0.96 | 0.96 | 6 | 1480 | 0.94 | 0.94 | 1 | 60 | 0.93 | 0.93 |
| 2323 (telnet alt.) | 4 | 17400 | 0.99 | 0.99 | 2 | 5760 | 0.98 | 0.98 | 4 | 2880 | 0.97 | 0.97 | 1 | 120 | 0.94 | 0.94 | 1 | 720 | 0.92 | 0.92 |
| 22 (ssh) | 10 | 19090 | 0.66 | 0.71 | 10 | 6380 | 0.81 | 0.81 | 9 | 3190 | 0.88 | 0.88 | 10 | 1590 | 0.91 | 0.91 | 9 | 800 | 0.92 | 0.92 |
| 2222 (ssh alt.) | 10 | 19430 | 0.56 | 0.68 | 10 | 6390 | 0.73 | 0.74 | 8 | 3190 | 0.81 | 0.81 | 9 | 1590 | 0.86 | 0.86 | 6 | 800 | 0.88 | 0.88 |
| 445 (microsoft-ds) | 10 | 16080 | 0.96 | 0.96 | 1 | 160 | 0.97 | 0.97 | 10 | 2960 | 0.96 | 0.96 | 10 | 1590 | 0.96 | 0.96 | 8 | 740 | 0.96 | 0.96 |
| 80 (http) | 10 | 12890 | 0.10 | 0.55 | 8 | 550 | 0.19 | 0.44 | 8 | 2880 | 0.28 | 0.61 | 7 | 1450 | 0.34 | 0.53 | 7 | 800 | 0.44 | 0.64 |
| 443 (https) | 6 | 18870 | 0.23 | 0.63 | 8 | 6430 | 0.22 | 0.69 | 8 | 3230 | 0.31 | 0.53 | 10 | 1610 | 0.39 | 0.60 | 9 | 800 | 0.50 | 0.70 |
| 3306 (mysql) | 1 | 700 | 0.03 | 0.65 | 1 | 250 | 0.08 | 0.65 | 8 | 2860 | 0.16 | 0.40 | 10 | 1440 | 0.29 | 0.67 | 8 | 720 | 0.40 | 0.73 |
| 1433 (mssql) | 1 | 120 | 0.39 | 0.62 | 1 | 60 | 0.61 | 0.68 | 9 | 2940 | 0.72 | 0.76 | 10 | 1460 | 0.81 | 0.81 | 5 | 730 | 0.88 | 0.88 |
| 1883 (mqtt) | 1 | 360 | 0.03 | 0.58 | 9 | 6000 | 0.78 | 0.79 | 9 | 3000 | 0.82 | 0.82 | 10 | 1510 | 0.84 | 0.84 | 7 | 730 | 0.82 | 0.82 |

TABLE I: Performances of non-stationary AR and VAR estimators for different time resolutions for a set of popular services. p⋆ and N⋆ are the optimal design parameters for the non-stationary AR estimators.

Fig. 9: Comparison of performances of the non-stationary AR and VAR models for different time resolutions. Axes: Port (23, 2323, 445, 22, 1883, 2222, 1433, 443, 80, 3306) vs. R². Legend: AR at 1 hour, 3 hours, 6 hours, 12 hours and 1 day time resolutions; VAR gain in performance.

ACKNOWLEDGMENT

This research work is part of the ThreatPredict project (https://threatpredict.loria.fr), partly funded by the NATO Science for Peace and Security (SPS) programme under research contract SPS G5319 "ThreatPredict: From Global Social and Technical Big Data to Cyber Threat Forecast". We acknowledge the support from the National Center of Scientific and Technical Research (CNRST), Rabat, for the grant of an excellence scholarship. The authors would like to thank Frédéric Beck from the High Security Laboratory at INRIA Nancy-Grand Est, LORIA, for his efforts in managing data and computation servers.

REFERENCES

[1] Z. Durumeric, M. Bailey, and J. A. Halderman, "An Internet-Wide View of Internet-Wide Scanning," pp. 65–78, 2014.
[2] E. Bou-Harb, M. Debbabi, and C. Assi, "A Statistical Approach for Fingerprinting Probing Activities," IEEE, Sep. 2013, pp. 21–30.
[3] M. Eto, K. Sonoda, D. Inoue, K. Yoshioka, and K. Nakao, "A proposal of malware distinction method based on scan patterns using spectrum analysis," in Neural Information Processing, C. S. Leung, M. Lee, and J. H. Chan, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 565–572.
[4] Z. Li, A. Goyal, and Y. Chen, "Honeynet-based botnet scan traffic analysis," in Botnet Detection. Springer, 2008, pp. 25–44.
[5] A. Dainotti, A. King, k. Claffy, F. Papale, and A. Pescapè, "Analysis of a "/0" Stealth Scan from a Botnet," in Proceedings of the 2012 Internet Measurement Conference, ser. IMC '12. Boston, Massachusetts, USA: ACM, 2012, pp. 1–14.
[6] J. McNutt and D. S. Markus, "Correlations Between Quiescent Ports in Network Flows," 2005, p. 5.
[7] S. Lagraa and J. François, "Knowledge discovery of port scans from darknet," in Integrated Network and Service Management (IM), 2017 IFIP/IEEE Symposium On. IEEE, 2017, pp. 935–940.
[8] Z. Zhan, M. Xu, and S. Xu, "Characterizing honeypot-captured cyber attacks: Statistical framework and case study," IEEE Transactions on Information Forensics and Security, vol. 8, no. 11, pp. 1775–1789, Nov 2013.
[9] Z. Zhan, M. Xu, and S. Xu, "Predicting cyber attack rates with extreme values,"