A Monte Carlo method for the spread of mobile malware
arXiv [cs.SI]
ALBERTO BERRETTI AND SIMONE CICCARONE

Abstract. A new model for the spread of mobile malware based on proximity (i.e. Bluetooth, ad-hoc WiFi or NFC) is introduced. The spread of malware is analyzed using a Monte Carlo method and the results of the simulation are compared with those from mean field theory.
1. Introduction
Mobile computing platforms (from simple feature phones to smartphones to tablets) are becoming ubiquitous and ever more capable, and they are slowly eroding the predominance of the personal computer, especially at the notebook level. As they become more capable and more common, they become the target of malware (viruses, worms, etc.). In fact, you can’t infect a device which is not capable of executing arbitrary binary programs, which renders old-fashioned, dumb cellular phones relatively safe; moreover, there is no incentive in looking for exploits and developing malware for platforms with a limited market. But the ever-increasing raw computing power of modern smartphones (multi-core processors, gigabytes of RAM, and recently even 64-bit processors) makes them as powerful as yesterday’s personal computers, and their diffusion makes them a valuable target, one worth exploiting by organized crime.

Mobile malware is not new. Years ago, Symbian was the most popular smartphone operating system and several viruses were developed to infect Symbian-based phones. It is believed that Cabir [1], developed by an unknown hacking group around 2004, was the first such virus. It infected Symbian Series 60 phones with Bluetooth enabled, and it spread to nearby phones but required (usually, but not in all brands and models of phones) user intervention to accept the download of the executable over Bluetooth. Cabir didn’t do anything but spread, eventually rendering the UI of the phone useless because of the continuous requests to accept the download of the executable, or draining the battery. In 2005 a similar virus dubbed “CommWarrior” [2], again targeting the Symbian Series 60 platform, spread using both Bluetooth (infecting arbitrary nearby phones) and MMS (thereby following the social graph of the user by choosing contacts at random from the user’s phonebook). Several more viruses have been developed since, even if platforms, infection mechanisms and purposes have changed to keep up with the latest trends in mobile computing.

Malware which explicitly targets mobility can exploit two characteristics: it can follow the social graph of the user by exploiting access to the user’s phonebook entries (“social malware”, sometimes also called “topological malware” to stress the influence of the topology of the social graph on its spread), or it can use protocols based on proximity like Bluetooth, ad-hoc WiFi connections or NFC (“proximity-based malware”). In this paper we focus on this last aspect of mobile malware, developing a “microscopic”, stochastic model for the propagation of the infection suitable for Monte Carlo simulations.

2. Definition of the model
When modeling infections spread via proximity it is natural to consider percolation models, where susceptible objects occupy sites on a regular lattice (or are distributed with different topologies, for example a graph) and the infection spreads from an infected site to neighbouring sites. This approach doesn’t take into account mobility, where neighbours change as each susceptible object moves around.

A simple and realistic way to take mobility into account is to make each object perform a random walk on a regular lattice. As objects approach each other, an infected one can pass on the infection to each of its (temporary) neighbours, with a given probability.

To be definite, we assume that $N$ susceptible objects perform a random walk on a square portion of a two-dimensional lattice $\mathcal{L} = \{0, \dots, L-1\} \times \{0, \dots, L-1\}$. The density of objects is therefore $d = N/L^2$. These objects are initially placed in some arbitrary way on the lattice, for example uniformly distributed. Time is discrete, and each object performs a random walk, moving with equal probability in one of the four possible directions into one of the four nearest neighbour sites.

We have to deal somehow with the finite size of the box $\mathcal{L}$ in which the objects move. “Free” boundary conditions, in which each object is free to leave the box $\mathcal{L}$, would deplete the box itself with probability one after a finite amount of time, so one would have to take into account the appearance of new objects which move into $\mathcal{L}$: this would give rise to a model with a variable number of objects, something like a “grand canonical ensemble” in statistical mechanics. We prefer for the moment to avoid the complexity of dealing with disappearing and reappearing objects, which we consider inessential to the problem, so we use periodic boundary conditions: as one object moves out of $\mathcal{L}$ on one side, it reappears from the opposite side of the box; the random walk therefore happens on a torus. Another possibility would be to have objects bounce when they reach the boundary: while this is relatively simple to take into account in the simulation, again it would add complexity to the model without changing its phenomenology in any significant way (as we tested).
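The mobility rule just described can be illustrated with a short Python sketch. This is not the authors' simulation code (which was written in C). The function names `place_uniform`, `move` and `walk_step`, the seed and the chosen density are assumptions made purely for illustration.

```python
import random

# The four nearest-neighbour displacements on the square lattice.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def place_uniform(n_objects, side, rng):
    """Place n_objects independently and uniformly on a side x side lattice."""
    return [(rng.randrange(side), rng.randrange(side)) for _ in range(n_objects)]

def move(position, direction, side):
    """Move one step and wrap around the box (periodic boundary conditions)."""
    x, y = position
    dx, dy = direction
    return ((x + dx) % side, (y + dy) % side)

def walk_step(positions, side, rng):
    """Advance every object by one random-walk step on the torus."""
    return [move(pos, rng.choice(DIRECTIONS), side) for pos in positions]

if __name__ == "__main__":
    rng = random.Random(12345)          # arbitrary seed, for reproducibility only
    side = 128                          # lattice side L
    n_objects = int(0.3 * side * side)  # N chosen so that d = N / L^2 is about 0.3
    positions = place_uniform(n_objects, side, rng)
    for _ in range(100):
        positions = walk_step(positions, side, rng)
    inside = all(0 <= x < side and 0 <= y < side for x, y in positions)
    print(n_objects, "objects, all inside the box:", inside)
```

The modulo operation is what implements the torus: an object leaving the box on one side re-enters from the opposite side, so the number of objects (and hence the density) never changes.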
Each object can be in one of two states: healthy or infected. When objects move into the same site the infection can spread from one of the infected objects to each of the objects which occupy the same site, with a given probability $p$. Of course, more than two objects can be on the same site at a given time: in this case we consider separately all pairs as a possible source of infection (i.e. if we have three objects on a site, two infected and one healthy, then we test for a possible infection of the healthy object twice independently, because of the two possible sources of infection).

As all objects eventually intersect their trajectories, all objects would sooner or later become infected. We therefore have to take into account also the chance that a given object heals itself (perhaps because the infection has been detected and dealt with). So at each (discrete) instant of time each infected object has a probability $q$ of healing.

We therefore have three parameters which determine the spread of the infection on a given box $\mathcal{L}$ (besides the size of the box): the density of objects $d$, the probability of infection $p$ and the probability of healing $q$. We look at the case in which the box is large, i.e. what would be called an “infinite volume limit” or “thermodynamic limit” in statistical mechanics. This is basically a SIS model, using standard epidemiological terminology: recovered objects can get infected again and do not become immune, as they would in so-called SIR models. If, using our mobility model, we were to use a SIR model, all objects would eventually be infected, recover and never get infected again, and the epidemic would stop with probability one in a finite amount of time, independently of the side length of the box.

This model is, mathematically, a Markov chain with a huge state space: if we have $M$ objects in our box, the different possible configurations are $2^M$. It is clearly possible to transition from any state to any other state, with the exception of the state in which all objects are healthy, from which there is no longer any way to get infected: this is an absorbing state that the Markov chain will eventually reach; for instance, if we have $N$ infected objects, with probability $q^N$ (extremely small but non-zero) they could all become healthy at the same time. We believe that the probability of reaching such a state in a given fixed amount of time, given some fixed values of $p, q > 0$ and $d$, is exponentially small in the volume of the box and so negligible in the “thermodynamic limit” that we are considering.

The choice of a square lattice and a simple random walk over it as a mobility model is somewhat arbitrary and motivated basically by mathematical simplicity. More complicated mobility models could be (and have been) devised. But as we are interested in qualitative features, and not in exact quantitative features of specific, realistic models, we concentrate our attention on a mathematically simple model which, while avoiding the complexities of a realistic one, keeps its qualitative features, much in the spirit of most statistical mechanical models commonly used in mathematical physics. As a byproduct, our simulation code is simpler, faster and more efficient. We also emphasize that our model, being based on a discrete random walk on a lattice, is a discrete time simulation.

3. Related work

A few papers have studied simulations of SIS models via random walks. In [3] the spatial distribution of nodes in a random waypoint model is studied, and it is shown to be inhomogeneous. The authors only take into account the distribution of the agents, without ever considering propagation of infection. They use physical units of measure in the simulation, taking into account an area of 1000 × [...] × 50 lattice (and so quite small).

In [4] the transmission of messages in a network of mobile nodes is studied. The authors again use physical units and so they consider an area of 1500 × 300 meters where 50 agents move using a random waypoint model, with a transmission radius which varies between 10 and 2500 meters (so again the effective size of the region is quite small). Using a simulation the authors study the mean time to deliver a message and the number of hops necessary to reach the destination.

In [6] a SIR model with different mobility models is considered, with a population of up to 1000 objects and a time of up to 1000 discrete units, and the authors look at the average number of immunized objects.

In [7] physical units are used again: the authors consider an area of 200 × 200 m with an interaction radius of 5 m, so the model can be compared to a lattice model with dimension 40 × 40. [...] different (and considerably simpler to simulate: in fact we could handle a simulation with a much larger number of agents and area) as it is based on random walks on a lattice, and we found instead a more complex dependence of the epidemic threshold on the infection and disinfection probabilities, even in the mean field theory approximation.

4. The simulation

The code for the simulation has been written in C to achieve optimum performance. We ran it on a small cluster using at most eight computational cores (four Intel Xeon dual core processors at 3 GHz) and on a small personal workstation (with an Intel i3 processor at 3.1 GHz), all running Linux.
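To make the update rule of Section 2 concrete, here is a minimal Python sketch of the infection and healing phases of one time step. It is not the authors' C code. The function name `infect_and_heal` and the data layout are hypothetical. The ordering (infection tested first, then healing, so that a freshly infected object may heal in the same step) is an assumption, chosen to match the transition diagram of Section 5.

```python
import random
from collections import defaultdict

def infect_and_heal(positions, infected, p, q, rng):
    """Apply one infection phase followed by one healing phase.

    positions: list of (x, y) lattice sites, one per object
    infected:  list of bools of the same length (True = infected)
    Returns the new list of infection flags.
    """
    # Count the infected objects present on every occupied site.
    infected_on_site = defaultdict(int)
    for pos, is_inf in zip(positions, infected):
        if is_inf:
            infected_on_site[pos] += 1

    new_infected = []
    for pos, is_inf in zip(positions, infected):
        if not is_inf:
            # One independent test with probability p for each infected
            # object sharing the site (all pairs are considered separately).
            k = infected_on_site.get(pos, 0)
            is_inf = any(rng.random() < p for _ in range(k))
        # Assumed ordering: healing is tested after infection, with probability q.
        if is_inf and rng.random() < q:
            is_inf = False
        new_infected.append(is_inf)
    return new_infected

if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed
    positions = [(0, 0), (0, 0), (0, 0), (5, 5)]
    infected = [True, True, False, False]
    print(infect_and_heal(positions, infected, p=0.5, q=0.1, rng=rng))
```

A full simulation step would first move every object by one random-walk step and then call a routine like this one on the new positions. The infection counts are taken before any state is updated, so objects infected during a step do not act as sources within that same step; this too is a modelling assumption of the sketch.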
We used square lattices of sizes from 16 × 16 up to 512 × 512. The results from lattices of different sizes have been compared and we observed that they do not change significantly for lattices larger than 64 × 64, while they are more volatile for smaller lattices: so a kind of “thermodynamic limit” is practically achieved already at this size. We therefore settled for a size of 128 × 128.

The population density was chosen between 0.1 and 1 in steps of 0.1, and a few runs at higher densities have also been performed (with densities equal to 2, 5 and 10). Note that densities higher than 1 imply that most sites are occupied by more than one agent, which is entirely possible within our model.

The built-in random number generator of the compiler has been used for the simulations.

As expected, the limit fraction of infected agents $f_\infty$ doesn’t depend on its initial value $f_0$, so we took $f_0 =$ [...] $\tau$. This is of course what must be done in any dynamic Monte Carlo simulation to ensure that data points are taken from an equilibrium distribution and that they are taken sufficiently far apart so that they can be considered independent. In our case, moreover, the time to reach the equilibrium is an interesting quantity per se. To compute autocorrelation times, we used a Python version of the acor package written by Jonathan Goodman [8].

5. Mean Field Theory

We can approximate our model using a kind of “mean field theory” approach, which is expected to be in qualitative agreement with the complete, exact model. In this approximation we consider a single object, whose evolution is based on the average behaviour of the rest of the system.

On average, each object undergoes $\approx d$ intersections at each time, with each intersection giving a chance to get infected if it happens with an infected object. As $f = N/M$ is the fraction of infected objects, at each time each object approximately intersects with an infected one “$fd$ times”.

Therefore the probability that an object gets infected is approximately $p' = 1 - (1-p)^{fd}$ (pretending that $fd$ is an integer), that is 1 minus the probability of never getting infected in each of its $fd$ intersections with an infected object.

So, considering each object individually, let $X = H$ or $I$ denote the status of an object ($H$ for healthy and $I$ for infected). The following diagram explains the transition to the new state, with the probability of each outcome:

$X \xrightarrow{\;p'\;} I \xrightarrow{\;q\;} H$, with probability $p'q$;
$X \xrightarrow{\;p'\;} I \xrightarrow{\;1-q\;} I$, with probability $p'(1-q)$;
$X \xrightarrow{\;1-p'\;} X \xrightarrow{\;q\;} H$, with probability $(1-p')q$;
$X \xrightarrow{\;1-p'\;} X \xrightarrow{\;1-q\;} X$, with probability $(1-p')(1-q)$.

The dynamics of a single object can therefore be approximated by a much simpler Markov chain, where the state space is just the set $\{H, I\}$ (being healthy or being infected) and the transition probabilities are given by:

$$
\begin{array}{ccl}
\text{State at } t & \text{State at } t+1 & \text{Probability} \\
\hline
H & H & p'q + (1-p')q + (1-p')(1-q) = 1 - p' + p'q \\
H & I & p'(1-q) = p' - p'q \\
I & H & p'q + (1-p')q = q \\
I & I & p'(1-q) + (1-p')(1-q) = 1 - q
\end{array}
$$

The transition matrix is therefore:

$$
M = \begin{pmatrix} 1 - p' + p'q & p' - p'q \\ q & 1 - q \end{pmatrix}.
$$

This is an ergodic Markov chain whose invariant probability distribution is given by the normalized (left) eigenvector of the eigenvalue 1, given by:

$$
\left( \frac{q}{p' + q - p'q},\ \frac{p' - p'q}{p' + q - p'q} \right).
$$
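As a quick check of the invariant distribution, one can verify directly that it is left unchanged by the transition matrix. The shorthand $D$ for the common denominator and the name $\pi$ for the row vector are introduced here only for this verification; the rows of $M$ are indexed by the state at time $t$ and the columns by the state at time $t+1$.

```latex
\begin{aligned}
\pi &= \left( \frac{q}{D},\ \frac{p'(1-q)}{D} \right), \qquad D = p' + q - p'q, \\
(\pi M)_H &= \frac{q\,(1 - p' + p'q) + p'(1-q)\,q}{D}
           = \frac{q\,\bigl(1 - p' + p'q + p' - p'q\bigr)}{D} = \frac{q}{D}, \\
(\pi M)_I &= \frac{q\,p'(1-q) + p'(1-q)(1-q)}{D}
           = \frac{p'(1-q)\,\bigl(q + 1 - q\bigr)}{D} = \frac{p'(1-q)}{D}.
\end{aligned}
```

The two components also sum to one, since $q + p'(1-q) = D$.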
This Markov chain therefore approaches an equilibrium state with a probability of having an infected object equal to $\frac{p' - p'q}{p' + q - p'q}$. We take this value as the mean-field approximation for the fraction of infected objects:

$$
f_{MF} = \frac{p' - p'q}{p' + q - p'q}.
$$

As $p'$ depends on the fraction of infected objects itself, after a few elementary steps we obtain a transcendental equation for $f_{MF}$:

$$
f_{MF} = 1 - \frac{q}{1 - (1-q)(1-p)^{f_{MF} d}}. \tag{5.1}
$$

Please note that besides assuming a perfectly uniform distribution of agents and also a perfectly uniform distribution of infected agents, we also assumed $fd$ to be an integer, which of course is another approximation. Moreover, in mean field theory there is always a chance of getting infected (the Markov chain is actually ergodic).

To compute the epidemic threshold in mean field theory, we start by observing that equation (5.1) always has the solution $f = 0$. So we are in the epidemic regime if there is another solution $f = f_{MF} > 0$, for given values of $p$, $q$ and $d$.

To study the existence of solutions to (5.1) we consider the intersection of the graph of

$$
\varphi(f) = 1 - \frac{q}{1 - (1-q)(1-p)^{fd}}
$$

with the bisectrix of the first quadrant, with $0 \le f \le 1$. For the admissible values of $p$, $q$ and $d$ ($0 \le p \le 1$, $0 \le q \le 1$, $d \ge 0$) we have $\varphi'(f) > 0$, $\varphi''(f) < 0$, $\varphi(0) = 0$ and $\varphi(f) < 1$. Therefore if $\varphi'(0) > 1$ there is another solution $f = f_{MF} \in (0, 1)$, while if $\varphi'(0) \le 1$ the only solution is $f = 0$. Since $\varphi'(0) = \frac{(1-q)\, d \log\frac{1}{1-p}}{q}$, the condition $\varphi'(0) = 1$ gives the epidemic threshold

$$
q_0 = \frac{d \log\frac{1}{1-p}}{1 + d \log\frac{1}{1-p}},
$$

with the epidemic thriving if $q < q_0$ and becoming extinct if $q > q_0$.

Contrary to the findings of other authors, the data obtained by the simulation don’t seem to show a dependence of the number of infected agents at equilibrium, or of the epidemic threshold, exclusively on the mere ratio $p/q$, as happens in different, typically continuous-time, models. The epidemic threshold $q_0(p, d)$ is only approximately linear for small values of $p$ and $d$, which is what we expect to matter, heuristically, if we were to take a sort of continuum time limit of our model. In fig. 1 we plotted the epidemic threshold $q_0(p, d)$ for selected values of $d$.
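The mean-field quantities above are easy to evaluate numerically. The following Python sketch is one possible way to do so, not the procedure used in the paper. The function names and the solution method (plain fixed-point iteration started from $f = 1$) are assumptions. The iteration relies on the properties of $\varphi$ stated above: since $\varphi$ is increasing and $\varphi(1) < 1$, the iterates decrease monotonically to the largest solution of (5.1), which is $f_{MF}$ in the epidemic regime and $0$ otherwise.

```python
import math

def phi(f, p, q, d):
    """Right-hand side of the mean-field equation (5.1); requires 0 < q <= 1."""
    return 1.0 - q / (1.0 - (1.0 - q) * (1.0 - p) ** (f * d))

def q_threshold(p, d):
    """Mean-field epidemic threshold from phi'(0) = 1; requires 0 <= p < 1."""
    c = -d * math.log1p(-p)  # equals d * log(1 / (1 - p))
    return c / (1.0 + c)

def f_mean_field(p, q, d, tol=1e-12, max_iter=100000):
    """Largest solution of f = phi(f) in [0, 1], by iteration from f = 1."""
    f = 1.0
    for _ in range(max_iter):
        f_next = phi(f, p, q, d)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f  # not converged within max_iter (can happen very close to threshold)

if __name__ == "__main__":
    p, d = 0.5, 1.0  # illustrative values only
    print("threshold:", q_threshold(p, d))
    for q in (0.2, 0.6):
        print("q =", q, "f_MF =", f_mean_field(p, q, d))
```

For small $p$ and $d$ one has $\log\frac{1}{1-p} \approx p$, so the threshold computed by `q_threshold` is approximately $pd$; away from that regime the dependence on $p$ is visibly nonlinear. Convergence of the iteration slows down as $q$ approaches the threshold, where $\varphi'(0)$ approaches 1.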
6. Results of the simulation and future work
The observed fraction of infected agents at equilibrium $f_\infty$ depends on all three parameters: the infection probability $p$, the disinfection probability $q$ and the density of agents $d$. There appears to be a value $q^*(p, d)$ such that if $q > q^*$ then $f_\infty = 0$, while if $q < q^*$ then $f_\infty \neq 0$, as the mean field theory predicts. $q^*$ is increasing both in $p$ and in $d$, as can easily be expected. Again as predicted by mean field theory, $q^*$ is not linear in $p$, and so there’s no “epidemic threshold” depending simply on the ratio $p/q$. In fig. 2 we see some plots of $q^*(p, d)$ for selected values of $d$.

The empirical results are qualitatively similar to the predictions of mean field theory, but there are some quantitative discrepancies which are stronger for small densities of infected agents. We believe that the discrepancies are mostly due to the fact that, ultimately, the mean field model is an ergodic Markov chain while the real model, which we simulate, is not actually ergodic as there is a state (no infected agents at all) which is absorbing. Simply said, in mean field theory, where we consider only one agent, it can always get infected, while in the real model when there are no longer any infected agents the infection has no chance to reignite itself. Also, when in the real model the density of infected agents is small enough the chance to interact with one of the few remaining infected agents is practically negligible, and unless $q$ is extraordinarily small the infection dies out fast. Note also that in our model the probabilities of infection and disinfection $p$ and $q$ are actual probabilities of events happening upon intersection of the trajectories of the agents, not the infection and disinfection frequencies (which are observable random variables and not parameters of the model).

Concluding, we proposed a model for the propagation of a malware epidemic between mobile agents moving randomly on a plane, regular lattice. From the purely mathematical point of view, changing the dimension of the lattice would probably matter a lot, since it would impact the probability of intersection of the random walks, but we fail to see a practical application for higher dimensional lattices. It would be very interesting anyway to change the topology of the environment in which the agents move: for example, the agents could be constrained by having malware spread along a graph of connections which is more general than a simple square lattice. This would raise the interesting problem of finding an optimal containment strategy for the epidemic (or even just a better one) by modulating the probabilities of infection and disinfection depending on the topological properties of the graph.

Acknowledgments. We thank the Department of Mathematics of the University of Tor Vergata for kindly providing all the computing resources needed for this work.
References

[1] F-Secure, Bluetooth-Worm:SymbOS/Cabir, retrieved Oct. 26th, 2013.
[2] Symantec, SymbOS.Commwarrior.I, retrieved Oct. 26, 2013.
[3] The Spatial Node Distribution of the Random Waypoint Mobility Model; Bettstetter, Christian; Wagner, Christian; Mobile Ad-Hoc Netzwerke, 1. deutscher Workshop über Mobile Ad-Hoc Netzwerke WMAN 2002
[4] Epidemic Routing for Partially Connected Ad Hoc Networks; Vahdat, Amin; Becker, David; Duke University; 2000
[5] Modeling epidemic spreading in mobile environments; Mickens, James W.; Noble, Brian D.; WiSe ’05 Proceedings of the 4th ACM workshop on Wireless security; 2005
[6] Agent-Based and Population-Based Simulation: A Comparative Case Study for Epidemics; Jaffry, S. Waqar; Treur, Jan; Proceedings of the 22nd European Conference on Modelling and Simulation, ECMS’08. European Council on Modeling and Simulation; 2008
[7] Epidemic spread in mobile Ad Hoc networks: determining the tipping point; Valler, Nicholas C.; Prakash, B. Aditya; Tong, Hanghang; Faloutsos, Michalis; Faloutsos, Christos; NETWORKING’11 Proceedings of the 10th international IFIP TC 6 conference on Networking - Volume Part I; 2011
[8] [...]

Dipartimento di Ingegneria Civile ed Ingegneria Informatica, Università di Tor Vergata, Roma, Italy

Figure 1. Epidemic threshold given by mean field theory, $q_0(p, d)$, for selected values of $d$ (panels (a) to (h)).

Figure 2. Empirical epidemic threshold $q^*(p, d)$ for selected values of $d$ (panels (a) to (h)).