Less is More: A privacy-respecting Android malware classifier using Federated Learning
Rafa Gálvez, imec-COSIC ESAT/KU Leuven, Belgium, E-mail: [email protected]
Veelasha Moonsamy, Radboud University, The Netherlands, E-mail: [email protected]
Claudia Diaz, imec-COSIC ESAT/KU Leuven, Belgium, E-mail: [email protected]

Abstract
Android continues to dominate the mobile operating system market and remains the most popular choice amongst smartphone users. Consequently, Android remains an attractive target for malware authors and, as such, the mobile platform is still highly prone to infections caused by malicious applications. To tackle this problem, malware classifiers leveraging machine learning techniques have been proposed, with varying degrees of success. In fact, it can be observed that for machine learning models to produce good results, they often need to rely on a large, diverse set of features – which are indicative of apps installed by users. This, in turn, raises privacy concerns, as it has been shown that features used to train and test machine learning models can provide insights into users' preferences. As such, there is a need for a decentralized, privacy-respecting Android malware classifier which can protect users from both malware infections and the misuse of private, sensitive information stored on their mobile devices.

To fill this gap, we propose LiM – a malware classification framework which leverages the power of Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Data about newly installed apps is kept locally on the users' devices, so that users benefit from each other's learning process while the service provider cannot infer which apps were installed by each user. To realize such a classifier in a setting where users cannot provide ground truth (i.e. they cannot tell whether an app is malicious), we use a safe semi-supervised ensemble that maximizes the increase in classification accuracy with respect to a baseline classifier trained by the service provider. We implement LiM and show that the cloud has an F1 score of 95%, while clients have perfect recall with only 1 false positive in over 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users and 50 rounds of federation.
Furthermore, we also conducted a security analysis to demonstrate that LiM remains robust against poisoning attacks.
1 Introduction

In 2019, Google reported that there are 2.5 billion active Android devices [7] – almost a decade after its first launch. Google's popular mobile operating system (OS), Android, is expected to continue to hold a tight grip on the mobile market share for the years to come [8]. This phenomenon, however, comes at a cost. Since its release, Android has been known to be an attractive, sought-after target for malware proliferation, and unfortunately, this still holds true to date [33]. Additionally, the fact that Android applications (apps) can be installed both from the official app store, i.e. Google Play, and from third-party stores, together with the plethora of device manufacturers involved, makes it challenging to deploy security measures that can scale easily.

The Android research community has worked towards novel solutions to thwart malware propagation. This effort has always been a constant, on-going arms race. One common approach that has proven quite successful so far is the application of machine learning (ML) algorithms for malware detection and classification. ML solutions can be grouped under the following three categories: (i) cloud-based, (ii) client-based and (iii) hybrid, i.e. a combination of (i) and (ii). In cloud-based solutions [29, 40], the ML models are supplied with large sets of features that implicitly reveal users' app preferences, which can then be leveraged to conduct targeted advertising. In client-based solutions [3], the ML models compute predictions on the device itself; this approach is often resource-consuming and results in high false positives, as the models do not learn about new data as time goes by. Lastly, hybrid solutions [34, 4] provide users with more flexibility, as apps that are flagged as suspicious on the device can be pushed to the cloud for further analysis.

In general, it cannot be denied that ML algorithms are effective at detecting malware.
However, of notable concern is the amount of information required for the ML models to produce good results and the associated repercussions on users' privacy. More concretely, for the models to produce reasonable detection accuracies, it can be observed that the higher the number of raw features available for training and testing, the better the results. Unfortunately, as demonstrated by Song et al. [35], ML models are capable of memorizing and leaking detailed information about datasets. This observation, coupled with the fact that apps installed on a mobile device are highly representative of a user's behavior, including their personal preferences, political views, etc., poses a huge threat to users' privacy.

Motivation. In 2017, Google introduced Google Play Protect [2], a service that Google uses to scan apps installed on Android users' phones. A year later, the company published further details about the inner workings of Google Play Protect [36, 24], which include the application of machine learning techniques at scale, using app and Google Play data as data sources. It was also then revealed that Google not only detects malicious apps installed from the Google Play store, but also from third-party stores. There are several privacy implications with Google's approach, due to the insights that can be derived from the vast amount of information Google can potentially learn about apps that have been installed both from the official and third-party app stores. This also brings forth the argument of free market and whether Google should hold such power over its Android users.

Therefore, to address the aforementioned concerns and shortcomings, we investigate the following key question:
How can we build a decentralized Android malware classifier that is privacy-respecting?
In this paper, we present LiM – a framework that leverages the power of Federated Learning (FL) to (i) decentralize a malware classifier, and (ii) respect users' privacy. State-of-the-art FL models [39, 38, 27] allow users to keep their testing data locally while the learning process is done collaboratively to improve performance, i.e. users train their client models by providing ground truth on the delivered predictions, while a service provider aggregates the parameters of all models. LiM extends the traditional FL technique to the semi-supervised ML paradigm [12], enabling the application of FL in settings where users cannot provide ground truth, as is the case with correctly recognizing malicious apps. Semi-supervised models allow us to use both the labeled data of the cloud and the unlabeled data of the clients; the former trains fully supervised models and shares them with clients to be retrained with their testing data.

We validate the design by implementing LiM and measuring its performance and its resilience against poisoning attacks. We carry out experiments using a dataset of 25K malware apps and 25K clean apps, simulating federations of 200 clients over 50 rounds. The results show that the cloud can reach 95% F1 score, and clients have as few as 1 false positive. Additionally, when faced with a strategic adversary whose goal is to perform a poisoning attack by controlling 50% of the clients, the remaining honest clients are still able to correctly identify the targeted, poisoned app, thus defeating the attack and ensuring that the global model's predictions do not get affected.

Our contributions:

1. We present a first, comprehensive design and implementation of a privacy-respecting Android malware classifier.

2. We demonstrate an effective way to combine Federated Learning and semi-supervised ensemble learning to enhance malware detection accuracy and privacy at the same time.

3.
We conduct a security analysis to illustrate the robustness of LiM against poisoning attacks.

4. In the spirit of open science, we make our code available at redacted for review.

The rest of the paper is organized as follows: in section 2, we provide background knowledge about Federated Learning, semi-supervised learning and Android malware classification. Section 3 describes the threat model of LiM, and in section 4 we elaborate on safe semi-supervised Federated Learning and how it is implemented in LiM. Section 5 provides the details of the LiM architecture and its associated building blocks, followed by the empirical results and security analysis in section 6. Section 7 presents a discussion based on the empirical results and avenues for future work, together with key related work in section 8 and concluding remarks in section 9.
2 Background

2.1 Federated Learning

Federated Learning (also referred to as collaborative learning) is a technique that allows a machine learning algorithm to be trained in a distributed setting using a client-server architecture. Clients (i.e. mobile devices) train their own local models and send the resulting parameters to the cloud. In turn, the cloud aggregates the received parameters and pushes them back to the clients so that they can improve their performance. Thus, the cloud service provider does not have access to the raw client data, which is kept secret by the clients.

One way to set up an FL-based system is to distribute the architecture of a supervised classifier (typically a deep neural network) to clients, which then train it using labels provided by their own users [15]. The weights of the network are then aggregated by the cloud by taking their average and pushed back to the clients so that further iterations of local training can improve upon them.

A limitation of vanilla FL is that clients need to provide ground truth to train the local classifiers [12]. While applications such as predictive typing can benefit from this approach (since users know what they want to write), others, e.g. malware classification, cannot. We propose a solution to this problem using safe semi-supervised learning algorithms.
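The averaging step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name and weight values are ours:

```python
import numpy as np

def federated_average(client_weights):
    """Aggregate client model parameters by element-wise averaging.

    client_weights: list of 1-D numpy arrays, one per client, all the
    same shape. Returns the aggregated parameter vector that the cloud
    pushes back to every client.
    """
    return np.mean(np.stack(client_weights), axis=0)

# Hypothetical example: three clients report their locally trained weights.
clients = [np.array([0.2, 0.8]), np.array([0.4, 0.6]), np.array([0.6, 0.4])]
global_weights = federated_average(clients)  # array([0.4, 0.6])
```

Each client would then continue local training from `global_weights`, and the cycle repeats for the next round.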
2.2 Semi-supervised learning

Semi-supervised learning (SSL) aims to use unlabeled information together with a labeled dataset to train a classifier. SSL algorithms exploit the fact that labeling data can be difficult and expensive, while collecting raw data has become easier with the commoditization of internet access and the plethora of apps installed on smartphones. One of the main challenges for the success of an SSL algorithm is to ensure it indeed learns useful information from unlabeled samples, as there is no ground truth for the algorithm to compare its predictions with.
Safe SSL addresses this challenge by assuring that a minimal baseline performance is always achieved, i.e. that unlabeled information does not worsen the performance of another (possibly fully supervised) classifier. A well-performing strategy to achieve safe SSL is to use an ensemble of learners that, combined through a set of learned weights, are likely to outperform the baseline model [18, 17].

An example of this kind of classifier is SAFEW [17]. Its goal is to maximize the worst-case performance gain of a set of base learners (i.e. base classifiers) with respect to the baseline classifier, assuming the correct prediction can be realized by a convex combination of the base learners [17]. The learning task is to find their associated weights by solving a minimization problem whose constraints can embed prior knowledge. Equation 1 shows the formal description of the problem. Given a set of n base learners and a baseline prediction y⁰, find a set of weights α_i for base learner predictions y_i such that the loss with respect to the final prediction, l(ȳ, Σ_{i=1}^n α_i y_i), is lower than the loss with respect to the baseline prediction, l(y⁰, Σ_{i=1}^n α_i y_i), as measured by a loss function l. For this guarantee to hold in a worst-case scenario, we use a maximin formulation:

• maximize over the final predictions: max_{ȳ ∈ H^u}

• the worst-case performance gain: min_{α ∈ M} l(y⁰, Σ_{i=1}^n α_i y_i) − l(ȳ, Σ_{i=1}^n α_i y_i)

where H = {−1, +1}, u is the number of unlabeled samples, and M is a convex set from which the weights are drawn. M can be tuned using domain knowledge, although it is not necessary.

max_{ȳ ∈ H^u} min_{α ∈ M} l(y⁰, Σ_{i=1}^n α_i y_i) − l(ȳ, Σ_{i=1}^n α_i y_i)    (1)

2.3 Android applications

In the Android operating system (OS), apps are distributed as Android application package (APK) files. These files are simple archives which contain bytecode, resources and metadata. A user can install or uninstall an app (and thus the APK file) by directly interacting with the smartphone.
When an Android app is running, its code is executed in a sandbox. In practice, an app runs isolated from the rest of the system, and it cannot directly access other apps' data. The only way an app can gain access is via the mediation of inter-process communication techniques made available by Android. These measures are in place to prevent malicious apps from accessing other apps' data, which could potentially be privacy-sensitive.

Since Android apps run in a sandbox, they not only have restrictions on shared memory usage, but also on most system resources. Instead, the Android OS provides an extensive set of Application Programming Interfaces (APIs), which allow access to system resources and services. In particular, the APIs that give access to potentially privacy-violating services (e.g., camera, microphone) or sensitive data (e.g., contacts) are protected by the Android Permission System [10]. Developers have to explicitly declare the permissions that require user's approval in the
AndroidManifest.xml file (hereon referred to as the Manifest file).

Besides permissions, the Manifest file also includes information about the app components [11], such as activities, services, broadcast receivers and content providers. An activity is the representation of a single screen that handles interactions between the user and apps.
Services are components that run in the background of the operating system to perform long-running operations while a different application is running in the foreground.
Broadcast receivers respond to broadcast messages from other applications or the system. They allow an app to respond to broadcast announcements outside of a regular user flow. A content provider manages a shared set of app data and stores it in the file system. It also supplies data from one app to another on request.

It is worth noting that the information present in a Manifest file is not obfuscated and can be extracted via static analysis. It is in the app developer's best interest not to obfuscate the file, as doing so would break the functionality of the app and render it useless. In section 6, we provide further details about the features used by our proposed classifier, LiM, to conduct malware detection.
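As an illustration of this kind of static extraction, the snippet below pulls declared permissions and component names out of a Manifest. It is a sketch, not the paper's tooling: it assumes the manifest has already been decoded from the binary XML stored in the APK, and the function name and example manifest are ours:

```python
import xml.etree.ElementTree as ET

# Attributes like android:name live in the Android resource namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def manifest_features(manifest_xml):
    """Extract permissions and component names from a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    name = lambda e: e.get(ANDROID_NS + "name")
    return {
        "permissions": [name(e) for e in root.iter("uses-permission")],
        "activities":  [name(e) for e in root.iter("activity")],
        "services":    [name(e) for e in root.iter("service")],
        "receivers":   [name(e) for e in root.iter("receiver")],
        "providers":   [name(e) for e in root.iter("provider")],
    }

example = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <activity android:name=".MainActivity"/>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""

feats = manifest_features(example)
# feats["permissions"] -> ['android.permission.INTERNET']
```

Each extracted name can then be mapped to one position of a binary feature vector, as described in section 6.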
2.4 Android malware classification

There are several proposals for machine learning classifiers that can detect malicious APKs targeting Android. We divide them into three categories: centralized, local and hybrid.

Centralized approaches use a cloud classifier to predict whether an app is malicious or clean. Cloud-based approaches can accurately predict big testing datasets thanks to the advanced feature engineering a cloud infrastructure can handle [29]. Both static and dynamic analysis can be performed, e.g. taking into account the API call graphs of the apps and the behavioral characteristics of an app during execution.

Local approaches install an already trained classifier on the user's device. Due to the constrained resources available to the classifier, the feature set and the detection algorithm must be considerably more lightweight than in centralized approaches [3]. Lightweight dynamic analysis can be performed together with static analysis (e.g. features from the Manifest file).

Hybrid approaches combine local and cloud models. A first screening of the app is performed on the device itself using a lightweight feature set, and if necessary more features are collected and sent to the cloud to verify the prediction [34, 4].

Our proposal, LiM, aims to perform as well as centralized solutions while protecting user information as local, on-device approaches do, thus effectively combining both approaches in a single prediction step.
3 Threat model

Inspired by [32], we describe the threat model of LiM in terms of its attack surface, its trust model, and the capabilities and goals of the adversary.

The aim of our proposed framework is to conduct malware classification in a decentralized manner using learners that are trained with data that abides by the data minimization principle. Therefore, in our threat model we distinguish between two different adversaries (referred to as Adversary 1 and Adversary 2 below) who have the following goals:

1. Compromise integrity: the adversary is successful at poisoning the federation rounds in order to trigger specific apps to be misclassified, as presented in [5].

2. Compromise privacy: the adversary is able to learn privacy-sensitive information about the training set of the user, as presented in [21].

Adversary 1 targets the core of the service provided by a malware classifier, i.e. bypassing the detection mechanism of the system. His goal is to change the model so that it misclassifies a specific malicious application. In a federated setting, he can change the model by disguising himself as a user of the system and submitting specially crafted models to the federation. Further, we assume he has control over the malicious application, and that it can be tweaked so that the model is close to confusing it with a clean app. In this scenario, as a trust model, we assume that users trust their own devices and the service provider.

Adversary 2 aims to compromise the privacy of the user. To do so, he tries to learn information about the apps that users have installed on their devices, e.g. the app names, categories, device usage patterns, etc. In a federated setting, we are interested in a passive global attacker, as described in [28], that resides at the service provider's side and subsequently infers information about the training apps of the clients using the models that are uploaded to the cloud.
In this scenario, we assume that users only trust their own devices.

Both adversaries have white-box access to the service provider's model (including architecture, feature set and hyper-parameters) in each federation round. Adversary 2 also has white-box access to the models of all clients in each round, while Adversary 1 does not know the hyper-parameters of the honest users' models.

The attack surface can thus be interpreted as the models of the clients participating in the federation. We assume the data collected by the service provider and the users has not been tampered with, and has been pre-processed correctly.
4 Safe semi-supervised Federated Learning

In this section, we provide further details about how the concept of FL can be extended to the semi-supervised paradigm. In particular, we present our arguments on how these two techniques are complementary in providing us with the necessary technical building blocks to implement a decentralized, privacy-respecting malware classifier.

Traditionally, FL employs a decentralized approach to train a neural model. Instead of uploading data to servers for centralized training, clients aggregate their local data and share model updates with the global server. Such a distributed approach has been shown to work with unbalanced datasets and data that are not independent or identically distributed across clients. Furthermore, FL's success is dependent on properly labeled data which can then be used to train supervised learning models.

For the purpose of our work, we cannot rely on users assigning correct labels on the client's side, as it cannot be guaranteed that they will correctly identify malicious apps. Therefore, we adopt semi-supervised methods that allow federated learning to train local models without user supervision. Labeled data is kept by the service provider, and clients use their unlabeled samples to update the parameters of their semi-supervised model stored locally.

Furthermore, we leverage the practical benefits of safe SSL, as described in section 2.2, to ensure that models trained by the clients are useful, i.e. they do not introduce confusion (via incorrect labels) into the federation but provide at least a high enough baseline performance.

In the case of LiM, the federation happens across the weights of the base learners, which clients estimate using their unlabeled testing datasets. The service provider then collects all client weights and aggregates them in a similar fashion as it would do with the weights of e.g. a deep neural network (DNN).
It is important to note that the number of base learners is much lower than the number of neurons in a DNN – LiM compresses client data even more, taking advantage of the training process that the service provider performs on the base learners. We see this feature as a defense mechanism against privacy attacks (cf. section 3), as client updates will not be sparse anymore.

Moreover, in our proposed architecture, the service provider plays a greater role than in classical, supervised FL in order to compensate for the lack of ground truth at the clients. It also selects the architecture and the feature sets of the different learners, as well as which learner will be used as baseline. Furthermore, LiM can provide protection against integrity attacks (cf. section 3) by comparing the weights of the clients with those generated through its own unlabeled dataset.
5 LiM architecture

Terminology. To improve readability of the remaining sections, we provide the reader with our working definition of the key terminology that we will rely on for the rest of the paper.

• Client: ML model that resides locally on the user's mobile device

• Cloud: global ML model which is present at the service provider's side, i.e. a trusted entity

• SAFEW: an ensemble of classifiers

• Baseline learner: bare minimum performance for an individual SAFEW classifier

• Base learner: individual algorithm that forms part of the ensemble
Initialization phase: We assume the service provider has access to a ground truth (labeled) dataset and a testing (unlabeled) dataset. On the client's side, we assume users want to scan their installed apps (in a privacy-respecting manner) for the presence of malware. LiM can be incorporated in the package installer of the Android OS and run as a privileged background service on each client's phone.

To implement the scheme explained in section 4, we apply the SAFEW classifier introduced in section 2.2.
Round 0 of FL: In the first step of the federation, the cloud (i.e. service provider) trains a set of baseline and base learners using its labeled dataset, and estimates a set of weights for the base learners using its unlabeled data. In step 2, clients receive the trained learners in order to (step 3) estimate their own SAFEW weights using their own testing data (i.e. their installed apps). Then clients use these weights to classify their installed apps (step 5). Users then complete their federation round by sending their client weights to the cloud (step 6). To aggregate them, the cloud first averages all the client weights (step 7) and finally computes the median between these averaged client weights and the weights of its own SAFEW computed in step 1 (step 8). Finally, the cloud sends the federated weights to the clients to initiate a new round of federation. This process is depicted in figure 1.
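The client-side classification (step 5, a weighted sign vote over the base learner predictions) and the cloud-side aggregation (steps 7-8) can be sketched as follows. All weight values here are hypothetical, and we read the "median" of step 8 as an element-wise median of the averaged client weights and the cloud's own weights:

```python
import numpy as np

def client_predict(alpha, base_predictions):
    """Step 5: classify apps by the sign of the weighted vote of the
    base learners (+1 = malware, -1 = clean).

    alpha: SAFEW weights, shape (n_learners,).
    base_predictions: +1/-1 matrix, shape (n_learners, n_apps).
    """
    return np.sign(alpha @ base_predictions)

def aggregate_round(client_weight_list, cloud_weights):
    """Steps 7-8: average all client weights, then take the element-wise
    median of that average and the cloud's own SAFEW weights."""
    avg_clients = np.mean(np.stack(client_weight_list), axis=0)      # step 7
    return np.median(np.stack([avg_clients, cloud_weights]), axis=0)  # step 8

# Hypothetical round with 3 base learners and 4 clients.
clients = [np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2]),
           np.array([0.6, 0.2, 0.2]), np.array([0.5, 0.3, 0.2])]
cloud = np.array([0.5, 0.25, 0.25])
federated = aggregate_round(clients, cloud)   # [0.5, 0.275, 0.225]

# Base learner verdicts for 4 installed apps.
Y = np.array([[+1, -1, +1, -1],
              [+1, -1, -1, -1],
              [-1, +1, +1, -1]])
preds = client_predict(federated, Y)          # [ 1., -1.,  1., -1.]
```

Note that the element-wise median of exactly two vectors coincides with their midpoint; the median construction matters once more than two weight sources are compared.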
Round 1 of FL (and beyond): Once Round 0 is completed, the service provider has the option of re-applying steps 1-2 at any point during the federation rounds; however, we did not consider this in our experimental evaluation. Alternatively, after applying step 3, the client averages its own weights with the cloud's weights, as depicted in step 4. Steps 5-9 are then applied as described in Round 0. It is worth noting that in LiM the service provider does not send its own weights in the initial round, so as not to mix them up with the client weights. In later rounds, new federated weights are computed using both the clients' and the cloud's weights. In order to aggregate all client weights, we compute their average; then, the service provider computes the median between the average client weights and its own. The design and rationale behind this construction is deliberately conservative in order to counteract integrity attacks, which are critical for the performance of a decentralized malware classifier.

Moreover, the individual SAFEWs use the hinge loss function to estimate their weights from unlabeled data. While SAFEW supports different loss functions, its authors also show that hinge loss allows SAFEW to find the optimal prediction ȳ using equation 2:

ȳ = sign(Σ_{i=1}^n α_i y_i)    (2)

where n is the number of base learners, y_i the predictions of base learner i, and α_i its weight.

We can also use different classifiers as SAFEW learners, as the optimization algorithm only uses their predictions to compute their weights. There are two main criteria to keep in mind when selecting alternative learners:

• They must provide a reasonable performance on their own, e.g. over 90% F1 score.

• Their predictions must complement each other, i.e. the learners must be heterogeneous. If there is one clearly strong learner, SAFEW will just copy its predictions (i.e.
its weight will be 1).

Additionally, domain knowledge can guide the aforementioned selection process, and it is possible to further constrain the set of possible weights M (cf. section 2.2) to reflect e.g. the confidence that the designer has in each learner relative to the others. It is, however, important to note that LiM does not need a lot of domain information for its setup. Weights can be learnt from data without any domain knowledge, and there are no assumptions on the distribution of the testing dataset (e.g. no prior knowledge of the class base rates).

In LiM, the service provider makes sure that the SAFEW ensemble distributed to the clients can perform better than the baseline by testing it with its own labeled dataset and discarding those combinations of learners whose F1 score is lower than the baseline's. In section 6 we compare the performance of standard learners to decide which combination is the most beneficial for malware detection.

6 Evaluation

We empirically evaluate LiM as a federated malware classifier by simulating 200 clients and a single service provider for 50 federation rounds, running as a parallelized (across clients) Python program on 4 Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz cores.

Dataset. We use the AndroZoo dataset [1] to obtain 25K clean apps, which were selected from the top 3 most popular stores (Anzhi, Appchina and Google Play Store) in the dataset as of October 2018. As for the malware samples, we collected 25K samples from the Android Malware Genome project [41] and the Android Malware Dataset project [37, 19]. We pick the latest version of apps, removing duplicates within the same store. It is possible that the same app is published in two different stores as different versions, but we consider them effectively different apps, as developers may include different functionality for particular stores.

From each app we extract the manifest features proposed in Drebin [3]. While dynamic analysis can provide greater performance, it is both more resource-intensive and more easily obfuscated. We then transform the statically extracted features into a vector of binary values indicating the presence of a feature (e.g. a specific permission) in the manifest of an app. Finally, we select the top 100, 200, and 500 out of 370K features using the chi-squared (chi2) test, which measures the correlation between a given feature and the class attribute. To summarize them, table 1 shows how many features belong to each of the Drebin categories.

Table 1: Number of features per category. In the top 500, there are 55 features that belong to two categories: 54 of them can be declared as permissions or hardware components; the 55th (com facebook facebookcontentprovider) can be either an activity or a content provider.

                      100  200  500
declared permissions   65   73   97
activities             19   80  232
services                3   13   71
intent filters          0    0    0
content providers       1    3   10
broadcast receivers     6   25   85
hardware components    24   24   24

Out of the 50K apps, we randomly sample a training set of 10K apps (for all models) and use 32K apps for testing the cloud and 8K to test the clients, with a ~1K overlap between the testing sets.

We simulate several rounds of federation as described in section 5. In the first round, clients have a set of 96 preinstalled apps extracted from an Android Pie emulator whose manifests have at least one permission. In later rounds, the clients install up to 5 apps drawn randomly from the client testing dataset, using a binomial distribution with bias 0.6 to randomize the number of apps. Each app will be a malware sample with probability 0.1.
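The chi2 ranking used for feature selection can be sketched without any ML library. The matrix below is a toy stand-in for the real 370K-feature data, and all function names are ours; the sketch assumes every feature value and class actually occurs, so no expected count is zero:

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared score of each binary feature against a binary class.

    X: (n_samples, n_features) 0/1 matrix; y: (n_samples,) 0/1 labels.
    Higher scores indicate stronger feature/class correlation.
    """
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # 2x2 contingency table: feature present/absent vs class.
        obs = np.array([[np.sum((X[:, j] == a) & (y == b)) for b in (0, 1)]
                        for a in (0, 1)], dtype=float)
        expected = np.outer(obs.sum(1), obs.sum(0)) / obs.sum()
        scores[j] = np.sum((obs - expected) ** 2 / expected)
    return scores

def top_k_features(X, y, k):
    """Indices of the k features with the highest chi2 score."""
    return np.argsort(chi2_scores(X, y))[::-1][:k]

# Toy data: features 0 and 1 perfectly separate the classes,
# feature 2 carries no signal.
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])
keep = top_k_features(X, y, 2)  # selects features 0 and 1
```

In practice the same ranking over all 370K binary manifest features yields the top-100/200/500 subsets used in the experiments.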
Moreover, to model the fact that users install popular apps much more frequently than others, we create two sets of 50 apps based on the presence of popular features in the malware and clean datasets, and make clients draw apps from these sets with probability 0.8.
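A minimal sketch of this per-client install simulation, using the bias and probabilities stated above (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_installs(test_clean, test_malware, popular_clean, popular_malware):
    """Draw the apps one client installs in a federation round.

    Up to 5 apps are installed; the count follows a binomial distribution
    with bias 0.6. Each app is malware with probability 0.1, and is drawn
    from the 'popular' subset with probability 0.8.
    """
    n_installs = rng.binomial(5, 0.6)
    installed = []
    for _ in range(n_installs):
        malicious = rng.random() < 0.1
        popular = rng.random() < 0.8
        if malicious:
            pool = popular_malware if popular else test_malware
        else:
            pool = popular_clean if popular else test_clean
        installed.append(rng.choice(pool))
    return installed

# Hypothetical app identifiers standing in for real feature vectors.
apps = simulate_installs([f"clean{i}" for i in range(1000)],
                         [f"mal{i}" for i in range(1000)],
                         [f"pop_clean{i}" for i in range(50)],
                         [f"pop_mal{i}" for i in range(50)])
```

Each simulated install would then be classified by the client's local SAFEW in the next federation round.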
Metrics. To make a fair evaluation of LiM in a setting where users encounter many more clean apps than malicious ones (i.e. where classes are highly imbalanced), we use the F1 score, computed as F1 = 2 · (precision · recall) / (precision + recall). Precision measures how many positive predictions (true positives + false positives) were actual positives (true positives), while recall measures how many actual positives (true positives + false negatives) were classified as positive (true positives). Since users typically install many more clean apps than malicious apps, it is easy for LiM to achieve high recall by predicting many positives at the expense of precision, which is not accounted for in other popular metrics like accuracy. In order to better understand this balance between precision and recall across SAFEW configurations, we also report the raw number of false positives. End users can be sensitive to small differences in the number of false alarms, even if the F1 scores of two versions of LiM using different configurations of learners are very similar.

Table 2 shows the F1 score and the number of false positives (FP) of clients using different sets of baseline and base learners, averaged across experiments. Experiments are carried out for 50 rounds of federation using the top 100, top 200 and top 500 features as per the chi2 test. The best performance, i.e. an F1 score of 77.4%, is achieved using kNN as baseline and 200 features, thanks to the low number of false positives. Baseline SVM also achieves a high F1 score of 61% using 200 features.

Figure 2 shows the evolution of LiM using kNN as baseline. The first round of the federation brings a significant improvement in performance for LiM, and then the F1 score slowly grows.
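For reference, the F1 metric used throughout these results can be computed directly from the confusion counts (the numbers below are toy values, not the paper's results):

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: a client flags 10 apps, 9 of them truly malicious,
# and misses 1 malicious app.
score = f1_score(tp=9, fp=1, fn=1)  # precision 0.9, recall 0.9 -> F1 0.9
```

Because true negatives do not appear in the formula, the abundant clean apps cannot inflate the score, which is why F1 is preferred over accuracy here.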
Table 2 reports, for each classifier, the FP and F1 (%) of the Baseline, LiM and SAFEW, including a centralized SAFEW used to estimate the performance when clients submit the feature vectors of their apps directly to the cloud, i.e. when no privacy is provided. We can see that LiM always matches the F1 score of the centralized SAFEW, and that the number of false positives is the lowest using kNN with 200 features, which is consistent with the results of the clients. Regardless of the configuration used, the F1 score remains around 96%.

As we discussed in section 5, we assume the cloud SAFEW improves performance over its baseline. Thus, table 2 does not show client performance for most of the configurations where RFs are baselines, as table 3 shows that the F1 scores of the cloud SAFEW in round 0 are lower than the F1 scores of the baselines.
The cloud and the clients compute their LiM predictions in 3 steps:

1. Build a local SAFEW. New weights are computed using testing data.
2. Average local and federated weights.
3. Compute new predictions using the new weights.

For step 1, the cloud takes 13 seconds, and an individual client at the beginning of the federation spends 0.1 seconds. The cost of step 2 is negligible. Step 3, however, takes almost the same time as step 1: step 1 spends only 1/10 of its time computing new weights, a computation that is not necessary in step 3. Thus, clients spend 0.2 seconds computing the LiM predictions, while the cloud spends 26 seconds.

Note that the implementation was not optimized at all: at the very minimum, the base predictions from step 1 can be reused in step 3, making the total time closer to the time of step 1. Regarding the training of the base learners, we observe each model taking 12.7 seconds and the first cloud SAFEW prediction taking 13 seconds.
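The three prediction steps above can be sketched as follows; the function name, array shapes and example values are illustrative and are not taken from our implementation:

```python
import numpy as np

def lim_predict(base_preds, local_w, federated_w):
    """Illustrative sketch of the 3-step LiM prediction.

    base_preds: (b, n) array of {0, 1} predictions from b base learners
                on n apps (step 1, after fitting the local SAFEW weights).
    local_w, federated_w: (b,) weight vectors, each summing to 1.
    """
    # Step 2: average the local and federated weights.
    w = (local_w + federated_w) / 2
    # Step 3: weighted-majority vote with the new weights.
    return (w @ base_preds > 0.5).astype(int)

preds = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 0, 1]])          # 3 base learners, 3 apps
local = np.array([0.5, 0.3, 0.2])
fed = np.array([0.3, 0.3, 0.4])
print(lim_predict(preds, local, fed))  # -> [1 0 1]
```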
In section 3 we considered two different adversaries: one that aims to poison the federation in order to make a specific malware app be classified as clean, and another that wants to learn about the apps users have installed.

Federated learning provides a defense mechanism against privacy attacks. It hides the raw features of the installed apps, making it more difficult for a service provider to infer information about them. However, the submitted models may still leak information that can be used to infer, e.g., whether a single app has been used to train them [21]. The attack presented in [21] relies on the fact that updates to the client models may change only a few of the model parameters, i.e. which specific parameters change can reveal information about the apps used in that federation round.

In our setting, updates to client models only change the weights of the base learners. It is safe to assume that the number of base learners of SAFEW is significantly smaller than the number of weights in, e.g., a deep neural network. As the information available to the adversary to perform the membership inference is greatly reduced, we assume the probability of success is low.

While the architecture of LiM can be seen as a mitigation against privacy attacks, the decentralization of the training process directly affects, in a positive manner, the integrity of the system. An adversary can freely participate in the federation by owning a subset of the clients, and subsequently influence the overall federation through his own client models.

To analyze LiM's resilience against this kind of attack, we focus on the impact a strategic adversary has. We assume he controls 50% of the clients, which he uses to make LiM misclassify a specific malware app from the client testing dataset (which he can modify) as clean. To make this app undetectable to LiM (i.e.
a false negative), he will craft the weights of his malicious clients so that his allied base learners have an honest majority in the different layers of the federation. A base learner is an ally of the adversary if it classifies the malware app as clean. We assume there is at least one ally in the LiM configuration.

Table 3: Comparison of cloud average performance across LiM, SAFEW, and different baselines. Experiments are carried out for 50 rounds of federation using the top 100, top 200 and top 500 features as per the chi2 test. (Columns: FP and F1 (%) for Baseline, Centralized SAFEW, LiM, and SAFEW per classifier; table values omitted.)

We formalize the problem in equations 3 through 7. Let $w$ be the honest weights of a malicious client, and $w'$ the poisoned weights. The adversary's goal is to make them as similar as possible, so as to be as stealthy as possible. Equation 3 expresses this goal:

$\min_{w'} \|w - w'\|$   (3)

To maximize the chances of poisoning the federation, the weights need to take into account multiple constraints. First, they must add up to one. Let $b$ be the number of base learners; then the first constraint of the optimization problem is depicted in equation 4:

$\sum_{i=1}^{b} w'_i = 1$   (4)

Second, for the compromised client to misclassify the targeted app, the weights of the classifiers that err in favour of the adversary must account for an honest majority. Let $M$ be the indices of those allied classifiers in the SAFEW ensemble; then equation 5 expresses the local constraint:

$\sum_{i \in M} w'_i > 0.5$   (5)

Third, the cloud will average the weights of all the clients. Thus, the averaged weights of the allied classifiers must also hold an honest majority. Since the adversary does not have access to the weights of the honest clients, we approximate them by averaging the honest weights of the malicious clients.
Let $w^c$ be the weights of client $c$; then equation 6 expresses the clients constraint:

$\frac{1}{N+1} \sum_{c=1}^{N} \sum_{i \in M} (w^c_i + w'_i) > 0.5$   (6)

Finally, the cloud will compute the federated weights by averaging its own weights with the average of the clients. Assuming the weights of the cloud are known, we can make the same honest majority across allied classifiers hold by approximating it with the guessed average in equation 6. Let $w^*$ be these averaged weights and $w^{cloud}$ the weights of the cloud; then equation 7 expresses the cloud constraint:

$\frac{1}{2} \sum_{i \in M} (w^{cloud}_i + w^*_i) > 0.5$   (7)

Each round, the adversary will try to solve this problem and submit the poisoned weights. If no solution is found, he relaxes the problem by first dropping the cloud constraint, and then the clients constraint. The adversary will always find a way to meet the local constraint.

All malicious clients install the same app targeted by the adversary. To bound the chances of success, we assume the app is crafted so that there is at least one allied base learner, and we ensure that their cumulative weights lie between 0.3 and 0.4. This range makes it possible for the adversary to win, but assumes he cannot make an app that is misclassified by the weighted majority of base learners.

We compare how LiM is affected by the attack under different configurations of learners. Table 4 shows that poisoning has virtually no effect on the LiM cloud when using kNN as baseline (F1 score always around 96%). However, configurations with baselines LR and SVM see their performance drop with respect to SAFEW. It turns out the adversary is able to poison the honest clients using SVM as baseline every second round, causing the F1 score of the cloud to jump between 95.3% and 96.2%. We did not observe a stabilization in either direction during the 50 rounds of our simulations.
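The attacker optimization in equations 3 through 7 can be sketched with a generic solver. The sketch below is illustrative (the function name and the use of scipy are our choices, not the paper's implementation) and only encodes the sum-to-one and local constraints; the clients and cloud constraints would be added as further inequality terms in the same way:

```python
import numpy as np
from scipy.optimize import minimize

def poison_weights(w, allied, thr=0.5):
    """Find weights w' closest to the honest weights w (eq. 3) such that
    the weights sum to one (eq. 4) and the allied learners listed in
    `allied` hold a weighted majority (eq. 5)."""
    b = len(w)
    mask = np.zeros(b)
    mask[allied] = 1.0
    cons = [
        {"type": "eq", "fun": lambda wp: wp.sum() - 1.0},            # eq. 4
        {"type": "ineq", "fun": lambda wp: mask @ wp - thr - 1e-6},  # eq. 5
    ]
    res = minimize(lambda wp: np.sum((w - wp) ** 2), w,
                   bounds=[(0.0, 1.0)] * b, constraints=cons)
    return res.x

w = np.array([0.5, 0.3, 0.2])
wp = poison_weights(w, allied=[1, 2])  # learners 1 and 2 call the app clean
print(wp.round(2), wp[1] + wp[2] > 0.5)
```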
In the case of baseline LR, clients perform slightly worse than SAFEW in the initial rounds, but the average difference between the F1 scores of LiM clients and SAFEW drops as rounds advance. This makes the cloud perform slightly worse on average, but reach almost the same score by round 50 (96.1%).

Table 5 compares the average client performance of honest LiM clients with baseline, SAFEW and adversarial (i.e. poisoned) clients. Clients using a baseline SVM with 200 features drop their performance with respect to the baseline, due to their average F1 score jumping between 0.2 and 0.6 every other round. Interestingly, as rounds advance, the success of the poisoning diminishes as the clients grow their F1 scores. When LR is used as baseline, the average performance of the LiM clients is lower than SAFEW, but as rounds advance it converges towards the same SAFEW 80% F1 score (last round).

We now look into the evolution of LiM clients using kNN as baseline classifier, as it is the one that still improves its performance with respect to baseline and SAFEW. Figure 4 shows that clients improve over time and are able to perform better than baseline even though SAFEW does not. It also shows how the F1 score of the adversarial clients approximates the worst among the three, suggesting that the adversary is indeed poisoning LiM so that the submitted parameters are close to a potential honest set of parameters.

Figure 4: F1 score of 100 honest + 100 malicious clients, using 200 features with baseline kNN (n=3). SAFEW (i.e. no-lim) improves on baseline, and LiM improves on SAFEW.

Figure 5 shows the evolution of the false positives. Since the adversary only wants to trigger a false negative, the number of false positives in adversarial clients can indeed be minimized to behave as close to an honest client as possible.

Table 4: Comparison of cloud average performance across LiM, SAFEW, Centralized SAFEW and different baselines when 50% of the clients are adversarial.
Experiments are carried out for 50 rounds of federation using the top 100, top 200 and top 500 features as per the chi2 test. (Columns: FP and F1 (%) for Baseline, Centralized SAFEW, LiM, and SAFEW per classifier; table values omitted.)
Table 5: Comparison of client average performance across LiM, SAFEW and different baselines when 50% of the clients are adversarial. Experiments are carried out for 50 rounds of federation using the top 100, top 200 and top 500 features as per the chi2 test. (Columns: FP and F1 (%) for Baseline, LiM, SAFEW, and adversarial client per classifier; table values omitted.)
The evaluation results show that LiM can enable clients to learn from each other without users providing ground truth. The selection of learners for the individual SAFEWs proves to be an important task in order to reach this goal, as results vary according to the strength of the individual learners and the variance across their predictions. We observe that random forests provide a high F1 score by themselves, but LiM can outperform them by fine-tuning the weights of weaker but complementary learners, as explained in section 5. The best results were achieved when all learners influenced the final predictions, without copying those of, e.g., a random forest. Even though we used standard simple learners to focus on the federation itself, we expect LiM to greatly benefit from domain-knowledge insights regarding the use of specific algorithms and architectures. Future work is needed to verify this hypothesis, possibly in other domains where users cannot provide ground truth themselves.

Figure 5: False positives of 200 clients with 200 features, using kNN (n=3) as base learner.

Our results also suggest that using relatively few features can provide better performance than making use of large feature sets. While 100 features seem to be insufficient for LiM to perform well, we see that 200 features are equal to and sometimes better than 500. While this type of data minimization is considered irrelevant when clients do not share their testing samples, it benefits both runtime performance and resilience against privacy attacks, which can exploit the sparseness of the shared models with respect to the training samples.

Regarding performance results, we highlight that 1) the cloud LiM matches and sometimes outperforms a privacy-invasive Centralized SAFEW (as well as multiple baseline classifiers), i.e. LiM does not sacrifice performance for privacy, and 2) clients can greatly reduce the number of false positives thanks to the federation rounds.
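The chi2-based top-k feature selection referred to throughout the evaluation can be sketched with scikit-learn; the data below is synthetic and the exact pipeline of LiM is not reproduced here:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for binary app feature vectors (1000 apps,
# 500 candidate features) and malware labels (1 = malware).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 500))
y = rng.integers(0, 2, size=1000)

# Keep the k features with the highest chi2 statistic against the
# labels; k in {100, 200, 500} in our experiments.
selector = SelectKBest(chi2, k=200).fit(X, y)
X_top = selector.transform(X)
print(X_top.shape)  # -> (1000, 200)
```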
The drastic improvement gained in the first round of federation, and the contribution of the client weights towards the weights of the cloud, lead us to believe that increasing the number of clients in the simulations may reproduce this effect over further rounds, as there will be more information coming from the averaged client weights. Simulating LiM at scale can help clarify the relationship between the number of clients and LiM performance.

Interestingly, LiM can avoid virtually any privacy loss with respect to a centralized SAFEW, as the federation of the clients provides enough information to arrive at the same weights as the privacy-invasive model. Even though the differences between the weights of the cloud without federation and the weights of the centralized SAFEW can be relatively high (e.g. up to 0.1 in a single weight, 0.37 vs 0.47), clients can provide enough information to equalize them.

We envision LiM as a system that can be practically deployed in real-world smartphones. While market interests may dissuade powerful organizations like Google from deploying LiM in stock Android, we believe third-party ROMs differentiating themselves by being more privacy conscious (e.g. /e/) can develop the client app as a privileged service executed upon the installation of (one or more) apps. This practical implementation would be able to perform static analysis over the manifest files of the newly installed apps, while trusting a neutral LiM service provider with the resulting client models.

Limitations & Future work:
We expect future work to address the following limitations of the current formulation of LiM, namely:

• Malware family-wise classification is out of the scope of this paper. Our figures do not take into account the specific characteristics of the malware apps installed by clients.

• An extended analysis studying the evolution of LiM under different data distributions among the cloud and client models remains to be conducted. We only simulate 50 rounds of federation with a static set of users installing apps and pre-defined parameters to simulate the probabilities of installing malware, clean and popular apps.

We believe LiM is equipped to act as a self-evolving system that requires very low maintenance overhead for the service provider, benefitting from the ever-enlarging set of apps users install on their devices and the geographic distribution of malware to prevent malware from disseminating in large numbers.

https://e.foundation/

Related work
ML techniques to detect mobile malware have been extensively investigated, leveraging a few characteristics of the mobile applications (for example, call graphs [23], permissions [16], or both API calls and permissions [14]), and the results obtained were promising. Classification approaches have also been proposed to model and approximate the behaviors of Android applications and discern malicious apps from benign ones. The detection accuracy of a classification method depends on the quality of the features (for example, how specific the features are [25]).

In [22], Milosevic et al. implemented an app that detects malware locally through a pre-trained SVM classifier. They explicitly created a permission-based model to detect malware. There is no information to recreate the model, although the model itself is available as part of the OWASP Seraphimdroid project [30]. The dataset they used has 200 benign and 200 malicious apps; however, since 2015, their dataset is no longer publicly available. For more recent state-of-the-art related work in this area, we refer the reader to [29].
In general, classification models perform better when an abundance of data is available for training. However, more data often means additional noise introduced into the model and, more importantly, sensitive information can be inferred from the dataset, as shown in [26, 35]. Additionally, there are several works on feature selection showing that choosing a selected set of features that are most representative of the dataset can provide better accuracy [25]. The downside of this approach, however, is that the signature database needs to be updated on a regular basis; this downside is addressed in our proposed classifier, LiM. Karbab et al. [13] generate a fingerprint based on three different sub-fingerprints, and use it to detect if the APK has a malicious payload belonging to a certain family. One of the sub-fingerprints is the metadata fingerprint, which relies mostly on permission lists. They encode this fingerprint in a vector of 256 bits, each bit corresponding to one Android permission. They do not detect app IDs though, but only detect malware families. Pan et al. [31] show apps hog permissions without necessarily using them.
McMahan et al. [20] proposed federated learning as a way to distribute the training process of a deep neural network. This distribution allows users to keep data on their devices while a service provider aggregates and distributes the locally trained model across users, minimizing the amount of data collected by third parties about users.

One of the challenges of FL is to ensure that an adversary does not tamper with the training samples, e.g. by adding perturbed data into the model to trigger a misclassification [5]. Such an attack falls under the category of poisoning attacks. In [6], Biggio et al. presented one of the early works on the impact of poisoning attacks on malware clustering. They began by proposing an open-source malware clustering tool. To demonstrate its effectiveness, they imitated an attacker who is able to add crafted, poisoning samples with the goal of downgrading the proposed tool's performance. Chen et al. [9] conducted a more extensive study on poisoning attack strategies where they considered two different types of threat models, and demonstrated how an adversary can successfully introduce poisoned data into a neural network.
We have presented LiM, the first federated learning algorithm that works successfully without user supervision, making use of safe semi-supervised learning techniques. We demonstrate its utility as a malware detection system where users keep their apps secret from the service provider while detecting most of the malicious apps they install without raising many false alarms.

LiM is resistant against a strategic adversary that crafts a malicious app in order to bypass the detection mechanism and compromises 50% of the clients. Thanks to the greater role of the service provider, LiM can defend against this attack while keeping its overall performance intact.

While we carried out its evaluation in the malware detection domain, LiM can potentially be applied to any problem where users cannot provide ground truth labels to the client models, but would still benefit from the performance improvements of federated learning and its privacy properties.
References

[1] Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon. AndroZoo: Collecting Millions of Android Apps for the Research Community. In Proceedings of the 13th International Conference on Mining Software Repositories.
Proceedings 2014 Network and Distributed System Security Symposium, San Diego, CA, 2014. Internet Society.
[4] Saba Arshad, Munam A. Shah, Abdul Wahid, Amjad Mehmood, Houbing Song, and Hongnian Yu. Samadroid: a novel 3-level hybrid malware detection model for android operating system. IEEE Access, 6:4321–4339, 2018.
[5] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How To Backdoor Federated Learning. arXiv:1807.00459 [cs], July 2018.
[6] Battista Biggio, Konrad Rieck, Davide Ariu, Christian Wressnegger, Igino Corona, Giorgio Giacinto, and Fabio Roli. Poisoning behavioral malware clustering. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, AISec '14, pages 27–36, New York, NY, USA, 2014. Association for Computing Machinery.
[7] Russell Brandom. There are now 2.5 billion active android devices. The Verge, May 2019.
[8] Melissa Chau and Ryan Reith. Smartphone market share. International Data Corporation. Accessed 29 May 2020.
[9] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
[10] Android Developers. https://developer.android.com/guide/topics/permissions/overview. Accessed on 28 June 2019.
[11] Google. Application fundamentals. https://developer.android.com/guide/components/fundamentals.
[12] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, et al. Advances and Open Problems in Federated Learning. arXiv:1912.04977 [cs, stat], December 2019.
[13] ElMouatez Billah Karbab, Mourad Debbabi, and Djedjiga Mouheb. Fingerprinting Android packaging: Generating DNAs for malware detection. Digital Investigation, 18:S33–S45, August 2016.
[14] TaeGuen Kim, BooJoong Kang, Mina Rho, Sakir Sezer, and Eul Gyu Im. A multimodal deep learning method for android malware detection using various features. IEEE Transactions on Information Forensics and Security, 14(3):773–788, 2018.
[15] Jakub Konečný, Brendan McMahan, and Daniel Ramage. Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575, 2015.
[16] Jin Li, Lichao Sun, Qiben Yan, Zhiqiang Li, Witawas Srisa-an, and Heng Ye. Significant permission identification for machine-learning-based android malware detection. IEEE Transactions on Industrial Informatics, 14(7):3216–3225, 2018.
[17] Yu-Feng Li, Lan-Zhe Guo, and Zhi-Hua Zhou. Towards Safe Weakly Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2019.
[18] Yu-Feng Li and Zhi-Hua Zhou. Towards making unlabeled data never hurt. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):175–188, 2014.
[19] Yuping Li, Jiyong Jang, Xin Hu, and Xinming Ou. Android malware clustering through malicious payload mining. In International Symposium on Research in Attacks, Intrusions, and Defenses, pages 192–214. Springer, 2017.
[20] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics, pages 1273–1282, April 2017.
[21] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting Unintended Feature Leakage in Collaborative Learning. In , volume 1, San Francisco, CA, US, May 2019.
[22] Nikola Milosevic, Ali Dehghantanha, and Kim-Kwang Raymond Choo. Machine learning aided Android malware classification. Computers & Electrical Engineering, 61:266–274, July 2017.
[23] Omid Mirzaei, Guillermo Suarez-Tangil, Jose M. de Fuentes, Juan Tapiador, and Gianluca Stringhini. Andrensemble: Leveraging api ensembles to characterize android malware families. Proceedings of the 14th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2019), 2019.
[24] Mo Yu, Damien Octeau, and Chuangang Ren (Android Security & Privacy Team). Combating potentially harmful applications with machine learning at google: Datasets and models. Android Developers Blog, November 2018. https://android-developers.googleblog.com/2018/11/combating-potentially-harmful.html.
[25] Veelasha Moonsamy, Jia Rong, and Shaowu Liu. Mining permission patterns for contrasting clean and malicious android applications. Future Generation Computer Systems, 36:122–132, July 2014.
[26] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large datasets (how to break anonymity of the netflix prize dataset). University of Texas at Austin, 2008.
[27] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In , pages 739–753, 2019.
[28] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In , volume 1, pages 1021–1035, San Francisco, CA, US, May 2019.
[29] Lucky Onwuzurike, Enrico Mariconti, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, and Gianluca Stringhini. Mamadroid: Detecting android malware by building markov chains of behavioral models (extended version). ACM Transactions on Privacy and Security (TOPS).
Proceedings on Privacy Enhancing Technologies, 2018(4):33–50, October 2018.
[32] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. SoK: Security and Privacy in Machine Learning. In , pages 399–414, April 2018.
[33] Raj Samani. Mcafee mobile threat report. Technical report, McAfee, 2020.
[34] Andrea Saracino, Daniele Sgandurra, Gianluca Dini, and Fabio Martinelli. MADAM: Effective and Efficient Behavior-based Android Malware Detection and Prevention. IEEE Transactions on Dependable and Secure Computing, 15(1):83–97, January 2018.
[35] Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS 2017), pages 587–601, 2017.
[36] Sai Deep Tetali. Keeping 2 billion android devices safe with machine learning. Android Developers Blog, May 2018. https://android-developers.googleblog.com/2018/05/keeping-2-billion-android-devices-safe.html.
[37] Fengguo Wei, Yuping Li, Sankardas Roy, Xinming Ou, and Wu Zhou. Deep ground truth analysis of current android malware. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 252–276. Springer, 2017.
[38] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19, 2019.
[39] Xin Yao, Tianchi Huang, Chenglei Wu, Ruixiao Zhang, and Lifeng Sun. Towards faster and better federated learning: A feature fusion approach. In , pages 175–179, 2019.
[40] Hanlin Zhang, Yevgeniy Cole, Linqiang Ge, Sixiao Wei, Wei Yu, Chao Lu, Genshe Chen, Dan Shen, Erik Blasch, and Khanh D. Pham. Scanme mobile: A cloud-based android malware analysis service. SIGAPP Appl. Comput. Rev., 16(1):36–49, April 2016.
[41] Yajin Zhou and Xuxian Jiang. Dissecting android malware: Characterization and evolution. In , pages 95–109. IEEE, 2012.
A List of features
A.1 Top 100 features

• android hardware camera
• android hardware camera autofocus
• android hardware microphone
• android hardware screen landscape
• android hardware screen portrait
• android hardware touchscreen multitouch
• android hardware touchscreen multitouch distinct
• android hardware wifi
• android permission access assisted gps
• android permission access coarse location
• android permission access coarse updates
• android permission access fine location
• android permission access gps
• android permission access location
• android permission access location extra commands
• android permission access wifi state
• android permission call phone
• android permission change wifi state
• android permission get tasks
• android permission install packages
• android permission kill background processes
• android permission mount unmount filesystems
• android permission process outgoing calls
• android permission read call log
• android permission read contacts
• android permission read logs
• android permission read phone state
• android permission read profile
• android permission read settings
• android permission read sms
• android permission receive boot completed
• android permission receive sms
• android permission restart packages
• android permission send sms
• android permission system alert window
• android permission use credentials
• android permission write apn settings
• android permission write contacts
• android permission write external storage
• android permission write settings
• android permission write sms
• android support v4 content fileprovider
• cn domob android ads domobactivity
• com adfeiwo ad coverscreen sa
• com adfeiwo ad coverscreen sr
• com adfeiwo ad coverscreen wa
• com adwo adsdk adwoadbrowseractivity
• com airpush android deliveryreceiver
• com airpush android messagereceiver
• com airpush android pushads
• com airpush android pushservice
• com airpush android userdetailsreceiver
• com android browser permission read history bookmarks
• com android browser permission write history bookmarks
• com android launcher permission install shortcut
• com android launcher permission uninstall shortcut
• com android vending billing
• com bving img ag
• com bving img rv
• com bving img se
• com facebook ads interstitialadactivity
• com facebook facebookactivity
• com facebook loginactivity
• com google android c2dm permission receive
• com google android gms ads adactivity
• com google android gms ads purchase inapppurchaseactivity
• com google android gms analytics analyticsreceiver
• com google android gms analytics analyticsservice
• com google android gms analytics campaigntrackingreceiver
• com google android gms analytics campaigntrackingservice
• com google android gms appinvite previewactivity
• com google android gms auth api signin internal signinhubactivity
• com google android gms auth api signin revocationboundservice
• com google android gms common api googleapiactivity
• com google android gms gcm gcmreceiver
• com google android gms measurement appmeasurementcontentprovider
• com google android gms measurement appmeasurementinstallreferrerreceiver
• com google android gms measurement appmeasurementreceiver
• com google android gms measurement appmeasurementservice
• com google android providers gsf permission read gservices
• com google firebase iid firebaseinstanceidinternalreceiver
• com google firebase iid firebaseinstanceidreceiver
• com google firebase iid firebaseinstanceidservice
• com google firebase messaging firebasemessagingservice
• com google firebase provider firebaseinitprovider
• com google update dialog
• com google update receiver
• com google update updateservice
• com kuguo ad boutiqueactivity
• com kuguo ad mainactivity
• com kuguo ad mainreceiver
• com kuguo ad mainservice
• com mobclix android sdk mobclixbrowseractivity
• com soft android appinstaller finishactivity
• com soft android appinstaller firstactivity
• com soft android appinstaller rulesactivity
• com soft android appinstaller memberactivity
• com soft android appinstaller questionactivity
• com startapp android publish appwallactivity
• net youmi android adactivity

A.2 Top 200 features

• android hardware camera
• android hardware camera autofocus
• android hardware camera front
• android hardware microphone
• android hardware screen landscape
• android hardware screen portrait
• android hardware touchscreen multitouch
• android hardware touchscreen multitouch distinct
• android hardware wifi
• android permission access assisted gps
• android permission access coarse location
• android permission access coarse updates
• android permission access fine location
• android permission access gps
• android permission access location
• android permission access location extra commands
• android permission access wifi state
• android permission call phone
• android permission camera
• android permission change configuration
• android permission change network state
• android permission change wifi state
• android permission clear app cache
• android permission get tasks
• android permission install packages
• android permission kill background processes
• android permission mount unmount filesystems
• android permission process outgoing calls
• android permission read calendar
• android permission read call log
• android permission read contacts
• android permission read external storage
• android permission read logs
• android permission read phone state
• android permission read profile
• android permission read settings
• android permission read sms
• android permission receive boot completed
• android permission receive sms
• android permission receive wap push
• android permission record audio
• android permission restart packages
• android permission send sms
• android permission system alert window
• android permission use credentials
• android permission vibrate
• android permission write apn settings
• android permission write calendar
• android permission write contacts
• android permission write external storage
• android permission write settings
• android permission write sms
• android support v4 content fileprovider
• biz neoline android reader bookmarksandtocactivity
• biz neoline android reader libraryactivity
• biz neoline android reader neobookreader
• biz neoline android reader textsearchactivity
• biz neoline app core core application shutdownreceiver
• biz neoline app core ui android dialogs dialogactivity
• biz neoline app core ui android library crashreportingactivity
• biz neoline test donationactivity
• cn domob android ads domobactivity
• com adfeiwo ad coverscreen sa
• com adfeiwo ad coverscreen sr
• com adfeiwo ad coverscreen wa
• com adwo adsdk adwoadbrowseractivity
• com adwo adsdk adwosplashadactivity
• com airpush android deliveryreceiver
• com airpush android messagereceiver
• com airpush android pushads
• com airpush android pushservice
• com airpush android smartwallactivity
• com airpush android userdetailsreceiver
• com amazon device messaging permission receive
• com anddoes launcher permission update count
• com android browser permission read history bookmarks
• com android browser permission write history bookmarks
• com android launcher permission install shortcut
• com android launcher permission uninstall shortcut
• com android vending billing
• com biznessapps layout maincontroller
• com biznessapps player playerservice
• com biznessapps pushnotifications c2dmmessagesreceiver
• com biznessapps pushnotifications c2dmregistrationreceiver
• com bving img ag
• com bving img rv
• com bving img se
• com chartboost sdk cbimpressionactivity
• com elm lma
• com elm lmr
• com elm lms
• com elm lmsk
• com facebook ads audiencenetworkactivity
• com facebook ads interstitialadactivity
• com facebook customtabactivity
• com facebook customtabmainactivity
• com facebook facebookactivity
• com facebook facebookcontentprovider
• com facebook loginactivity
• com feiwothree coverscreen sa
• com feiwothree coverscreen sr
• com feiwothree coverscreen wa
• com google android apps analytics analyticsreceiver
• com google android c2dm permission receive
• com google android gcm gcmbroadcastreceiver
• com google android gms ads adactivity
• com google android gms ads purchase inapppurchaseactivity
• com google android gms analytics analyticsreceiver
• com google android gms analytics analyticsservice
• com google android gms analytics campaigntrackingreceiver
• com google android gms analytics campaigntrackingservice
• com google android gms appinvite previewactivity
• com google android gms auth api signin internal signinhubactivity
• com google android gms auth api signin revocationboundservice
• com google android gms common api googleapiactivity
• com google android gms gcm gcmreceiver
• com google android gms measurement appmeasurementcontentprovider
• com google android gms measurement appmeasurementinstallreferrerreceiver
• com google android gms measurement appmeasurementjobservice
• com google android gms measurement appmeasurementreceiver
• com google android gms measurement appmeasurementservice
• com google android providers gsf permission read gservices
• com google firebase iid firebaseinstanceidinternalreceiver
• com google firebase iid firebaseinstanceidreceiver
• com google firebase iid firebaseinstanceidservice
• com google firebase messaging firebasemessagingservice
• com google firebase provider firebaseinitprovider
• com google update dialog
• com google update receiver
• com google update updateservice
• com htc launcher permission update shortcut
• com klpcjg wyxjvs102320 browseractivity
• com klpcjg wyxjvs102320 mainactivity
• com klpcjg wyxjvs102320 vdactivity
• com kuguo ad boutiqueactivity
• com kuguo ad mainactivity
• com kuguo ad mainreceiver
• com kuguo ad mainservice
• com majeur launcher permission update badge
• com mobclix android sdk mobclixbrowseractivity
• com nd dianjin activity offerappactivity
• com onesignal gcmbroadcastreceiver
• com onesignal gcmintentservice
• com onesignal notificationopenedreceiver
• com onesignal permissionsactivity
• com onesignal syncservice
• com parse gcmbroadcastreceiver
• com parse parsebroadcastreceiver
• com parse pushservice
• com paypal android sdk payments futurepaymentconsentactivity
• com paypal android sdk payments futurepaymentinfoactivity
• com paypal android sdk payments loginactivity
• com paypal android sdk payments paymentactivity
• com paypal android sdk payments paymentconfirmactivity
• com paypal android sdk payments paymentmethodactivity
• com paypal android sdk payments paypalfuturepaymentactivity
• com paypal android sdk payments paypalservice
• com sec android provider badge permission read
• com sec android provider badge permission write
• com soft android appinstaller finishactivity
• com soft android appinstaller firstactivity
• com soft android appinstaller rulesactivity
• com soft android appinstaller services smssenderservice
• com soft android appinstaller sms binarysmsreceiver
• com soft android appinstaller memberactivity
• com soft android appinstaller questionactivity
• com software application c2dmreceiver
• com software application checker
• com software application main
• com software application notificator
• com software application offertactivity
• com software application showlink
• com software application smsreceiver
• com software application permission c2d message
• com sonyericsson home permission broadcast badge
• com sonymobile home permission provider insert badge
• com startapp android publish appwallactivity
• com startapp android publish fullscreenactivity
• com startapp android publish overlayactivity
• com tencent mobwin mobinwinbrowseractivity
• com umeng common net downloadingservice
• com uniplugin sender areceiver
• com unity3d ads android view unityadsfullscreenactivity
• com unity3d player unityplayeractivity
• com unity3d player unityplayernativeactivity
• com urbanairship
gcmbroadcastreceiver • com onesignal gcmintentservice • com onesignal notificationopenedreceiver • com onesignal permissionsactivity • com onesignal syncservice • com parse gcmbroadcastreceiver • com parse parsebroadcastreceiver • com parse pushservice • com paypal android sdk payments futurepaymentconsentactivity • com paypal android sdk payments futurepaymentinfoactivity • com paypal android sdk payments loginactivity • com paypal android sdk payments paymentactivity • com paypal android sdk payments paymentconfirmactivity • com paypal android sdk payments paymentmethodactivity • com paypal android sdk payments paypalfuturepaymentactivity • com paypal android sdk payments paypalservice • com sec android provider badge permission read • com sec android provider badge permission write • com soft android appinstaller finishactivity • com soft android appinstaller firstactivity • com soft android appinstaller rulesactivity • com soft android appinstaller services smssenderservice • com soft android appinstaller sms binarysmsreceiver • com soft android appinstaller memberactivity • com soft android appinstaller questionactivity • com software application c2dmreceiver • com software application checker • com software application main • com software application notificator • com software application offertactivity • com software application showlink • com software application smsreceiver • com software application permission c2d message • com sonyericsson home permission broadcast badge • com sonymobile home permission provider insert badge • com startapp android publish appwallactivity • com startapp android publish fullscreenactivity • com startapp android publish overlayactivity • com tencent mobwin mobinwinbrowseractivity • com umeng common net downloadingservice • com uniplugin sender areceiver • com unity3d ads android view unityadsfullscreenactivity • com unity3d player unityplayeractivity • com unity3d player unityplayernativeactivity • com urbanairship 
corereceiver • com urbanairship push pushservice24 com vpon adon android webinapp • com waps offerswebview • io card payment cardioactivity • io card payment dataentryactivity • net youmi android adactivity • net youmi android adbrowser • net youmi android adreceiver • net youmi android adservice • net youmi android appoffers youmioffersactivity • net youmi android youmireceiver • tk jianmo study bootbroadcastreceiver • tk jianmo study killpoccessserve ••
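The feature names listed above correspond to AndroidManifest.xml entries (permissions, hardware features, and app component class names) lowercased, with dots and underscores replaced by spaces. As an illustration only, the following Python sketch extracts such features from a manifest that has already been decoded from the APK's binary XML (e.g. with apktool); the function names are our own, not part of the LiM implementation.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:name attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def normalize(identifier: str) -> str:
    """Lowercase an Android identifier and replace separators with
    spaces, matching the naming scheme of the feature list above."""
    return identifier.lower().replace(".", " ").replace("_", " ")

def extract_features(manifest_xml: str) -> set:
    """Collect permissions, hardware features, and component class
    names from a decoded AndroidManifest.xml string."""
    root = ET.fromstring(manifest_xml)
    features = set()
    # Declared permissions and hardware features.
    for tag in ("uses-permission", "uses-feature"):
        for elem in root.iter(tag):
            name = elem.get(ANDROID_NS + "name")
            if name:
                features.add(normalize(name))
    # App components (activities, services, receivers, providers).
    for tag in ("activity", "service", "receiver", "provider"):
        for elem in root.iter(tag):
            name = elem.get(ANDROID_NS + "name")
            if name:
                features.add(normalize(name))
    return features

# Minimal example manifest (hypothetical, for illustration).
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-feature android:name="android.hardware.camera"/>
  <application>
    <activity android:name="net.youmi.android.AdActivity"/>
  </application>
</manifest>"""

print(sorted(extract_features(manifest)))
# ['android hardware camera', 'android permission send sms', 'net youmi android adactivity']
```

Binary feature vectors for a classifier can then be built by checking each of the 200 selected feature names for membership in an app's extracted set.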