Collective Intelligence: Decentralized Learning for Android Malware Detection in IoT with Blockchain

Abstract

The widespread significance of Android IoT devices is due to their flexibility and hardware support features, which have revolutionized the digital world by introducing exciting applications in almost all walks of daily life, such as healthcare, smart cities, smart environments, safety, remote sensing, and many more. Such versatile applicability gives an incentive for more malware attacks. In this paper, we propose a framework that continuously aggregates multiple user-trained models, each trained on non-overlapping data, into a single model. Specifically, for the malware detection task, (i) we propose a novel user (local) neural network (LNN) that trains on the local distribution, and (ii) to assure model authenticity and quality, we propose a novel smart contract that enables the aggregation process over a blockchain platform. The LNN model analyzes various static and dynamic features of both malware and benign applications, whereas the smart contract verifies malicious applications during both the uploading and downloading processes in the network, using the stored aggregated features of the local models. In this way, the proposed model improves not only malware detection accuracy, through the decentralized model network, but also model efficacy, through the blockchain. We evaluate our approach against three state-of-the-art models and perform a deep analysis of the extracted features of the respective model.


IoTMalware: Android IoT Malware Detection based on Deep Neural Network and Blockchain Technology

Rajesh Kumar, WenYong Wang, Jay Kumar, Zakria, Ting Yang, Waqar Ali & Abubackar Sharif

Abstract: The Internet of Things (IoT) has been revolutionizing the world by introducing exciting applications in almost all walks of daily life, such as healthcare, smart cities, smart environments, safety, remote sensing, and many more. Android IoT devices are significant because of their flexibility and hardware support features. However, dealing with the malware attacks on Android devices, which increase with every passing day, is quite challenging. This challenge is magnified for Android IoT devices because of their limited resources, such as memory and power. Therefore, this paper proposes a new framework based on blockchain and a deep learning model to provide more security for Android IoT devices. Moreover, our framework is capable of finding malware activities in a real-time environment. The proposed deep learning model analyzes various static and dynamic features extracted from thousands of features of malware and benign apps that are already stored in the blockchain distributed ledger. The multi-layer deep learning model makes decisions by analyzing the previous data in three steps. First, it divides the malware features into multiple levels of clusters. Second, it chooses a unique deep learning model for each malware feature set or cluster. Finally, it produces the decision by combining the results generated from all cluster levels. Furthermore, the decisions and multi-level clustering data are stored in a blockchain, where they can be used to train every specialized cluster on its unique data distribution. In addition, a customized smart contract is designed to detect deceptive applications through the blockchain framework. The smart contract verifies malicious applications during both the uploading and downloading of Android apps on the network. Consequently, the proposed framework provides flexibility in the features used for run-time malware detection on heterogeneous IoT devices. Finally, the smart contract helps to approve or deny the uploading and downloading of harmful Android applications.

Index Terms: Android Malware Detection, Blockchain, Deep learning, Secure IoT Devices, Smart contract

I. INTRODUCTION

Future wireless technologies such as fifth-generation mobile phone networks (5G) and the Internet of Things (IoT) are revolutionizing the world by introducing innovative applications for building smart systems that could only be imagined in the past, such as smart environment sensing, smart agriculture, smart drones, smart healthcare monitoring, and autonomous cars, to mention a few. To build such smart systems, heterogeneous electronic devices participate in a common network to communicate with each other, as illustrated in Figure 1. A wide range of advanced electronic IoT devices are controlled by the powerful Android platform, which enables the integration of smart gadgets such as sensors, smartphones, smart watches, smart washing machines, and many more. Such electronic devices, including smartphones, encourage people to store and share their personal and confidential information. At the same time, the common Android platform makes these devices an intensive target for malicious application developers who want to harm users [1], [2], [3], [4], [5], [6]. Attackers can exploit the Android system by introducing fake applications that directly affect the user's privacy and security. Such applications may pose a severe threat by snooping on user data such as confidential contracts, photos, contact information, location, account information, and passwords. Additionally, malicious applications can produce adverse effects not only on the intended node but also on other linked devices on a shared network. About 0.7 million applications were reported as malicious and blocked by the Google Play Store before users could download them in 2019 [7]. However, most Android app markets do not provide a way to assess whether a mobile app is counterfeit. Besides, many users install applications from anonymous sources and do not use antivirus applications to protect themselves from malicious and phishing attacks [8], [9]. Therefore, there is an urgent need for an evolved approach and framework that can detect malware applications in a timely manner.

This work is supported by the National Natural Science Foundation of China under grant no. U2033212. The authors are with the School of Computer Science & Engineering, University of Electronic Science and Technology of China.

Figure 1. Integration of IoT devices connected on a common network via the Android application platform.

Previous studies [10], [11], [12], [13], [14] proposed various techniques for malware detection on the Android platform, such as signature mechanisms [15], [16], access control [17], and sandboxes [18], [19], [20]. Most of these techniques are efficient and effective to some extent under different constraints. Recently, deep learning techniques have gained significant attention for solving the malware detection problem [10], [13], [21]. Previous algorithms, such as convolutional neural networks and LSTM, depend heavily on the initial feature extraction process. However, these techniques cannot be directly applied to smartphones and IoT devices because of their limited resources, such as memory and processing power. For this purpose, we propose a novel technique that integrates blockchain technology with deep neural networks in order to resolve the limitations of previous malware detection techniques. Our approach can be applied directly to IoT devices.

First, we consider the problem of training the deep learning model in a decentralized network for multiple features of Android malware detection. In this article, the multi-layer deep learning model inspects malware using the trained models that are stored in the distributed ledger. The overall architecture of the deep learning model can be summarized in five steps: i) selection of the important features using the GINI information gain function; ii) division of the dataset into different clusters to obtain the fundamental data distribution for a particular group of malware; iii) generation of multiple clusters as a sub-tree of each tree cluster for the huge number of features; iv) selection of the best deep learning model for each cluster set to separate malware from benign apps in the corresponding unique data distribution; and v) aggregation of the latest weights using the history of previously trained models.

The second problem is to track malware or harmful applications when users download an app through the Internet. This article uses blockchain technology to store the latest malware information detected by the deep learning model. IPFS stores the Android applications, and the hashes of the apps are stored in the blockchain ledger. After an Android app is uploaded to IPFS, the deep learning model tests the app (benign or malware). The information about harmful applications is then stored in the blockchain ledger. In this way, the smart contract helps to approve or deny harmful Android applications during the uploading and downloading process.

The third problem is to design a model with low resource consumption. In this article, we aggregate deep learning weights and use IPFS to reduce the computational cost to the network. The integration of the deep learning model and blockchain collects new types of malicious features of Android applications from various sources to train the deep learning model itself. It provides security and better malware detection for Android IoT devices in a real-time environment, protecting them against attacks on potential vulnerabilities.

The main contributions of our work are listed as follows:

1) This paper proposes a framework that integrates deep learning and blockchain for better malware detection and information sharing regarding Android applications across the network.
2) An enhanced multi-level deep learning model is proposed that can extract multiple types of malware features, and the training task is distributed in the blockchain network for better prediction.
3) A customized smart contract is designed to provide secure downloading and uploading of Android applications, along with alerting the user about malicious activities.
4) An extensive empirical analysis is conducted to demonstrate the promising results of the proposed approach, which provides multi-level deep learning and secure data sharing via blockchain.

The remaining part of this paper is organized as follows: Section 2 presents the literature review of Android malware detection and discusses static, dynamic, and hybrid analysis. Section 3 discusses the proposed framework based on blockchain and deep learning. Next, Section 4 analyzes the results and provides a comparison with other work. Finally, we conclude our work.

II. LITERATURE REVIEW

This section briefly reviews the literature on malware detection and feature extraction techniques for Android applications. We divide this section into four parts: i) static analysis, which covers two approaches, permission-based analysis and API-call analysis; ii) dynamic analysis, which is used to extract real-time phone features; iii) hybrid analysis, which combines static and dynamic features; and iv) a comparison of all the techniques.

A. Static Analysis

Static analysis can check an application's behavior without executing the app. Several machine learning techniques have been proposed to classify benign and malicious apps [20], [22], [23], such as content-based analysis that reduces the dimensions of the content. Recent static-analysis research [24], [25], [26], [27] based on API calls and permission features has been found to be less effective at detecting malware [28], [29], [30], [31], [32], [33], [34], [35], [24], and these methods show low efficiency in feature extraction. Our main focus is to design a multi-level deep learning model that can support various kinds of features and classify malware and benign apps effectively.

1) Permission-based analysis:

Android uses a permission-based security model to ensure that sensitive user information is restricted and that only the actual user can access it. Indeed, permissions are the most effective static feature, because attackers request permissions to reach their malicious goals. Before an app is installed, it requests a set of permissions from the user. After the permissions are granted, the app installs itself on the device. There are many approaches that extract permissions for malware detection [36], [37], [27], [38]. Wang et al. [37] proposed a methodology for analyzing permissions based on permission ranking, association rules, and similarity. It finds the permission groups using the correlation coefficient and ranks each permission individually. Verma et al. [39], [22], [20] used the information gain algorithm for feature selection to choose the best features from packed Android apk files. This approach relies on the characteristics of entropy and selects the features with the highest gain values as the top features. However, most studies used a score ranking scheme. In this study, we rank the static and dynamic features to find the pattern of risky features in Android applications.

Figure 2. Static feature extraction process

Figure 3. Top risky permission

2) API calls:

API stands for Application Program Interface. Apps use API calls to interact with the Android framework. Some works target API calls and use them as a promising feature for investigating malicious behaviour. Figure 4 shows several suspicious API calls that are primarily used in malware apps.

B. Dynamic Analysis

Dynamic analysis [40] was proposed to observe the real-time behavior of the phone and thus the dynamic behaviors and features of applications. To analyze the dynamic behavior of malware activities, an emulator (Android Virtual Device) is used to extract dynamic features such as API calls and events/actions. Several tools support dynamic analysis methods, including Monkey and DroidBot. With these tools, the Dynalog file is an essential file for the input generation methods. It can extract the features of the API calls and the system call information that reveal how malware behaves. The features of the dynamic methodology can be observed in [41], [42], [43], [44], [45], and these methods also make it possible to detect unknown malware that shows similar behavior [43], [1], [46]. API call analysis and control flow analysis are also dynamic analysis methods [47], [40], [48], [49]. The main difference between existing works and ours is that our approach combines both static and dynamic analysis methods with blockchain [50], [51] and deep learning to increase the detection rate and overcome the weaknesses of machine learning. Furthermore, our approach distributes the malware information in the blockchain network, which can identify benign and malware Android apps at installation time.

The process of feature extraction for dynamic analysis is shown in Figure 5. Many techniques have been used in previous literature for dynamic analysis; Table I shows some important features along with the machine learning algorithms used.

Figure 4. Suspicious API calls

Figure 5. Traditional dynamic analysis features extraction and classification process

C. Hybrid Analysis

Hybrid analysis combines static analysis and dynamic analysis. The static features are extracted without executing the application. In contrast, dynamic features are extracted with an emulator or on a real device, which is time- and resource-consuming. Figure 6 illustrates how hybrid analysis combines static and dynamic analysis. Some researchers focus on hybrid analysis [52], [53], [16], [54], [55], but these methods are time-consuming. To solve this problem, we design a blockchain-based framework that distributes resources equally to all users. Our proposed technique is more efficient and less time-consuming because of the distributed nature of blockchain.

Table I. Dynamic features detection methods

Ref  | Features    | Accuracy | Machine learning model
[41] | System call | 91.75%   | Signature matching
[42] | System call | 81%      | K-Means
[44] | System call | 88.2%    | Frequency
[43] | System call | -        | Pattern matching
[33] | API call    | 97.6%    | KNN_M
[45] | Native Size | 99.9%    | RF, SVM

Figure 6. Hybrid feature extraction process

D. A Comparison of Static, Dynamic, and Hybrid Analysis

Table II compares the advantages and limitations of static and dynamic feature extraction, and highlights the strengths of the hybrid approach adopted in this work.

Table II. A comparison of static, dynamic, and hybrid analysis

Type             | Features            | Advantages                                                                 | Limitations
Static analysis  | Single category     | Easier feature extraction; low computational cost                          | Mimicry attack; code obfuscation; low accuracy
Static analysis  | Multiple categories | Easier feature extraction; high accuracy                                   | Mimicry attack; hard to manage multiple features; code obfuscation; high computing cost
Dynamic analysis | Single category     | Accuracy better than static; recovers code obfuscation                     | Extraction of features is difficult; high computing cost; still not a better choice
Dynamic analysis | Multiple categories | Accuracy better than static; recovers code obfuscation                     | High computation; more resources needed
Hybrid analysis  | -                   | Highest accuracy; better results compared to static and dynamic analysis   | High complexity; more resource utilization; time consumption

III. SYSTEM MODEL

In this article, we consider the problem of sharing the latest malware features and training the deep learning model in a decentralized network for multiple Android IoT devices. Because Android devices have limited resources and power, we design a multi-feature deep learning model that supports various features from the decentralized network. We then focus on aggregating a previously trained model with new feature sets from the most recently uploaded apps. The information about harmful applications is stored in the blockchain network. Finally, we secure Android devices by using a smart contract to retrieve the information about malicious Android applications. Figure 7 shows the architecture of the proposed framework. The steps of the proposed framework are as follows:

1) The user uploads an app to the network.
2) The version of the app is stored in the InterPlanetary File System (IPFS).
3) The deep learning model extracts the benign and malware features from the uploaded app. More details are given in Section III-A.
4) The updated deep learning model is stored in IPFS to reduce the cost of the blockchain. Additionally, we aggregate the model weights to reduce the computational task on the blockchain nodes. More details are given in Section III-B.
5) The hash value of the app and the decision results obtained from the deep learning model are stored in a distributed ledger. More details are given in Section III-C.
6) During the downloading process of the app: i) the user sends the hash value to the network; ii) the smart contract compares and verifies the hash value of the downloaded app; iii) finally, it notifies the user whether the app is malicious or benign. More details are given in Section III-D.

In summary, Figure 8 provides a more detailed description of uploading and downloading from the developer and user perspectives. When a user downloads an Android app or a developer uploads one, the smart contract is used for secure data uploading and automatically checks the harmful features of the app. The smart contract can track malicious apps across the decentralized network. It also helps the deep learning model to keep learning and to track malware applications when users download an app through the Internet. Additionally, blockchain technology creates a trust-less environment and guarantees the transparency and reliability of the distributed nodes.

Figure 7. Proposed framework based on deep neural network and blockchain

Figure 8. Flow of user and developer of Android malware detection using blockchain

A. Deep learning model training

In this section, we propose a deep learning model based on static and dynamic analysis. Figure 9 shows the overall architecture of the deep learning model for static and dynamic analysis. In the first phase, we combine static and dynamic features. The static features are extracted by decompiling the Android apk file, and the dynamic features are gathered from the Droid Emulator (Android Virtual Environment). The DroidBot Emulator generates the Dynalog file, and for static features we use a CSV file. In the second phase, we select the static and dynamic features from the CSV and Dynalog files and rank them using the information gain function. The information gain function scores each feature; for example, TelephonyManager;->getDeviceId scores 0.98. After that, we find the similarity among the features. In the third phase, the deep learning model evaluates the performance on benign and malware applications and trains the classifier. In the fourth phase, the feature information of malware and benign apps is shared in the blockchain distributed database to achieve real-time malware detection for Android IoT devices.

1) Feature selection for hybrid malware detection:

We used the feature importance property of the model. Feature importance gives each feature of the data a score between zero and one. The higher the score, the more important or relevant the feature is to the output variable. This score helps in choosing the most important features and dropping the least important ones for model building. Feature importance is a built-in property of tree-based classifiers. Information gain (IG) was used to select important features with a high score in order to classify the data effectively [56]. The information gain is expressed in Equations 1, 2, and 3.

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \tag{1}$$

$$\mathrm{Info}(D) = -\frac{pos}{total}\log\frac{pos}{total} - \frac{neg}{total}\log\frac{neg}{total} \tag{2}$$

$$\mathrm{InfoGainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{Info}(D)} \tag{3}$$
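As an illustration of Equations 1 to 3, the following minimal Python sketch ranks binary features by information gain. It is not the authors' implementation: the function names, the toy samples, the binary 0/1 feature encoding, and the use of base-2 logarithms (the paper does not state the base) are all assumptions made for the example.

```python
import math

def entropy(pos, neg):
    """Info(D) from Eq. (2) for pos malware and neg benign samples (log base 2 assumed)."""
    total = pos + neg
    info = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log(0) as 0
            p = count / total
            info -= p * math.log2(p)
    return info

def _counts(rows):
    """Return (malware count, benign count) for rows of (features, label)."""
    pos = sum(label for _, label in rows)
    return pos, len(rows) - pos

def info_gain(samples, feature):
    """Gain(A) from Eq. (1) for a binary feature A.

    samples: list of (features_dict, label) with label 1 = malware, 0 = benign.
    """
    remainder = 0.0  # Info_A(D): weighted entropy after splitting on the feature
    for value in (0, 1):
        subset = [row for row in samples if row[0].get(feature, 0) == value]
        if subset:
            remainder += len(subset) / len(samples) * entropy(*_counts(subset))
    return entropy(*_counts(samples)) - remainder

def gain_ratio(samples, feature):
    """InfoGainRatio(A) from Eq. (3): Gain(A) / Info(D)."""
    base = entropy(*_counts(samples))
    return info_gain(samples, feature) / base if base else 0.0

# Hypothetical toy data: READ_SMS separates the classes, INTERNET does not.
apps = [
    ({"READ_SMS": 1, "INTERNET": 1}, 1),
    ({"READ_SMS": 1, "INTERNET": 1}, 1),
    ({"READ_SMS": 0, "INTERNET": 1}, 0),
    ({"READ_SMS": 0, "INTERNET": 1}, 0),
]
ranking = sorted(["READ_SMS", "INTERNET"],
                 key=lambda f: info_gain(apps, f), reverse=True)
```

On this toy data the feature that perfectly separates malware from benign apps is ranked first, while a feature present in every app carries no information.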

2) Deep learning model training:

Our main aim is to create a deep learning model that accurately classifies Android malware and benign apps, and thus separates Android malware from benign apps. Previous papers discuss various techniques for malware detection [10], [11], [57], [13], [14]. These methods are highly efficient; however, they cannot be applied directly to mobile and IoT devices. To improve the detection performance of the deep learning model, we design multiple levels of deep learning. Each deep learning model learns from specific features of Android malware data for a single group of malware. Finally, all groups of deep learning models are combined to make the final prediction. We tested several deep learning models during the training process, including a deep recurrent neural network, a convolutional neural network, and a fully connected feed-forward network. One of them is selected as the best for each cluster.

Figure 9. Static and dynamic analysis deep learning model with blockchain

Additionally, LSTM with batch normalization avoids the vanishing gradient problem for multiple features. We combine the RNN and LSTM to achieve better performance in distinguishing malware from benign applications. In the first stage, we select the important features using the information gain function for the static and dynamic analysis. In the second stage, the dataset is divided into different clusters that capture the unique data distributions. In the third stage, multiple clusters are generated as a sub-tree of each tree cluster for the huge number of features. In the fourth stage, the best deep learning classifier is selected for every cluster to separate malware from benign apps in its unique data distribution. The proposed deep learning model thus classifies every cluster of each distinct feature for the static and dynamic analysis. The use of multiple deep learning models during the training phase reduces training time and provides better accuracy and efficiency. Finally, our proposed model classifies the apps as malware or benign. The workflow of all stages is shown in Figure 11, and the deep learning model training is shown in Figure 10. Therefore, our model improves on the detection performance and efficiency of a traditional deep learning classifier.

Moreover, to save computational power, the training task is distributed in the blockchain network through forward propagation and backward propagation. In forward propagation, the input is passed through the decentralized blockchain network, and after the input is processed, the output is shared in the decentralized network. Then, in backward propagation, the weights of the neural networks are shared with the blockchain network to reduce the computational power required. Therefore, distributing the training task reduces time and makes use of decentralized resources over the network. Additionally, we simulate the training outputs in the decentralized network using Proof-of-Work, Proof-of-Stake, and Delegated-Proof-of-Stake. Proof-of-Work reduces the computational power of the deep learning model. Delegated-Proof-of-Stake is used to vote on the hash, which avoids complex hash operations. More precisely, the state information of each node in the distributed network is taken as the dataset. The dimensional matrix ($M$) is the input of the deep neural network, and the average number of times a node becomes the mining node in a term is the capacity label. After training our network, we can obtain the average transaction number of the $i$-th node as long as $M_i$ is given as input. Finally, the implementation of the deep neural network is aimed at learning from the huge volumes of data resources available through blockchain technology. The next section discusses the blockchain technology.

Figure 10. Deep learning model training steps.

Figure 11. Proposed Android multi-feature deep learning model.
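The per-cluster model selection and the final combination of cluster decisions described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the combination rule, so majority voting is assumed here, and the simple threshold functions stand in for the trained RNN, CNN, and feed-forward candidates. All names (`select_per_cluster`, `predict`, the cluster labels) are hypothetical.

```python
from statistics import mean

def accuracy(model, rows):
    """Fraction of (features, label) rows that the model classifies correctly."""
    return mean(1.0 if model(x) == y else 0.0 for x, y in rows)

def select_per_cluster(candidates, validation_by_cluster):
    """For every cluster, keep the candidate with the best validation accuracy."""
    return {
        cluster: max(candidates, key=lambda m: accuracy(m, rows))
        for cluster, rows in validation_by_cluster.items()
    }

def predict(features_by_cluster, chosen):
    """Combine the per-cluster decisions by majority vote (assumed rule); 1 = malware."""
    votes = [chosen[c](x) for c, x in features_by_cluster.items()]
    return 1 if sum(votes) * 2 > len(votes) else 0

def threshold(t):
    """Stand-in for a trained model: flags an app when at least t features are set."""
    return lambda x: 1 if sum(x) >= t else 0

low, high = threshold(1), threshold(2)

# Hypothetical validation data, one list of (feature vector, label) per cluster.
validation = {
    "permissions": [([1, 1], 1), ([1, 0], 0), ([0, 0], 0)],
    "api_calls":   [([1, 0], 1), ([0, 0], 0), ([1, 1], 1)],
    "events":      [([0, 1], 1), ([0, 0], 0)],
}
chosen = select_per_cluster([low, high], validation)
verdict = predict({"permissions": [1, 0], "api_calls": [1, 0], "events": [0, 1]}, chosen)
```

Each cluster keeps whichever candidate fits its own data distribution best, so different clusters may end up with different models before their votes are combined.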

B. Aggregate deep learning weights from the blockchain network

This section describes how the previously trained model is aggregated with the new feature information from the latest applications, and how the updated model is stored in IPFS to track new harmful apps. When a user uploads an app through the Internet, the deep learning model identifies whether it is malware, computes the gradients, and sends the updated weights to the global blockchain network. The smart contract shares the aggregated, updated results. Moreover, when a user downloads an application, the smart contract identifies whether it is harmful. The process of combining blockchain and deep learning is shown in Algorithm 1.

Neural networks are trained through (i) forward propagation and (ii) backward propagation, which together determine the layers' weights. In forward propagation, the input is passed through $F = f(x, w) = \bar{y}$, where $x$ is the input and $w$ is the parameter vector. The trained malware and benign feature set is $F = \{(x_i, y_i)\}_{i \in I}$, with samples $(x_i, y_i)$ for each device. The output weights are shared in the decentralized network through IPFS. The loss of the training feature set is defined as $L(F, w) = loss$, where $F$ is the dataset and $l$ is the loss function:

$$loss = \frac{1}{|F|} \sum_{(x_i, y_i) \in F} l\left(y_i, f(x_i, w)\right)$$

Then, in backward propagation, the weights of the neural network are updated using stochastic gradient descent (SGD), defined as:

$$w^{t+1} \leftarrow w^{t} - \eta \nabla_w L\left(F^{t}, w^{t}\right) \tag{4}$$

In Equation 4, $\eta$ is the learning rate, $w^t$ denotes the parameters at the $t$-th iteration, and $F^t \subseteq F$ is the mini-batch training dataset of each device.

The above equation applies to a single user. To learn the local models collaboratively and create a global model from all devices $v \in V$, we use Equation 5:

$$w^{t+1} \leftarrow w^{t} - \eta \, \frac{\sum_{v \in V} \nabla_w L\left(F_v^{t}, w^{t}\right)}{|V|} \tag{5}$$

C. Storing information about malware features in blockchain

We store the Android application hashes together with the malicious and benign features (static and dynamic) in the blockchain distributed database. The structure used to store the malware feature information is shown in Figure 12, and its attributes are described in Table III. The blockchain structure is divided into two parts: i) the block header and ii) the block data. The block header stores the version number of the apps, the Merkle root, the hash values of all apps, and so on. The block data stores all the static and dynamic features, such as suspicious APIs, permissions, events, calls, etc. The primary purpose of storing the malware information in the blockchain distributed database is to secure the identical hash values, which can effectively prevent fraud such as decompiling and repackaging Android applications with reverse engineering techniques. Therefore, no one can easily create counterfeit applications.

Moreover, existing systems such as VirusTotal have flaws in detecting fake Android mobile apps. The proposed blockchain framework aims to remove these flaws and recognize fake or malicious Android applications. Furthermore, the blockchain holds actual information in a decentralized malware blockchain database, which increases the prediction performance and enables run-time detection of malware when the user downloads or uploads an Android app on the network.

Table III. Blockchain attributes

Keywords          | Size     | Definition
Pre-Hash          | 32 bytes | Hash value of the preceding block
Version number    | 4 bytes  | Tracks the protocol or software updates
Timestamp         | 5 bytes  | Records the time of a block
Transaction_count | 15 bytes | Number of malware results in the current block
Merkle root       | 32 bytes | Calculated over the malicious codes detected in the block
Nonce             | 15 bytes | Random value by which a block is recognized as a formal block

Algorithm 1: Aggregate deep learning weights from the blockchain network

  $MD \leftarrow$ MobileDevices; $\{F_n\}_{n \in [N]} \leftarrow$ Malware Features; $w \leftarrow$ global weights;
  $L(w, x) \leftarrow$ Loss; $I \leftarrow$ iteration; $\theta \leftarrow$ clip bound;
  for $i \in [I]$ do
    for $md \in [MD]$ do
      sample the malware and benign feature data set with probability $|f^i_{md}| / |f_{md}|$;
    end
    for $x \in F_{th}$ do
      $gf^i_{md}(x) \leftarrow \nabla_{w^i} L(w^i, x)$;
      $gf^i_{md}(x) \leftarrow gd^i_{md}(x) / \max\left(1, \|gd^i(x)\| / \theta\right)$;
      retrieve the weights or global model from the permissioned blockchain;
    end
    $gf^i_{md} \leftarrow \sum_{x \in D^i_{md}} gd^i_{md}(x) + MD\left(\ ,\ \theta \rho\, MD\right)$;
    execute the IPFS model for aggregation and obtain the updated IPFS model;
    add the parameters of the model as a transaction;
  end
  $gf^i_{md} \leftarrow MD\left(\sum_{n \in [md]} gf^i_{md}\right)$;
  $w^{i+1} \leftarrow w^i - \eta \cdot gd^i$;
  retrieve the current updated weights from IPFS and aggregate the weights;
  broadcast the new malware information to other delegates for verification, and collect all transactions into a new block;
  append the block including the global model to the permissioned blockchain;
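The core of Algorithm 1 and Equation 5, clipping each device's gradient to the bound $\theta$ and then applying the averaged gradient to the global weights, can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: gradients are passed in as plain lists rather than computed by back-propagation, the noise term of Algorithm 1 and the IPFS and blockchain steps are omitted, and the function names are hypothetical.

```python
import math

def clip(gradient, bound):
    """Scale a gradient so that its L2 norm is at most `bound` (clip bound theta)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = max(1.0, norm / bound)
    return [g / scale for g in gradient]

def aggregate(weights, device_gradients, lr, bound):
    """One global step of Eq. (5): subtract lr times the mean clipped device gradient."""
    clipped = [clip(g, bound) for g in device_gradients]
    n = len(clipped)
    avg = [sum(g[k] for g in clipped) / n for k in range(len(weights))]
    return [w - lr * a for w, a in zip(weights, avg)]

# Two hypothetical devices report gradients for a two-parameter model.
global_weights = [1.0, 1.0]
gradients = [[3.0, 4.0], [0.0, 1.0]]
new_weights = aggregate(global_weights, gradients, lr=0.5, bound=1.0)
```

In this example the first gradient has norm 5 and is scaled down to norm 1, while the second is already within the bound, so no single device dominates the averaged update.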

Figure 12. Blockchain data-store technique for multi-features of Android malware
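To make the header and data split of Figure 12 and Table III concrete, the sketch below models a block in Python and derives a Merkle root from the stored records. It is an assumption-laden illustration: the field encoding, the use of SHA-256, the rule of duplicating the last hash on odd levels (borrowed from Bitcoin-style trees), and every name in the code are choices made for the example, and the byte sizes of Table III are not enforced.

```python
import hashlib
import time
from dataclasses import dataclass, field

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes pairwise into a single 32-byte root."""
    if not leaves:
        return sha256(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last hash (assumed rule)
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

@dataclass
class Block:
    prev_hash: bytes   # "Pre-Hash" in Table III
    records: list      # block data: (apk_hash, verdict, features) tuples
    version: int = 1
    nonce: int = 0
    timestamp: float = field(default_factory=time.time)

    @property
    def transaction_count(self):
        return len(self.records)

    @property
    def merkle(self):
        return merkle_root([sha256(repr(r).encode()) for r in self.records])

    def header_hash(self):
        """Hash of the header fields; the records enter only through the Merkle root."""
        header = b"|".join([
            self.prev_hash, str(self.version).encode(), str(self.timestamp).encode(),
            str(self.transaction_count).encode(), self.merkle, str(self.nonce).encode(),
        ])
        return sha256(header)

block = Block(prev_hash=b"\x00" * 32,
              records=[("apkhash1", "malware", ["SEND_SMS"]),
                       ("apkhash2", "benign", [])])
```

Because the Merkle root depends on every record, altering a stored verdict or feature list changes the root and therefore the header hash, which is what makes tampering with recorded malware information detectable.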

D. Designing a Smart Contract to secure the Android devices to check the harmful apps

This section describes the use of the Ethereum blockchain and a smart contract for verification and for tracking the version history of Android apps, and further discusses the mechanism for storing hash values in a distributed ledger. The proposed smart contract can track different versions of apps and can provide continuous detection by broadcasting and sharing information about every new malicious app. Storing an app on the blockchain network would be very expensive and wasteful in terms of resources, as most apps have relatively large sizes of several megabytes (MB) or more. Therefore, the app uploaded by a developer is first stored in the IPFS file system along with its version history, and only the corresponding hash values of the apk file are stored in the blockchain distributed ledger. Moreover, the use of IPFS provides several other benefits because of its peer-to-peer network and its support for tracking the version history of every uploaded apk file. Furthermore, our smart contract interacts with the network during the uploading and downloading of Android applications. It handles the IPFS version of the Android application and the hash value of the application. It can approve or deny harmful Android applications during uploading and downloading. Finally, the malicious features are broadcast through the smart contract to all users across the network. Figure 4 shows the interaction between the participating entities, which are defined as developers, users, approvers, deep learning, and the smart contract.

Smart Contract:

All interactions among the users and developers are handled by the smart contract. It checks newly uploaded applications and reports whether they are malware or benign. It also stores the new information about new apks. The smart contract interacts with the developer and the user to approve the application and to notify them whether it is benign or malware.
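The approve-or-deny behavior described in this subsection can be sketched off-chain in a few lines. This is a minimal illustration, not the authors' contract: the class `ApkRegistry`, its method names, the in-memory sets standing in for the distributed ledger, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class ApkRegistry:
    """In-memory stand-in for the ledger of apk hashes (hypothetical, not on-chain)."""

    def __init__(self):
        self.malicious_hashes = set()  # hashes flagged as malware
        self.versions = {}             # app name -> list of approved hashes, oldest first

    @staticmethod
    def apk_hash(apk_bytes: bytes) -> str:
        # Only the hash is kept; the apk itself would live in IPFS.
        return hashlib.sha256(apk_bytes).hexdigest()

    def report_malware(self, apk_bytes: bytes) -> str:
        """Record the hash of an apk that the detector classified as malware."""
        digest = self.apk_hash(apk_bytes)
        self.malicious_hashes.add(digest)
        return digest

    def approve_upload(self, app_name: str, apk_bytes: bytes) -> bool:
        """Deny known-malicious apks; otherwise record the new version and approve."""
        digest = self.apk_hash(apk_bytes)
        if digest in self.malicious_hashes:
            return False
        self.versions.setdefault(app_name, []).append(digest)
        return True

    def approve_download(self, apk_bytes: bytes) -> bool:
        """Approve a download only if no malware record exists for this hash."""
        return self.apk_hash(apk_bytes) not in self.malicious_hashes


# Made-up byte strings stand in for real apk files.
registry = ApkRegistry()
registry.report_malware(b"malicious apk contents")
upload_ok = registry.approve_upload("demo.app", b"benign apk contents")
upload_blocked = registry.approve_upload("demo.app", b"malicious apk contents")
download_ok = registry.approve_download(b"benign apk contents")
```

Keeping only digests in the registry mirrors the design choice stated above: the ledger stays small because the apk files themselves are stored elsewhere.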

Methods:

Contracts are structures that define the essence of the deal. Some contracts have requirements that allow only a certain organization to execute them; others may be accessible to all participants. The methods (strategies) used in the smart contract are directly related to the effectiveness of the contract.

Modifiers:

Modifiers change the behavior of the application features. A modifier can only define variables in its own block before execution. It can restrict access to contract functions for malicious applications.

Variables:

A variable holds a value, and that value can change depending on function calls or conditions. Within the smart contract, each variable stores a specific data type.

Algorithm 2: Smart contract approves uploaded apk

    Contract is: WaitForCheckingMalware;
    Developer is: ReadyToUploadAP;
    Approval is: WaitingToSuccessOrFail;
    if apkHashCheck(distributedLedger) then
        Contract is: successSign;
        Developer is: successProvidedAP;
        Approve = successApproval (if app is not malware);
    else
        Contract is: denySign;
        Developer is: denyProvidedAP;
        Approve = denyApproval (if app is malware);
    end

Algorithm 3: Smart contract approves downloaded apk

    apk ← DownloadedApp;
    apkHash ← ApplyHash(apk);
    Approve is: WaitingToSuccessOrFail;
    if BlockChainLedger(apkHash) then
        Approve = successApproval (malware information not found);
    else
        Approve = denyApproval;
    end

IV. PERFORMANCE EVALUATION

In this section, we discuss the experimental results of our proposed framework, including the dataset, the evaluation measures, the results and a comparison with other works. The experiments provide strong evidence for the proposed model, which is based on a deep learning algorithm and blockchain.

A. Dataset

The dataset that we used contains 18,850 normal Android application packages and 10,000 malware Android packages with different features. The original version of the dataset comprised around 13,000 Android application packages (.apk) collected as normal apps from different sources and 6,971 malicious applications from known sources such as the DroidKin dataset [58], the Android Malware Genome Project [59] and AndroMalShare [60]. Its authors extracted the permissions at installation and at run time after running the collected packages in the BlueStacks emulator [61]. In this study, we used the new version of their dataset, which contains 18,850 normal Android application packages and 10,000 malware packages.

B. Experimental setup

In this paper, we extract dynamic and static features. The dynamic analysis is performed on real devices to check the real-time performance of the network. We used 8 mobile phones with different configurations, running Android 10.0 with 6 GB RAM, a Kirin 980 processor and 128 GB ROM. Every smartphone processes an average of 400 apps daily. All phones contain a SIM card with a 4G network connection. The run-time executions that are observed are determined by the chosen input generation. Moreover, to analyze the dynamic behavior of malware, we use an emulator (Android Virtual Device) to extract dynamic features such as API calls and events/actions. After extracting the dynamic and static features, we combine them and train the model.

C. Evaluation Measures

We used the Python programming language to conduct our experiments. To evaluate the efficiency of malware detection systems, the true positive rate (TPR) and the false positive rate (FPR) are used, as in [62], [63]. TPR is defined in equation 6:

$$\mathrm{TPR} = \frac{T_p}{T_p + F_n} \tag{6}$$

where $T_p$ (true positives, TP) is the number of correctly recognized malware samples and $F_n$ (false negatives, FN) is the number of malware samples wrongly classified as benign. TPR is also known as the recognition rate. The false positive rate (FPR) is defined in equation 7 [62]:

$$\mathrm{FPR} = \frac{F_p}{F_p + T_n} \tag{7}$$

where $F_p$ (false positives, FP) is the number of benign samples incorrectly identified as malware and $T_n$ (true negatives, TN) is the number of correctly identified benign samples. FPR is also referred to as the false alarm rate. Overall accuracy (ACC) is the percentage of correctly identified applications, as shown in equation 8:

$$\mathrm{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \tag{8}$$

D. Results Discussion

The detailed empirical study of our proposed research work contains two observations. Initially, our key goal is to build a deep learning based model that is able to identify and diagnose Android malware and benign applications. For this purpose, we analyze static and dynamic features based on their overall information gain score in Figures 13-17. The extracted features are analyzed to distinguish between benign and malware applications. First, we inspect the information gain based on the frequency count of features such as permissions, connections and intents among the static and dynamic features, as depicted in Figures 18-22. From the observed behavior of these features, we can conclude that the frequency ratios of API calls and services have an inverse relationship between benign and malware applications. Specifically, as shown in Figure 22, the frequency count of intents is much lower in benign than in malware applications. In contrast, the use of permissions does not help to distinguish between benign and malware applications because of the same frequency count, as shown in Figure 19. Therefore, we exploit the useful static and dynamic features to construct a high-quality model for training.

Figure 13. Information gain of top ranked apps using DroidBot (Permission Excluded). Axes: InfoGain, Features.

Figure 14. Information gain of top ranked apps using DroidBot (Permission Included). Axes: InfoGain, Features.

Figure 15. Information gain of top ranked processes. Axes: InfoGain, Features.

Figure 16. Information gain of top ranked connections. Axes: InfoGain, Features.

Figure 17. Information gain of top ranked intents. Axes: InfoGain, Features.

Figure 18. Top ranked info-gain-based apps use the DroidBot (Permission Excluded). Axes: Frequency in thousands, Features; series: Malware, Benign.

Figure 19. Top ranked info-gain-based apps use the DroidBot (Permission Included). Axes: Frequency in thousands, Features; series: Malware, Benign.

Figure 20. Top ranked info-gain based Process. Axes: Frequency in hundreds, Features; series: Malware, Benign.

Figure 22. Top ranked info-gain based Intents. Axes: Frequency in hundreds, Features; series: Malware, Benign.
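The information-gain ranking used for the feature analysis above can be sketched as follows. This is a minimal illustration using the standard entropy-based definition of information gain; the function names and the toy samples are hypothetical and are not taken from the paper's dataset or tooling.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy that remains after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(samples, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    names = sorted({name for sample in samples for name in sample})
    gains = {
        name: information_gain([sample.get(name, 0) for sample in samples], labels)
        for name in names
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up data: 1 means the app uses the feature.
samples = [
    {"SEND_SMS": 1, "INTERNET": 1},
    {"SEND_SMS": 1, "INTERNET": 1},
    {"SEND_SMS": 0, "INTERNET": 1},
    {"SEND_SMS": 0, "INTERNET": 1},
]
labels = ["malware", "malware", "benign", "benign"]
ranking = rank_features(samples, labels)
```

In this toy data, a feature used equally by both classes carries no information, which matches the observation above that features with the same frequency count in benign and malware apps do not help to separate them.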

The engineered features are then used to train different algorithms, including Support Vector Machine (SVM), J48, Naive Bayes (NB), Random Forest, Recurrent Neural Network (RNN), Convolutional Neural Network (CNN) and Fully Connected Deep Neural Network (FC-DN), which are compared with our proposed deep learning model. For ground-level evaluation, we report the TPR and FPR of all algorithms in Figure 24. As shown in Figure 23, the proposed approach outperformed the previous algorithms by achieving a high TPR and accuracy. However, due to the conflicting relationship of features between benign and malware applications, the reported FPR is not better than that of the other approaches, except SVM and J48.

Figure 21. Top ranked info-gain based Intents. Axes: Frequency in thousands, Features; series: Malware, Benign.

Table IV
PROPOSED DEEP LEARNING MODEL WITH DIFFERENT HIDDEN LAYERS FOR DYNAMIC FEATURES ONLY

| No. of layers | No. of Neurons | TPR | FPR | Accuracy | Running time (min:sec) |
|---|---|---|---|---|---|
| 2 | 200,200 | 0.9663 | 0.337 | 0.9449 | 06:31 |
| 2 | 400,400 | 0.9903 | 0.2062 | 0.895044 | 13:44 |

Table IV shows the performance of the deep learning model with different combinations of hidden layers. These results use only the dynamic features extracted with the emulator. We vary the number of layers and neurons to find the best-performing configuration of the deep learning model. Table IV covers models with two, three and four layers; the combination of 200, 200, 200 neurons achieved the best result compared with the other layer and neuron settings. Similarly, we repeated the same experiment with the static and dynamic features combined, as shown in Table V, using the same layer and neuron settings. Again, the combination of 200, 200, 200 neurons in 3 layers achieved better performance than the other settings.
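For reference, the TPR, FPR and accuracy values reported in the tables follow directly from confusion-matrix counts, as defined in Section IV-C. The following is a minimal sketch; the function name `detection_metrics` and the counts in the example are made up for illustration and do not come from the experiments.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute TPR, FPR and accuracy from confusion-matrix counts.

    tp: malware correctly detected      fn: malware classified as benign
    fp: benign classified as malware    tn: benign correctly identified
    """
    positives = tp + fn
    negatives = fp + tn
    total = positives + negatives
    return {
        "tpr": tp / positives if positives else 0.0,
        "fpr": fp / negatives if negatives else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Made-up counts: 100 malware and 100 benign test samples.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

With these illustrative counts, TPR is 90/100, FPR is 5/100 and accuracy is 185/200.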

Table V
PROPOSED DEEP LEARNING MODEL WITH DIFFERENT HIDDEN LAYERS FOR STATIC AND DYNAMIC FEATURES

| No. of layers | No. of Neurons | TPR | FPR | Accuracy | Running time (min:sec) |
|---|---|---|---|---|---|
| 2 | 200,200 | 0.9661 | 0.1229 | 0.9332 | 14:51 |
| 2 | 400,400 | 0.972 | 0.0918 | 0.9484 | 30:50 |

Table VI
TIME COMPARISON OF DEEP LEARNING MODEL CONSTRUCTION

| No. of layers | No. of Neurons | Model | Time |
|---|---|---|---|
| 3 | 200,200,200 | Fully Connected | 140 |
| 3 | 200,200,200 | RNN | 135 |
| 3 | 200,200,200 | CNN | 120 |
| 3 | 200,200,200 | Our Proposed (Static) | 96 |
| 3 | 200,200,200 | Our Proposed (Dynamic) | 99 |

Figure 23. Comparison between machine and deep learning classifiers

Table VI compares the construction time of different deep learning models. The experimental results indicate that the proposed model reduces the computational time while still achieving the detection performance required for Android IoT devices.

Furthermore, we focus on the blockchain integration functionality among the smart contract, hyper ledger, deep learning model and users. We implemented the Ethereum smart contract using the Remix IDE. All roles have been tested to ensure that the smart contract works properly. The developer uploads the apk files to the blockchain network and stores the hash in the smart contract. In Remix, a different address is stored for each role, such as user (0xca35b7d915458ef540ade6d35458dfe2f44e8fa733c) and developer (0x18965a09acff6d2a60dcdf8bb4aff308fddc180c), to test the smart contract code. Functions are designed to approve or deny the Android apps.

Figure 24. True positive and false positive performance between different classifiers

Figure 25. Identification user results using blockchain

Figure 26. Logs requesting developer upload the Android applications

To utilize the IPFS storage, the transaction gas and execution gas of the smart contract are recorded as 1808246 and 1338218, respectively. The transaction cost is required to upload the Android apps, and the execution gas is needed to verify the hash values of harmful apps. When uploading an app to the server, the execution cost of the function is USD 0.016. Downloading an Android app requires USD 0.0088, and verifying harmful apps costs USD 0.0091. Whenever malware is found in a file, its hash value is stored in the blockchain. The gas consumption and the actual (Ether) cost are shown in Figures 27 and 28, respectively.

Figure 27. Gas consumption for downloading and uploading file

Figure 28. Actual cost for downloading and uploading file

Figure 29 shows simulation results indicating that the combination of blockchain and deep neural network improves performance by reducing the computational cost of the neural network. It shows that as the number of training labels increases, the prediction performance of the deep learning model increases. Moreover, the blockchain network calculates the value of the nodes by using the deep neural network. It then selects the nodes and calculates the threshold value; the features of the dataset are taken as the input to the blockchain nodes. The mining pools provide the output to the users. The average number of transactions is shown in Figure 29. The blue points are the real labels, and the red points are the predictions. In Figure 29 (a), the correlation between the computing power ratio and the average transaction is given by the trend of the blue dots, and the red dots show an increasing pattern of the average transaction with the computing power ratio, which is consistent with real-world decentralized networks. Figure 29 (b) demonstrates that as the payoff increases, the node will have more transactions. Figures 29 (c) and (d) show that the nodes are negatively correlated.

E. Comparison with other works

In this section, we evaluate the efficiency of the proposed deep learning model and compare it with state-of-the-art deep learning and machine learning approaches. Figure 23 compares its accuracy with that of machine learning and deep learning classifiers. Moreover, we also compare our deep learning model with previous literature in Table VII. Table VII shows that our method achieves higher accuracy than the other techniques.

Furthermore, we compare our work with the methods of [35] and [27], which introduce the blockchain to Android malware detection. Those methods only propose storing the information of malware and are not suited to real-time deployment on a blockchain. Compared to [35] and [27], our solution therefore does more to secure IoT devices. Additionally, Table VIII shows the results of the comparative analysis of blockchain applications. In our contribution, a blockchain application is used to identify whether an app is benign or malware when it is uploaded to or downloaded from the Internet.

Table VII
PERFORMANCE COMPARISON WITH OTHER STATE OF THE ART APPROACHES

| Authors | Algorithm | Capacity for feature diversity | Accuracy | F-measure |
|---|---|---|---|---|
| Ours | Proposed | High | 96% | 0.98 |
| [13] | DNN/RNN | Medium | 90% | NA |
| [64] | CNN | Low | 90% | NA |
| [30] | Multi-Layer Perceptron | Low | 89% | 0.89 |
| [23] | KMNN/ANN/FNN | High | 90% | NA |
| [35] | RNN and LSTM | Low | 96% | NA |
| [12] | DNN | High | 93.9% | NA |
| [65] | Bayesian | Low | 92% | NA |
| [66] | SVM | Low | NA | 0.98 |
| [45] | Graph Based | NA | 95.4% | NA |

Table VIII
COMPARE WITH OTHER BLOCKCHAIN TECHNIQUES

| Primitive | BlockVerify [67] | SigmaLedger [68] | StopTheFake [69] | This Work |
|---|---|---|---|---|
| Blockchain | Private | Private | Private | Private |
| Target | Goods | Goods | Picture, Video | Android APK |
| Function | Detect, Identify | Tag, Detect | Detect, Record | Detect, Identify |
| Smart Contract | Product Label | QR Code, RFID | Copyright, Catalog | Hash, Feature of APK |

V. CONCLUSION

In this paper, a new approach is presented that integrates the blockchain with a multi-level deep learning model for the detection of malware activities in a real-time environment, especially for Android IoT devices. Our proposed framework works as follows: 1) a developer creates a malware application; 2) the multi-level deep learning model distributes the malware features into various clusters and chooses the best deep learning model for each cluster; 3) the model makes decisions by analyzing the previous data already stored in the blockchain distributed ledger and stores the new features of the malware activities in the blockchain; 4) finally, the blockchain smart contract notifies the user of malware when an Android app is verified during the upload or download process. To achieve better security for IoT devices regarding malware detection in real-time environments, millions of Android application features (malware and benign) were stored in the blockchain database. We therefore designed a multi-layer deep learning model for a large number of malware and benign features that intercepts malicious applications for Android IoT devices. The proposed model supports multiple levels of clustering for a single data distribution. Furthermore, the smart contract verifies malicious applications when Android apps are uploaded or downloaded through the network, and it can approve or deny the uploading and downloading of harmful Android applications. The proposed model can identify malware effectively, which provides more security for Android IoT devices.

Figure 29. Correlation between the computing power ratio and average transaction

ACKNOWLEDGEMENT

This work is supported by the National Natural Science Foundation of China under grant no. U2033212 and by the University of Electronic Science and Technology of China, Project Number: Y03019023601016201.

REFERENCES

[1] M. Damshenas, A. Dehghantanha, K.-K. R. Choo, and R. Mahmud, “M0Droid: An Android Behavioral-Based Malware Detection Model,”

Journal of Information Privacy and Security, vol. 11, no. 3, pp. 141–157, 2015.
[2] W. Yuan, Y. Jiang, H. Li, and M. Cai, “A Lightweight On-Device Detection Method for Android Malware,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. PP, pp. 1–12, 2019.
[3] M. E. Khoda, T. Imam, J. Kamruzzaman, I. Gondal, and A. Rahman, “Robust Malware Defense in Industrial IoT Applications using Machine Learning with Selective Adversarial Samples,” IEEE Transactions on Industry Applications, vol. PP, no. c, p. 1, 2019.
[4] M. Amin, T. A. Tanveer, M. Tehseen, M. Khan, F. A. Khan, and S. Anwar, “Static malware detection and attribution in android byte-code through an end-to-end deep system,” Future Generation Computer Systems, vol. 102, pp. 112–126, 2020.
[5] R. Kumar, Z. Xiaosong, R. U. Khan, J. Kumar, and I. Ahad, “Effective and explainable detection of android malware based on machine learning algorithms,” in Proceedings of the 2018 International Conference on Computing and Artificial Intelligence, pp. 35–40, 2018.
[6] R. Kumar, X. Zhang, W. Wang, R. U. Khan, J. Kumar, and A. Sharif, “A multimodal malware detection technique for android iot devices using various features,” IEEE Access, vol. 7, pp. 64411–64430, 2019.
[7] Google Block, “Google Blocked 700,000 Malicious Apps From Play store in 2019, https://gbhackers.com/google-blocked-700000Apps.”
[8] J. Walls and K. K. R. Choo, “A review of free cloud-based anti-malware apps for android,” in Proceedings - 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015, vol. 1, (Helsinki Finland), pp. 1053–1058, 2015.
[9] T. Lei, Z. Qin, Z. Wang, Q. Li, and D. Ye, “Evedroid: Event-aware android malware detection against model degrading for iot devices,” IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6668–6680, 2019.
[10] M. K. Alzaylaee, S. Y. Yerima, and S. Sezer, “DL-Droid: Deep learning based android malware detection using real devices,” Computers and Security, vol. 89, 2020.
[11] W. Zhong and F. Gu, “A multi-level deep learning system for malware detection,” Expert Systems with Applications, vol. 133, pp. 151–162, 2019.
[12] Y. Zhang, Y. Sui, S. Pan, Z. Zheng, B. Ning, I. Tsang, and W. Zhou, “Familial clustering for weakly-labeled android malware using hybrid representation learning,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3401–3414, 2019.
[13] T. Kim, B. Kang, M. Rho, S. Sezer, and E. G. Im, “A multimodal deep learning method for android malware detection using various features,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 3, pp. 773–788, 2019.
[14] S. Y. Yerima and S. Sezer, “Droidfusion: a novel multilevel classifier fusion approach for android malware detection,” IEEE transactions on cybernetics, vol. 49, no. 2, pp. 453–466, 2018.
[15] I. Martín, J. A. Hernández, and S. de los Santos, “Machine-learning based analysis and classification of android malware signatures,” Future Generation Computer Systems, vol. 97, pp. 295–305, 2019.
[16] A. Saracino, D. Sgandurra, G. Dini, and F. Martinelli, “Madam: Effective and efficient behavior-based android malware detection and prevention,” IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 1, pp. 83–97, 2016.
[17] S. Sharmeen, S. Huda, J. H. Abawajy, W. N. Ismail, and M. M. Hassan, “Malware Threats and Detection for Industrial Mobile-IoT Networks,” IEEE Access, vol. 6, pp. 15941–15957, 2018.
[18] Businesswire, “Strategy Analytics: Android captures record 88 percent share of global smartphone shipments in Q3 2016,” 2016.
[19] A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli, “Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection,” IEEE Transactions on Dependable and Secure Computing, vol. 5971, no. c, 2017.
[20] H. Zhu, Y. Li, R. Li, J. Li, Z. You, and H. Song, “Sedmdroid: An enhanced stacking ensemble of deep learning framework for android malware detection,” IEEE Transactions on Network Science and Engineering, 2020.
[21] J. Tang, R. Li, K. Wang, X. Gu, and Z. Xu, “A novel hybrid method to analyze security vulnerabilities in android applications,” Tsinghua Science and Technology, vol. 25, no. 5, pp. 589–603, 2020.
[22] R. Feng, S. Chen, X. Xie, G. Meng, S. W. Lin, and Y. Liu, “A performance-sensitive malware detection system using deep learning on mobile devices,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 1563–1578, 2021.
[23] R. Taheri, M. Ghahramani, R. Javidan, M. Shojafar, Z. Pooranian, and M. Conti, “Similarity-based android malware detection using hamming distance of static binary features,” Future Generation Computer Systems, vol. 105, pp. 230–247, 2020.
[24] L. Cen, C. S. Gates, L. Si, and N. Li, “A Probabilistic Discriminative Model for Android Malware Detection with Decompiled Source Code,” IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 4, pp. 400–412, 2015.
[25] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, “DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket,” NDSS, 2014.
[26] E. M. B. Karbab, M. Debbabi, A. Derhab, and D. Mouheb, “MalDozer: Automatic framework for android malware detection using deep learning,” Digital Investigation, 2018.
[27] R. Kumar, X. Zhang, R. Khan, A. Sharif, R. Kumar, X. Zhang, R. U. Khan, and A. Sharif, “Research on Data Mining of Permission-Induced Risk for Android IoT Devices,” Applied Sciences, vol. 9, p. 277, Jan. 2019.
[28] Westyarian, Y. Rosmansyah, and B. Dabarsyah, “Malware detection on Android smartphones using API class and machine learning,” in Proceedings - 5th International Conference on Electrical Engineering and Informatics: Bridging the Knowledge between Academic, Industry, and Community, ICEEI 2015, (Denpasar Indonesia), pp. 294–297, 2015.
[29] S. Wang, Z. Chen, Q. Yan, K. Ji, L. Peng, B. Yang, and M. Conti, “Deep and broad url feature mining for android malware detection,” Information Sciences, vol. 513, pp. 600–613, 2020.
[30] H. Zhu, Y. Li, R. Li, J. Li, Z.-H. You, and H. Song, “Sedmdroid: An enhanced stacking ensemble of deep learning framework for android malware detection,” IEEE Transactions on Network Science and Engineering, 2020.
[31] B. Kang, S. Y. Yerima, S. Sezer, and K. Mclaughlin, “N-gram Opcode Analysis for Android Malware Detection,” Intl. Journal on Cyber Situational Awareness, vol. 1, no. 1, pp. 231–254, 2016.
[32] T. Ban, T. Takahashi, S. Guo, D. Inoue, and K. Nakao, “Integration of Multi-modal Features for Android Malware Detection Using Linear SVM,” in Proceedings - 11th Asia Joint Conference on Information Security, AsiaJCIS 2016, pp. 141–146, 2016.
[33] S. Wu, P. Wang, X. Li, and Y. Zhang, “Effective detection of android malware based on the usage of data flow APIs and machine learning,” Information and Software Technology, vol. 75, pp. 17–25, 2016.
[34] K. Tian, D. Yao, B. G. Ryder, G. Tan, and G. Peng, “Detection of repackaged android malware with code-heterogeneity features,” IEEE Transactions on Dependable and Secure Computing, vol. 17, no. 1, pp. 64–77, 2020.
[35] R. Feng, S. Chen, X. Xie, G. Meng, S.-W. Lin, and Y. Liu, “A performance-sensitive malware detection system using deep learning on mobile devices,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 1563–1578, 2020.
[36] S. H. Seo, A. Gupta, A. M. Sallam, E. Bertino, and K. Yim, “Detecting mobile malware threats to homeland security through static analysis,” Journal of Network and Computer Applications, vol. 38, pp. 43–53, Feb. 2014.
[37] W. Zhang, H. Wang, H. He, and P. Liu, “Damba: Detecting android malware by orgb analysis,” IEEE Transactions on Reliability, vol. 69, no. 1, pp. 55–69, 2020.
[38] Y. Li, Y. Li, H. Yan, and J. Liu, “Deep joint discriminative learning for vehicle re-identification and retrieval,” Proceedings - International Conference on Image Processing, ICIP, vol. 2017-Septe, pp. 395–399, 2018.
[39] S. Y. Yerima, S. Sezer, and I. Muttik, “Android malware detection using parallel machine learning classifiers,” in Proceedings - 2014 8th International Conference on Next Generation Mobile Applications, Services and Technologies, NGMAST 2014, (Oxford UK), 2014.
[40] Y. Zhang, Y. Sui, S. Pan, Z. Zheng, B. Ning, I. Tsang, and W. Zhou, “Familial clustering for weakly-labeled android malware using hybrid representation learning,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3401–3414, 2020.
[41] H. Cai, N. Meng, B. Ryder, and D. Yao, “Droidcat: Effective android malware detection and categorization via app-level profiling,” vol. 14, pp. 1455–1470, 2019.
[42] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, “Crowdroid: Behavior-Based Malware Detection System for Android,” Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices - SPSM ’11, p. 15, 2011.
[43] Y. J. Ham, D. Moon, H. W. Lee, J. D. Lim, and J. N. Kim, “Android mobile application system call event pattern analysis for determination of malicious attack,” International Journal of Security and its Applications, 2014.
[44] J. Xu, Y. Li, R. Deng, and K. Xu, “Sdac: A slow-aging solution for android malware detection using semantic distance based api clustering,” IEEE Transactions on Dependable and Secure Computing, pp. 1–1, 2020.
[45] A. Arora, S. K. Peddoju, and M. Conti, “Permpair: Android malware detection using permission pairs,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1968–1982, 2020.
[46] M. Sun, X. Li, J. C. Lui, R. T. Ma, and Z. Liang, “Monet: A User-Oriented Behavior-Based Malware Variants Detection System for Android,” IEEE Transactions on Information Forensics and Security, 2017.
[47] W. Li, Z. Wang, J. Cai, and S. Cheng, “An Android Malware Detection Approach Using Weight-Adjusted Deep Learning,” in , (Maui HI USA), 2018.
[48] W.-C. Wu and S.-H. Hung, “DroidDolphin: A dynamic android malware detection framework using big data and machine learning,” in , (Towson, Maryland), pp. 247–252, 2014.
[49] V. M. Afonso, M. F. de Amorim, A. R. A. Grégio, G. B. Junquera, and P. L. de Geus, “Identifying Android malware using dynamically obtained features,” Journal of Computer Virology and Hacking Techniques, vol. 11, no. 1, pp. 9–17, 2015.
[50] R. Kumar, A. A. Khan, S. Zhang, W. Wang, Y. Abuidris, W. Amin, and J. Kumar, “Blockchain-federated-learning and deep learning models for covid-19 detection using ct imaging,” arXiv preprint arXiv:2007.06537, 2020.
[51] R. Kumar, W. Wang, J. Kumar, T. Yang, A. Khan, W. Ali, and I. Ali, “An integration of blockchain and ai for secure data sharing and detection of ct images for the hospitals,” Computerized Medical Imaging and Graphics, vol. 87, p. 101812, 2021.
[52] A. Ferrante, M. Malek, F. Martinelli, F. Mercaldo, and J. Milosevic, “Extinguishing ransomware - a hybrid approach to android ransomware detection,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (Nancy, France), 2018.
[53] D. Kim, J. KIm, S. K. P. r. International, and undefined 2013, “A malicious application detection framework using automatic feature extraction tool on android market,” in Proc. 3rd Int. Conf. Comput. Sci. Inf. Technol. (ICCSIT), (Bali, Indonesia), pp. 1–4, 2013.
[54] S. Huda, R. Islam, J. Abawajy, J. Yearwood, M. M. Hassan, and G. Fortino, “A hybrid-multi filter-wrapper framework to identify run-time behaviour for fast malware detection,” Future Generation Computer Systems, 2018.
[55] Y. Liu, Y. Zhang, H. Li, and X. Chen, “A hybrid malware detecting scheme for mobile Android applications,” in , (Las Vegas, USA), 2016.
[56] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. 2012.
[57] G. Nguyen, B. M. Nguyen, D. Tran, and L. Hluchy, “A heuristics approach to mine behavioural data logs in mobile malware detection system,” Data and Knowledge Engineering, vol. 115, no. January, pp. 129–151, 2018.
[58] H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, “Droidkin: Lightweight detection of android apps similarity,” in International Conference on Security and Privacy in Communication Networks

ACM Computing Surveys, 2017.
[63] P. Baldi, S. Brunak, Y. Chauvin, C. A. Andersen, and H. Nielsen, “Assessing the accuracy of prediction algorithms for classification: an overview,” Bioinformatics, vol. 16, no. 5, pp. 412–424, 2000.
[64] N. McLaughlin, A. Doupé, G. Joon Ahn, J. Martinez del Rincon, B. Kang, S. Yerima, P. Miller, S. Sezer, Y. Safaei, E. Trickel, and Z. Zhao, “Deep Android Malware Detection,” in Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy - CODASPY ’17, 2017.
[65] Q. Han, V. S. Subrahmanian, and Y. Xiong, “Android malware detection via (somewhat) robust irreversible feature transformations,” vol. 15, pp. 3511–3525, 2020.
[66] Z. Yuan, Y. Lu, and Y. Xue, “Droiddetector: Android malware characterization and detection using deep learning,” Tsinghua Science and Technology, vol. 21, no. 1, pp. 114–123, 2016.
[67] BlockVerify, “BlockVerify - Blockchain Based Anti-Counterfeit Solution, Introducing transparency to supply chains,” 2019.
[68] N. Alzahrani and N. Bulusu, “Block-supply chain: A new anti-counterfeiting supply chain using NFC and blockchain,” in CRYBLOCK 2018 - Proceedings of the 1st Workshop on Cryptocurrencies and Blockchains for Distributed Systems, Part of MobiSys 2018, 2018.
[69] StopTheFakes, “Blockchain Service Anti-Counterfeit & Copyright Infringement,”