Collective Intelligence: Decentralized Learning for Android Malware Detection in IoT with Blockchain
Rajesh Kumar, WenYong Wang, Jay Kumar, Zakria, Ting Yang, Waqar Ali
IoTMalware: Android IoT Malware Detection based on Deep Neural Network and Blockchain Technology
Rajesh Kumar, WenYong Wang, Jay Kumar, Zakria, Ting Yang, Waqar Ali & Abubackar Sharif
Abstract: The Internet of Things (IoT) has been revolutionizing the world by introducing applications in almost all walks of daily life, such as healthcare, smart cities, smart environments, safety, remote sensing, and many more. Android IoT devices are particularly significant because of their flexibility and hardware support. However, it is challenging to deal with the increasing number of malware attacks on Android devices. This challenge is magnified for Android IoT devices because of their limited resources, such as memory and power. Therefore, this paper proposes a new framework based on blockchain and a deep learning model to provide more security for Android IoT devices. Moreover, our framework is capable of finding malware activities in a real-time environment. The proposed deep learning model analyzes various static and dynamic features extracted from thousands of malware and benign apps that are already stored in the blockchain distributed ledger. The multi-layer deep learning model makes decisions by analyzing the previous data in three steps. First, it divides the malware features into multiple-level clusters. Second, it chooses a unique deep learning model for each malware feature set or cluster. Finally, it produces the decision by combining the results generated from all cluster levels. Furthermore, the decisions and multiple-level clustering data are stored in a blockchain, where they can be used to further train every specialized cluster on its unique data distribution. In addition, a customized smart contract is designed to detect deceptive applications through the blockchain framework. The smart contract checks for malicious applications during both the uploading and downloading of Android apps on the network, and it approves or denies the upload or download of harmful Android applications. Consequently, the proposed framework provides flexible run-time malware detection on heterogeneous IoT devices.
Index Terms: Android Malware Detection, Blockchain, Deep Learning, Secure IoT Devices, Smart Contract
I. INTRODUCTION
Future wireless technologies such as fifth-generation mobile networks (5G) and the Internet of Things (IoT) are revolutionizing the world by introducing innovative applications for smart systems that could only be imagined in the past, such as smart environment sensing, smart agriculture, smart drones, smart healthcare monitoring, and autonomous cars, to mention a few. To build such smart systems, heterogeneous electronic devices participate in a common network to communicate with each other, as illustrated in Figure 1. A wide range of advanced electronic IoT devices are controlled by the Android platform, which enables the integration of smart gadgets such as sensors, smartphones, smart watches, smart washing machines, and many more. Such devices, including smartphones, encourage people to store and share personal and confidential information. At the same time, the common Android platform makes these devices an attractive target for malicious application developers [1], [2], [3], [4], [5], [6]. Attackers can exploit the Android system by introducing fake applications that directly affect the user's privacy and security. Such applications may pose a severe threat by snooping on user data such as confidential contracts, photos, contact information, location, account information, and passwords. Additionally, malicious applications can harm not only the targeted node but also other devices linked to it through a shared network. About 0.7 million applications were reported as malicious and blocked by the Google Play Store before users could download them in 2019 [7]. However, most Android app markets do not provide a way to assess whether a mobile app is counterfeit. Besides, many users install applications from anonymous sources and do not use antivirus applications to protect themselves from malicious and phishing attacks [8], [9]. Therefore, there is an urgent need for an approach and framework that can detect malware applications in a timely manner.
Previous studies [10], [11], [12], [13], [14] proposed various techniques for malware detection on the Android platform, such as signature mechanisms [15], [16], access control [17], and sandboxing [18], [19], [20]. Most of these techniques are efficient and effective to some extent under different constraints. Recently, deep learning techniques have gained significant attention for the malware detection problem [10], [13], [21]. Previous algorithms, such as convolutional neural networks and LSTM, depend heavily on the initial feature extraction process. However, these techniques cannot be directly applied to smartphones and IoT devices because of their limited memory, processing power, and other resources. For this purpose, we propose a novel technique that integrates blockchain technology with deep neural networks in order to resolve the limitations of previous malware detection techniques. Our approach can be applied directly to IoT devices.
Figure 1. Integration of IoT devices connected on a common network via the Android application platform.
This work is supported by the National Natural Science Foundation of China under grant no. U2033212. The authors are with the School of Computer Science & Engineering, University of Electronic Science and Technology of China.
Firstly, we consider the problem of training the deep learning model in the decentralized network for multiple features of
Android malware detection. In this article, the multi-layer deep learning model inspects malware using trained models that are stored in the distributed ledger. The overall architecture of the deep learning model can be summarized in five steps: i) selection of the important features using the GINI information gain function; ii) division of the dataset into different clusters to obtain the fundamental data distribution for a particular group of malware; iii) generation of multiple clusters as a sub-tree of each tree cluster to handle the huge number of features; iv) selection of the best deep learning model for each cluster set to separate malware from benign apps within the corresponding unique data distribution; and v) aggregation of the latest weights using the history of previously trained models.
The second problem is to track malware or harmful applications when users download apps over the Internet. This article uses blockchain technology to store the latest malware information detected by the deep learning model. IPFS stores the Android applications, and the hashes of the apps are stored in the blockchain ledger. After an Android app is uploaded to IPFS, the deep learning model tests the app and labels it as benign or malware. The information about harmful applications is then stored in the blockchain ledger. In this way, the smart contract helps to approve or deny harmful Android applications during the uploading and downloading process.
The third problem is to design a model with low resource consumption. In this article, we aggregate deep learning weights and use IPFS to reduce the computational cost to the network. The integration of the deep learning model and blockchain collects new types of malicious features of Android applications from various sources to train the deep learning model itself. It provides security and improves malware detection for Android IoT devices in a real-time environment, protecting them against attacks on potential vulnerabilities.
The main contributions of our work are listed as follows:
1) This paper proposes a framework that integrates deep learning and blockchain for better malware detection and information sharing regarding Android applications across the network.
2) An enhanced multi-level deep learning model is proposed that can extract multiple types of malware features, and the training task is distributed in the blockchain network for better prediction.
3) A customized smart contract is designed to provide secure downloading and uploading of Android applications and to alert the user about malicious activities.
4) An extensive empirical analysis is conducted to demonstrate the promising results of the proposed approach, which provides multi-level deep learning and secure data sharing via blockchain.
The remaining part of this paper is organized as follows: Section 2 presents the literature review of Android malware detection and discusses static, dynamic, and hybrid analysis. Section 3 discusses the proposed framework based on blockchain and deep learning. Next, Section 4 analyzes the results and provides a comparison with other work. Finally, we conclude our work.
II. LITERATURE REVIEW
This section briefly reviews the literature on Android malware detection and feature extraction techniques. We divide this section into four parts: i) static analysis, which contains two approaches, the first permission-based and the second based on API calls; ii) dynamic analysis, which is used to extract real-time phone features; iii) hybrid analysis, which combines static and dynamic features; and iv) a comparison of all techniques.
A. Static Analysis
Static analysis checks an application's behavior without executing the app. Several machine learning techniques have been proposed to classify benign and malicious apps [20], [22], [23], such as content-based analysis that reduces the dimensions of the content. Recent static analysis research [24], [25], [26], [27] based on API call and permission features is less effective at detecting malware [28], [29], [30], [31], [32], [33], [34], [35], [24], and these methods show low efficiency in feature extraction. Our main focus is to design a multi-level deep learning model that supports various kinds of features and classifies malware and benign apps effectively.
1) Permission-based analysis:
Android uses a permission-based security model to ensure that access to the user's sensitive information is restricted to the actual user. Indeed, permissions are the most effective static feature because attackers request permissions to reach their malicious goals. Before an app is installed, it requests permissions from the user. After the permissions are granted, the app installs itself on the device. Many approaches extract permissions for malware detection [36], [37], [27], [38]. Wang et al. [37] proposed a methodology for analyzing permissions based on permission ranking, association rules, and similarity. It finds permission groups using the correlation coefficient and ranks each permission individually. Verma et al. [39], [22], [20] used the information gain algorithm for feature selection to choose the best features from Android apk files. This approach relies on the characteristics of entropy and selects the features with the highest gain values as the top features. However, most studies used a score ranking scheme. In this study, we rank the static and dynamic features to find the pattern of risky features in Android applications.
Figure 2. Static feature extraction process
Figure 3. Top risky permission
2) API calls:
API stands for Application Programming Interface. Apps use API calls to interact with the Android framework. Some works target API calls and use them as a promising feature for investigating malicious behaviour. Figure 4 shows several suspicious API calls that are primarily used in malware apps.
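As a minimal sketch of how permissions and API calls can serve as static features, the snippet below encodes an app as a 0/1 vector over a fixed vocabulary. The vocabulary and the helper name `encode_static_features` are illustrative assumptions and are not taken from the paper's dataset.

```python
# Hypothetical feature vocabulary (assumption): a few Android permissions
# and one API call in the notation used by decompiled apk files.
FEATURE_VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "TelephonyManager;->getDeviceId",
]


def encode_static_features(observed, vocab=FEATURE_VOCAB):
    """Return a 0/1 vector marking which vocabulary features the app uses."""
    observed = set(observed)
    return [1 if name in observed else 0 for name in vocab]


# Features found in one (made-up) app
app_features = {"android.permission.INTERNET", "TelephonyManager;->getDeviceId"}
vector = encode_static_features(app_features)
print(vector)  # [0, 0, 1, 1]
```

A vector of this form is one possible input to the ranking and classification stages discussed later.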
B. Dynamic Analysis
Dynamic analysis [40] was proposed to observe the real-time behavior of the phone and the dynamic behaviors and features of applications. To analyze the dynamic behavior of malware, an emulator (Android Virtual Device) is used to extract dynamic features such as API calls and events/actions. Several tools support dynamic analysis, including Monkey and DroidBot. With these tools, the Dynalog file is an essential file used by the input generation methods. It can extract API call and system call features that reveal how malware behaves. The features of the dynamic methodology can be observed in [41], [42], [43], [44], [45], and these methods can also detect unknown malware that shows similar behavior [43], [1], [46]. API call analysis and control flow analysis are also dynamic analysis methods [47], [40], [48], [49]. The main difference between existing works and ours is that our approach combines both static and dynamic analysis methods with blockchain [50], [51] and deep learning to increase the detection rate and overcome the weaknesses of machine learning. Furthermore, our approach distributes the malware information in the blockchain network, which can notify users about benign and malware Android apps at installation time.
The process of feature extraction for dynamic analysis is shown in Figure 5. Many techniques have been used in the literature for dynamic analysis; Table I lists some important features and the machine learning algorithms applied to them.
Figure 4. Suspicious API calls
Figure 5. Traditional dynamic analysis features extraction and classification process
C. Hybrid Analysis
Hybrid analysis combines static analysis and dynamic analysis. The static features are extracted without executing the application. In contrast, dynamic features are extracted in an emulator or on a real device, which is time- and resource-consuming. Figure 6 illustrates how hybrid analysis combines static and dynamic analysis. Some researchers focus on hybrid analysis [52], [53], [16], [54], [55], but these methods are time-consuming. To solve this problem, we design a blockchain-based framework that distributes resources equally to all users. Our proposed technique is more efficient and less time-consuming because of the distributed nature of blockchain.

Table I
DYNAMIC FEATURES DETECTION METHODS
Ref  | Features    | Accuracy | Machine Learning Models
[41] | System call | 91.75%   | Signature Matching
[42] | System call | 81%      | K-Means
[44] | System call | 88.2%    | Frequency
[43] | System call | -        | Pattern matching
[33] | API call    | 97.6     | KNN_M
[45] | Native Size | 99.9%    | RF, SVM

Figure 6. Hybrid feature extraction process
D. A Comparison of Static, Dynamic, and Hybrid Analysis
Table II compares the advantages and limitations of static, dynamic, and hybrid feature extraction to highlight the strengths of the proposed hybrid approach.

Table II
A COMPARISON OF STATIC, DYNAMIC, AND HYBRID ANALYSIS
Type             | Features            | Advantages                                                           | Limitations
Static Analysis  | Single Category     | Easier feature extraction; low computational cost                    | Mimicry attack; code obfuscation; low accuracy
Static Analysis  | Multiple Categories | Easier feature extraction; high accuracy                             | Mimicry attack; hard to manage multiple features; code obfuscation; high computing cost
Dynamic Analysis | Single Category     | Accuracy better than static; recovers code obfuscation               | Extraction of features is difficult; high computing cost; still not a better choice
Dynamic Analysis | Multiple Categories | Accuracy better than static; recovers code obfuscation               | High computation; more resources needed
Hybrid Analysis  |                     | Highest accuracy; better results compared to static and dynamic analysis | High complexity; more resource utilization; time consumption

III. SYSTEM MODEL
In this article, we consider the problem of sharing the latest malware features and training the deep learning model in a decentralized network for multiple Android IoT devices. Because Android devices have limited resources and power, we design a multi-feature deep learning model that supports various features from the decentralized network. We then aggregate the previously trained model with new feature sets from recently uploaded apps. Information about harmful applications is stored in the blockchain network. Finally, we secure Android devices using a smart contract that retrieves information about malicious Android applications. Figure 7 shows the architecture of the proposed framework. The steps of the proposed framework are as follows:
1) The user uploads an app to the network.
2) The version of the app is stored in the InterPlanetary File System (IPFS).
3) The deep learning model extracts the benign and malware features from the uploaded app. More details are given in Section III-A.
4) The updated deep learning model is stored in IPFS to reduce the cost of the blockchain. Additionally, we aggregate the model weights to reduce the computational load on the blockchain nodes. More details are given in Section III-B.
5) The hash value of the app and the decision obtained from the deep learning model are stored in a distributed ledger. More details are given in Section III-C.
6) During the downloading of the app: i) the user sends the hash value to the network; ii) the smart contract compares and verifies the hash value of the downloaded app; and iii) the smart contract notifies the user whether the app is malicious or benign. More details are given in Section III-D.
In summary, Figure 8 provides a more detailed description of uploading and downloading from the developer and user perspectives. When a user downloads Android apps or a developer uploads them, the smart contract secures the upload and automatically checks the apps for harmful features. Smart contracts can track malicious apps across the decentralized network. They also help the deep learning model to keep learning and to track malware applications when users download apps over the Internet. Additionally, blockchain technology creates a trustless environment and guarantees the transparency and reliability of the distributed nodes.
Figure 7. Proposed framework based on deep neural network and blockchain
Figure 8. Flow of user and developer of Android malware detection using blockchain
A. Deep learning model training
In this section, we propose a deep learning model based on static and dynamic analysis. Figure 9 shows the overall architecture of the deep learning model for static and dynamic analysis. In the first phase, we combine static and dynamic features. The static features are extracted by decompiling the Android apk file, and the dynamic features are gathered from the DroidBot emulator (Android Virtual Environment). The DroidBot emulator generates the Dynalog file, and the static features are stored in a CSV file. In the second phase, we select the static and dynamic features from the CSV and Dynalog files and rank them using the information gain function. The information gain function assigns a score to each feature; for example, TelephonyManager;->getDeviceId scores 0.98. After that, we compute the similarity among the features. In the third phase, the deep learning model is trained as a classifier and its performance on benign and malware applications is evaluated. In the fourth phase, the feature information of malware and benign apps is shared in the blockchain distributed database to achieve real-time malware detection for Android IoT devices.
1) Feature selection for hybrid malware detection:
We use the feature importance property of the model. Feature importance assigns each feature a score between zero and one; the higher the score, the more relevant the feature is to the output variable. This score helps in choosing the most important features and dropping the least important ones for model building. Feature importance is built into tree-based classifiers. Information gain (IG) was used to select high-scoring features that classify the data effectively [56]. Information gain is expressed in equations 1, 2, and 3:
$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \tag{1}$$
$$\mathrm{Info}(D) = -\frac{pos}{total}\log\frac{pos}{total} - \frac{neg}{total}\log\frac{neg}{total} \tag{2}$$
$$\mathrm{InfoGainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{Info}(D)} \tag{3}$$
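The following sketch works through equations 1 to 3 for a single binary feature. It assumes a base-2 logarithm (the equations do not state the base), treats $0 \log 0$ as $0$, and computes $\mathrm{Info}_A(D)$ as the size-weighted entropy of the subsets induced by the feature value; the function names are illustrative.

```python
import math


def info(pos, neg):
    """Info(D) of a set with pos malware and neg benign samples (equation 2)."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:  # 0 * log(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)  # assumption: base-2 logarithm
    return result


def gain(labels, feature):
    """Gain(A) = Info(D) - Info_A(D) for a binary feature A (equation 1)."""
    pos = sum(labels)
    info_d = info(pos, len(labels) - pos)
    info_a = 0.0
    for value in (0, 1):
        subset = [y for y, f in zip(labels, feature) if f == value]
        if subset:
            s_pos = sum(subset)
            info_a += len(subset) / len(labels) * info(s_pos, len(subset) - s_pos)
    return info_d - info_a


def info_gain_ratio(labels, feature):
    """InfoGainRatio(A) = Gain(A) / Info(D) (equation 3)."""
    pos = sum(labels)
    info_d = info(pos, len(labels) - pos)
    return gain(labels, feature) / info_d if info_d > 0 else 0.0


labels = [1, 1, 0, 0]            # 1 = malware, 0 = benign
perfect_feature = [1, 1, 0, 0]   # present exactly in the malware samples
useless_feature = [1, 0, 1, 0]   # independent of the label
print(gain(labels, perfect_feature))  # 1.0
print(gain(labels, useless_feature))  # 0.0
```

A feature that separates the two classes perfectly receives the maximum score, while one that is independent of the label scores zero, which is the behaviour the ranking step relies on.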
2) Deep learning model training:
Our main aim is to create a deep learning model that accurately classifies Android malware and benign apps. Previous papers discuss various techniques for malware detection [10], [11], [57], [13], [14]. These methods are highly efficient; however, they cannot be applied directly to mobile and IoT devices. To improve the detection performance of the deep learning model, we design multiple levels of deep learning. Each deep learning model learns from the specific features of Android malware data for a single group of malware. Finally, the deep learning models of all groups are combined to make the final prediction. We tested several deep learning models during training, including a deep recurrent neural network, a convolutional neural network, and a fully feed-forward network. The best of them is used for each cluster.
Figure 9. Static and dynamic analysis deep learning model with blockchain
Additionally, LSTM with batch normalization mitigates the vanishing gradient problem for multiple features. We combine the RNN and LSTM to achieve better performance in distinguishing malware from benign applications. In the first stage, we select the important features using the information gain function for the static and dynamic analysis. In the second stage, the dataset is divided into different clusters that capture the unique data distribution. In the third stage, multiple clusters are generated as a sub-tree of each tree cluster to handle the huge number of features. In the fourth stage, the best deep learning classifier is selected for every cluster to separate malware from benign apps within its unique data distribution. The proposed deep learning model thus classifies every cluster of each distinct feature for the static and dynamic analysis. The use of multiple deep learning models during the training phase reduces training time and provides better accuracy and efficiency. Finally, our proposed model classifies the apps as malware or benign. The workflow of all stages is shown in Figure 11, and the training of the deep learning model is shown in Figure 10. Therefore, our model improves the detection performance and efficiency of the traditional deep learning classifier.
Moreover, to save computational power, the training task is distributed in the blockchain network through forward propagation and backward propagation. In forward propagation, the input is passed through the blockchain decentralized network, and after the input is processed, the output is shared in the decentralized network. Then, in backward propagation, the weights of the neural networks are shared with the blockchain network to reduce the computational power required. Therefore, distributing the training task reduces time and uses decentralized resources over the network.
Additionally, we simulate the training outputs in the decentralized network through Proof-of-Work, Proof-of-Stake, and Delegated-Proof-of-Stake. Proof-of-Work reduces the computational power of the deep learning model. Delegated-Proof-of-Stake is used to vote on the hash, which avoids the complex hash operation. More precisely, the state information of each node in the distributed network is taken as the dataset. The dimensional matrix ($M$) is the input of the deep neural network, and the average number of times a node becomes the mining node in a term is the capacity label. After training our network, we can obtain the average transaction number of the $i$-th node whenever $M_i$ is the input. Finally, the deep neural network is implemented so that it learns from the huge volumes of data resources available through blockchain technology. The next section discusses the blockchain technology.
Figure 10. Deep learning model training steps.
Figure 11. Proposed Android multi-feature deep learning model.
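The per-cluster selection described in this subsection can be sketched as follows. This is only an illustration under stated assumptions: the validation accuracies are made-up numbers, samples are routed to the cluster with the nearest centroid (the paper does not specify the routing rule), and the function names are hypothetical.

```python
def nearest_cluster(sample, centroids):
    """Index of the centroid closest to the sample (squared Euclidean distance)."""
    def dist(c):
        return sum((s - ci) ** 2 for s, ci in zip(sample, c))
    return min(range(len(centroids)), key=lambda k: dist(centroids[k]))


def select_best_models(validation_accuracy):
    """For each cluster, keep the candidate model with the highest validation accuracy."""
    return {cluster: max(scores, key=scores.get)
            for cluster, scores in validation_accuracy.items()}


# Hypothetical validation accuracies per cluster and candidate architecture
validation_accuracy = {
    0: {"RNN": 0.91, "CNN": 0.94, "FFN": 0.89},
    1: {"RNN": 0.96, "CNN": 0.92, "FFN": 0.90},
}
best = select_best_models(validation_accuracy)

# Hypothetical cluster centroids in a two-dimensional feature space
centroids = [[0.0, 0.0], [1.0, 1.0]]
cluster = nearest_cluster([0.9, 0.8], centroids)
print(cluster, best[cluster])  # 1 RNN
```

At prediction time, each app would be handled by the model chosen for its cluster, and the per-cluster outputs would then be combined into the final decision.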
B. Aggregate deep learning weights from the blockchain network
This section aggregates the previously trained model with the new feature information from the latest applications and updates the latest model in IPFS to track new harmful apps. When a user uploads an app over the Internet, the deep learning model identifies whether it is malware by computing the gradients and sends the updated weights to the global blockchain network. The smart contract shares the aggregated updated results. Moreover, when a user downloads an application, the smart contract identifies harmful apps. The process of combining blockchain and deep learning is shown in Algorithm 1.
Neural networks are trained through (i) forward propagation and (ii) backward propagation, which together calculate the layers' weights. In forward propagation, the input is passed through $F = f(x, w) = \bar{y}$, where $x$ is the input and $w$ is the parameter vector. The trained malware and benign feature set is $F = \{(x_i, y_i)\}_{i \in I}$, where $(x_i, y_i)$ are the samples of each device. The output weights are shared in the decentralized network through IPFS. The loss of the training feature set is defined as $L(F, w) = loss$, where $F$ is the dataset and $l$ is the per-sample loss function:
$$loss = \frac{1}{|F|} \sum_{(x_i, y_i) \in F} l\big(y_i, f(x_i, w)\big)$$
Then, in backward propagation, the weights of the neural network are updated using stochastic gradient descent (SGD):
$$w^{t+1} \leftarrow w^{t} - \eta \nabla_w L\big(F^{t}, w^{t}\big) \tag{4}$$
In equation 4, $\eta$ is the learning rate, $w^{t}$ denotes the parameters at the $t$-th iteration, and $F^{t} \subseteq F$ is the mini-batch training dataset of each device.
Equation 4 applies to a single user. To learn the local models collaboratively and create a global model from every device $v \in V$, the update becomes equation 5:
$$w^{t+1} \leftarrow w^{t} - \eta \frac{\sum_{v \in V} \nabla_w L\big(F_v^{t}, w^{t}\big)}{|V|} \tag{5}$$
C. Storing information about malware features in blockchain
We store the Android application hashes together with the malicious and benign features (static and dynamic) in the blockchain distributed database. The structure used to store information about malware features is shown in Figure 12, and its attributes are described in Table III. The blockchain structure is divided into two parts: i) the block header and ii) the block data. The block header stores the version number of the apps, the Merkle root, the hash values of all apps, and so on. The block data stores all static and dynamic features, such as suspicious API calls, permissions, events, calls, and so on. The primary purpose of storing malware information in the blockchain distributed database is to secure the hash values: identical hash values effectively prevent fraud such as decompiling and repackaging Android applications by reverse engineering. Therefore, no one can easily create counterfeit applications.
Moreover, existing systems such as VirusTotal have flaws in detecting fake Android mobile apps. The proposed blockchain framework aims to remove these flaws and recognize fake or malicious Android applications. Furthermore, the blockchain keeps actual information in a decentralized malware blockchain database to increase the prediction performance and the run-time detection of malware when the user downloads or uploads an Android app on the network.

Algorithm 1: Aggregate deep learning weights from the blockchain network
MD ← MobileDevices;
{F_n}, n ∈ [N] ← Malware Features;
w ← global weights;
L(w, x) ← Loss;
I ← iteration;
θ ← clip bound;
for i ∈ [I] do
    for md ∈ [MD] do
        sample malware and benign feature data set with probability |f^i_md| / |f_md|;
    end
    for x ∈ F_th do
        gf^i_md(x) ← ∇_{w^i} L(w^i, x);
        gf^i_md(x) ← gd^i_md(x) / max(1, ‖gd^i(x)‖ / θ);
        retrieves the weights or global model from permissioned blockchain;
    end
    gf^i_md ← Σ_{x ∈ D^i_md} gd^i_md(x) + MD(0, θ ρ MD);
    executes IPFS model to aggregation and obtain updated the IPFS model;
    add the parameters of model as a transaction;
end
gf^i_md ← (1 / MD) (Σ_{n ∈ [md]} gf^i_md);
w^{i+1} ← w^i − η · gd^i;
retrieves the current updated weights from IPFS, and aggregates the weights;
broadcasts new malware information to other delegates for verification, and collects all transactions into a new block;
appends the block including the global model to the permissioned blockchain;

Table III
BLOCKCHAIN ATTRIBUTES
Keywords          | Size     | Definition
Pre-Hash          | 32 bytes | Hash value of the preceding block
Version number    | 4 bytes  | Tracks the protocol or software updates
Timestamp         | 5 bytes  | Records the time of a block
Transaction_count | 15 bytes | Number of malware results in the current block
Merkle root       | 32 bytes | Calculated over the malicious codes detected by the block
Nonce             | 15 bytes | Randomly recognized as a formal block

Figure 12. Blockchain data-store technique for multi-features of Android malware
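Returning to the aggregation of Section III-B, the sketch below illustrates one global update in the spirit of equation 5, with the per-device gradient clipping of Algorithm 1 applied first. It is a simplified illustration, not the authors' implementation: the noise term of Algorithm 1 and all blockchain and IPFS interactions are omitted, plain Python lists stand in for weight vectors, and the function names and numbers are assumptions.

```python
import math


def clip(gradient, theta):
    """Scale the gradient so that its L2 norm is at most the clip bound theta."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = max(1.0, norm / theta)
    return [g / scale for g in gradient]


def aggregate_step(weights, device_gradients, eta, theta):
    """One global update: w <- w - eta * mean of the clipped device gradients."""
    clipped = [clip(g, theta) for g in device_gradients]
    n = len(clipped)
    mean = [sum(g[j] for g in clipped) / n for j in range(len(weights))]
    return [w - eta * m for w, m in zip(weights, mean)]


weights = [0.5, -0.5]
# Hypothetical gradients reported by two devices
device_gradients = [[3.0, 4.0], [0.0, 1.0]]
new_weights = aggregate_step(weights, device_gradients, eta=0.1, theta=1.0)
print(new_weights)
```

With a clip bound of 1.0, the first gradient (norm 5) is scaled down to norm 1 before averaging, so a single device cannot dominate the global update.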
D. Designing a Smart Contract to secure the Android devices to check the harmful apps
This section describes the use of the Ethereum blockchain and a smart contract for verifying Android apps and tracking their version history, and it discusses how hash values are stored in the distributed ledger. The proposed smart contract can track different versions of apps and can provide continuous detection by broadcasting and sharing information about every new malicious app. Storing an app on the blockchain network would be very expensive and wasteful in terms of resources, because most apps are relatively large, with sizes of several megabytes (MB) or more. For this reason, the app uploaded by a developer is first stored in the IPFS file system along with its version history, and only the corresponding hash values of the apk file are stored in the blockchain distributed ledger. Moreover, the use of IPFS provides several other benefits because of its peer-to-peer network and its support for tracking the version history of every uploaded apk file. Furthermore, our smart contract interacts with the uploading and downloading of Android applications. It handles the IPFS version of the Android application and the hash value of the application. It can approve or deny harmful Android applications during uploading and downloading. Finally, the malicious features are broadcast through smart contracts to all users across the network. Figure 4 interacts between participating entities that are defined as developers, users, approvers, deep learning, and the smart contract.
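The split between off-chain storage and on-chain hashes can be illustrated with a short sketch. Everything here is a stand-in chosen for illustration: a dictionary plays the role of IPFS, a list plays the role of the distributed ledger, and a plain SHA-256 hex digest is used as the identifier (real IPFS content identifiers use a different encoding).

```python
import hashlib

ipfs_store = {}  # stand-in for IPFS: content hash -> apk bytes
ledger = []      # stand-in for the distributed ledger: small records only


def upload_apk(apk_bytes, version, is_malware):
    """Store the apk off-chain and append only its hash and verdict to the ledger."""
    apk_hash = hashlib.sha256(apk_bytes).hexdigest()
    ipfs_store[apk_hash] = apk_bytes
    ledger.append({"hash": apk_hash, "version": version, "malware": is_malware})
    return apk_hash


# Hypothetical apk content and version string
apk_hash = upload_apk(b"demo apk content", "1.0", False)
print(len(apk_hash), len(ledger))  # 64 1
```

The ledger record stays a few dozen bytes regardless of the apk size, which is the cost argument made above.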
Smart Contract :
All interactions among users and developers are handled by the smart contract. It checks newly uploaded applications and provides information about whether they are malware or benign. It also stores the new information about new apks. The smart contract interacts with the developer and the user to approve the application and to notify them whether it is benign or malware.
Methods:
Contracts are structures that define the essence of the deal. Some contracts have requirements that allow only a certain organization to execute them; others may be accessible to all participants. The strategies used in the smart contract are directly related to the effectiveness of the contract.
Modifiers:
Modifiers change the behavior of the application features. They can only define variables in this block before execution, and they can restrict access to contract functions according to malicious applications.
Variables:
A variable holds a value that can change depending on function calls or conditions. Depending on the smart contract, variables can store a specific data type.
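The approval logic handled by the smart contract during upload and download can be sketched in Python as below. This is a toy model, not Solidity and not the authors' contract: the class name, the classifier callback, and the choice to deny apps whose hash is unknown to the ledger are all assumptions made for the illustration.

```python
import hashlib


def apk_hash(apk_bytes):
    """SHA-256 hex digest used as the app identifier in this sketch."""
    return hashlib.sha256(apk_bytes).hexdigest()


class ApprovalContract:
    """Toy stand-in for the smart contract (assumption, not the authors' code)."""

    def __init__(self):
        self.verdicts = {}  # apk hash -> True if flagged as malware

    def approve_upload(self, apk_bytes, classifier):
        """Classify the uploaded apk, record the verdict, and approve or deny."""
        is_malware = classifier(apk_bytes)
        self.verdicts[apk_hash(apk_bytes)] = is_malware
        return "denyApproval" if is_malware else "successApproval"

    def approve_download(self, apk_bytes):
        """Approve only apps whose hash is on record and not flagged as malware."""
        h = apk_hash(apk_bytes)
        if h in self.verdicts and not self.verdicts[h]:
            return "successApproval"
        # Assumption: unknown hashes (possible repackaging) are denied as well
        return "denyApproval"


# Hypothetical classifier standing in for the deep learning model
def toy_classifier(apk_bytes):
    return apk_bytes.startswith(b"BAD")


contract = ApprovalContract()
print(contract.approve_upload(b"GOOD app", toy_classifier))  # successApproval
print(contract.approve_upload(b"BAD app", toy_classifier))   # denyApproval
print(contract.approve_download(b"GOOD app"))                # successApproval
print(contract.approve_download(b"tampered app"))            # denyApproval
```

Because any change to the apk bytes changes the hash, a repackaged app no longer matches its ledger entry and is rejected at download time.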
Algorithm 2: Smart contract approves uploaded apk
Contract is: WaitForCheckingMalware;
Developer is: ReadyToUploadAP;
Approve is: WaitingToSuccessOrFail;
if apkHashCheck(distributedLedger) then
    Contract is: successSign;
    Developer is: successProvidedAP;
    Approve = successApproval (if app is not malware);
else
    Contract is: denySign;
    Developer is: denyProvidedAP;
    Approve = denyApproval (if app is malware);
end

Algorithm 3: Smart contract approves downloaded apk
apk ← DownloadedApp;
apkHash ← ApplyHash(apk);
Approve is: WaitingToSuccessOrFail;
if BlockChainLedger(apkHash) then
    Approve = successApproval (malware information not found);
else
    Approve = denyApproval;
end

IV. PERFORMANCE EVALUATION
In this section, we discuss the experimental results of our proposed framework, including the dataset, evaluation measures, results, and comparison with other works. The experiments provide strong evidence for the proposed model based on deep learning and blockchain.
A. Dataset
The dataset that we used contains 18,850 normal Android application packages and 10,000 malware Android packages with different features. It is the new version of an earlier dataset, which collected around 13,000 Android application packages (.apk) as normal apps from different resources and 6,971 malicious applications from known sources such as the DroidKin dataset [58], the Android Malware Genome Project [59], and AndroMalShare [60]. The authors of that dataset extracted the permissions at installation time and at run time after running the collected Android application packages (.apk) in the BlueStacks emulator [61].
B. Experimental setup
In this paper, we extract dynamic and static features. The dynamic analysis is performed on real devices to check the real-time performance of the network. We used 8 mobile phones with different configurations (Android 10.0, 6 GB RAM, Kirin 980 processor, 128 GB ROM). Each smartphone processes an average of 400 apps daily. All phones contain a SIM card with a 4G network connection. The run-time execution is determined once the input generation is chosen. Moreover, to analyze the dynamic behavior of malware activities, we use an emulator (Android Virtual Device) to extract dynamic features such as API calls and events/actions. After extracting the dynamic and static features, we combine them and train the model.
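The final step, combining static and dynamic features into one input for training, can be sketched as follows. This is only an illustration under stated assumptions: the feature names (`SEND_SMS`, `INTERNET`, `getDeviceId`) are examples chosen here, the binary presence encoding is an assumption, and the helper names `combine`, `build_index`, and `vectorize` are hypothetical.

```python
def combine(static_feats, dynamic_feats):
    """Merge both feature sets, prefixing names so the two sources never collide."""
    return ({f"static:{name}" for name in static_feats}
            | {f"dynamic:{name}" for name in dynamic_feats})


def build_index(samples):
    """Assign each feature name seen in any sample a fixed vector position."""
    names = sorted(set().union(*samples))
    return {name: position for position, name in enumerate(names)}


def vectorize(sample, index):
    """Encode one app as a binary presence vector over the shared index."""
    vector = [0] * len(index)
    for name in sample:
        if name in index:  # features unseen at indexing time are ignored
            vector[index[name]] = 1
    return vector


# Two illustrative apps: one with a static and a dynamic feature, one static only.
app_a = combine({"SEND_SMS"}, {"getDeviceId"})
app_b = combine({"INTERNET"}, set())
index = build_index([app_a, app_b])
```

With this index, `vectorize(app_a, index)` and `vectorize(app_b, index)` yield fixed-length vectors that can be stacked into a training matrix.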
C. Evaluation Measures
We used the Python programming language to conduct our experiments. To evaluate the efficiency of malware detection systems, the true positive rate (TPR) and the false positive rate (FPR) are used, as in [62], [63]. TPR is defined in equation 6:

$$TPR = \frac{T_p}{T_p + F_n} \quad (6)$$

where True Positive (TP) is the number of correctly recognized malware samples and False Negative (FN) is the number of malware samples wrongly classified as benign. TPR is also known as the recognition rate. The false positive rate (FPR) is defined in equation 7 [62]:

$$FPR = \frac{F_p}{F_p + T_n} \quad (7)$$

where False Positive (FP) is the number of benign samples incorrectly identified as malware and True Negative (TN) is the number of correctly identified benign samples. FPR is also referred to as the false alarm rate. Overall accuracy (ACC) is the percentage of correctly identified applications, as shown in equation 8:

$$Accuracy = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \quad (8)$$

D. Results Discussion
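Before turning to the results, the measures in equations 6 to 8 can be made concrete with a short Python sketch. The labels below are made-up illustrative values, not experimental data, and the encoding (1 for malware, 0 for benign) is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = malware, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def tpr(tp, fn):
    """Equation 6: share of malware samples that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def fpr(fp, tn):
    """Equation 7: share of benign samples flagged as malware."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def accuracy(tp, fp, tn, fn):
    """Equation 8: share of all samples classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Illustrative labels only: 4 malware and 6 benign apps.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

For these labels the counts are TP = 3, FP = 1, TN = 5, FN = 1, so TPR = 3/4, FPR = 1/6, and accuracy = 8/10.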
The detailed empirical study of our proposed work contains two observations. First, our key goal is to build a deep learning based model that can identify and diagnose Android malware and benign applications. For this purpose, we analyze static and dynamic features based on the overall information gain score in Figures 13-17. The extracted features are analyzed to distinguish between benign and malware applications. We first inspected the information gain based on the frequency count of static and dynamic features such as permissions, connections, and intents, as depicted in Figures 18-22. From the observed behavior of these features, we can conclude that the frequency ratios of API calls and services have an inverse relationship between benign and malware applications. Specifically, as shown in Figure 22, the frequency count of intents is much lower in benign than in malware applications. In contrast, the use of permissions does not help to distinguish between benign and malware applications because the frequency counts are the same, as shown in Figure 19. Therefore, we exploit useful static and dynamic features to construct a high-quality model for training.
[Figure: Info Gain per feature; axis labels include TelephonyManager->getDeviceId and com.android.vending.INSTALL_REFERRER]