

Research Article
Behavior Intention Derivation of Android Malware Using Ontology Inference

Jian Jiao (1,2), Qiyuan Liu (1,2), Xin Chen (2), and Hongsheng Cao (1,2)

1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing, China
2 School of Computer Science, Beijing Information Science and Technology University, Beijing, China

Correspondence should be addressed to Jian Jiao; jiaojian@bistu.edu.cn

Received 2 November 2017; Revised 26 January 2018; Accepted 20 February 2018; Published 1 April 2018

Academic Editor: Ahmad K. Malik

Copyright © 2018 Jian Jiao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Previous research on Android malware has mainly focused on malware detection, and the evolution of malware makes detection lag behind new threats. The information presented by detection results (malice judgment, family classification, and behavior characterization) is limited for analysts. Therefore, a method is needed to restore the intention of malware, which reflects the relation between the multiple behaviors of complex malware and its ultimate purpose. This paper proposes a novel description and derivation model of Android malware intention based on the theory of intention and malware reverse engineering. The approach creates an ontology for malware intention to model the semantic relation between behaviors and their objects, and automates intention derivation by using SWRL rules transformed from the intention model together with the Jess inference engine. Experiments on 75 typical samples show that the inference system can derive malware intention effectively, and 89.3% of the inference results are consistent with manual analysis, which demonstrates the feasibility and effectiveness of our theory and inference system.

1. Introduction

Android malware forms a gray-market industrial chain for the purposes of tariff consumption, malicious chargeback, malicious promotion, and privacy trafficking (privacy theft). Security companies face a large number of suspicious samples to analyze every day. The workload of manual analysis is huge, and the extraction efficiency of malicious features is low, which worsens the security situation of Android [1]. Therefore, the analysis of malware samples has become a top priority of Android security. Traditional malware analysis relies on the analysts' experience; analysts can cope with a limited number of new malicious samples but cannot keep up when the scale is too large. In addition, although detection tools based on malware characteristics and behavior patterns can detect some malicious behaviors, several problems remain:

(i) The detection results give only a preliminary judgment of the malware's malice, for example, "this application may cause your privacy to be stolen or cause you financial loss." They do not explain what kind of private data is stolen or which kind of fee deduction occurs.

(ii) Detection of malicious behavior may not reflect the ultimate intention of malware, such as remote control.

(iii) The detected behavior information is limited, and the granularity of the hints is too coarse to support analysis of the whole malicious software intention; there are also uncertain factors such as the false positive rate.

Therefore, the traditional methods of malware detection combined with manual analysis face severe challenges and have, to some extent, become the bottleneck in the development of Android security. We need to change the traditional research mindset, focus on how to present malicious software intention to ordinary users, and allow users to obtain highly readable software analysis results.

Hindawi, Journal of Electrical and Computer Engineering, Volume 2018, Article ID 9250297, 13 pages, https://doi.org/10.1155/2018/9250297

Behavior intention is defined as a sequence of software behaviors directed at a particular purpose. The definition contains two key components: the sequence of software behaviors and the purpose. We think that sensitive behaviors can be extracted from a program using sensitive API invocations (security-relevant API invocations, usually controlled by sensitive permissions) as clues. But what is the purpose, and how can the purpose of malware be represented to users? We use one of our examples to illustrate the problem. This malware involves sensitive API calls such as getMessageBody(), getDeviceId(), setEntity(), connect(), and execute(). It first accesses the user's short messages and device ID number; after encrypting them, it connects to a remote server and transmits the information to the server. We use access, encrypt, connect, and transmit to represent these behaviors, but the semantics of access (a behavior) depends on its object (SMS, DeviceId); thus we use these objects to characterize behaviors. After these APIs execute, the states of the objects change, and we use the final states of these objects to describe the purpose of the intention. The key components in the definition of intention have now been illustrated. We use this idea as a motivating example to put forward our theory.
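To make the motivating example concrete, the following minimal Python sketch translates an ordered trace of sensitive API calls into (behavior, object) pairs. The API names are taken from the example; the lookup table, the function name behavior_sequence, and the object name "RemoteServer" are assumptions made for illustration and are not the authors' implementation. The encrypt step is omitted because the text does not tie it to a named API.

```python
# Hypothetical lookup table from sensitive API name to (behavior, object).
# The API names come from the example above; the pairings and the object
# name "RemoteServer" are assumptions made for illustration.
API_TO_BEHAVIOR = {
    "getMessageBody": ("access", "SMS"),
    "getDeviceId": ("access", "DeviceId"),
    "connect": ("connect", "RemoteServer"),
    "execute": ("transmit", "RemoteServer"),
}


def behavior_sequence(api_trace):
    """Translate an ordered API trace into (behavior, object) pairs.

    Calls without an entry in the table (here setEntity) are skipped.
    """
    return [API_TO_BEHAVIOR[api] for api in api_trace if api in API_TO_BEHAVIOR]


trace = ["getMessageBody", "getDeviceId", "setEntity", "connect", "execute"]
sequence = behavior_sequence(trace)
for behavior, obj in sequence:
    print(f"{behavior}({obj})")
```

Keeping the object next to the behavior name reflects the point made above: access alone is ambiguous until it is known whether the object is the SMS or the device ID.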

The aims of this research are as follows: first, to extract security-sensitive, semantic-layer behaviors depending only on the semantic information (sensitive APIs) of the program; second, to propose a derivation and description method for security-centric behavior chains that ignores unrelated software behavior; third, to present the final intention results in a combination of natural language and formal representation. The advantage of formalization is that automatic behavior reasoning can be implemented, while natural language descriptions of behaviors help users understand the semantics of the program. The reasoning results can easily be represented graphically, so that end users get concise and intuitive software intention information.

We propose a new description and derivation model of malware intention using existing reverse analysis results [2–4] for Android malware and the theory of intention [5, 6]. On this basis, we complete the deduction of behavioral intention at the semantic level with the help of ontology reasoning technology [7] and finally realize the reverse reduction from low-level Android malicious code to high-level intention. The main work of this paper is divided into two aspects. First, the definition and formalized model of malware intention are given, and the rationality of this model is proved. Second, we use an ontology to model the semantic relationship between behaviors and objects and automate the process of intention derivation using SWRL [8] rules and the Jess engine [9]. SWRL is a semantic web rule language that combines OWL and RuleML and presents rules in a semantic way. Jess is a rule engine for the Java platform, which can be used to complete many complex logical reasoning tasks.

The reasons for introducing an ontology are as follows: (1) the conceptual system based on the ontology is computable, and the automatic deduction of intentions can be achieved by computing between concepts; (2) the ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]; (3) the extensibility of the ontology enables us to extend the conceptual model at any time as new features of malware emerge.

The rest of the paper is organized as follows. Section 2 introduces related work on malware detection, reverse analysis, and the theory of intention. Section 3 elaborates the modeling process of the intention. Section 4 defines the ontology model and SWRL rules and describes how the Fact Base is established. Section 5 shows the experimental results on our representative malware samples. Finally, the paper concludes with Section 6.

2. Related Works

The essence of malware intention reduction is behavior analysis; software intention is divorced from specific API function calls. It is a behavior abstraction at the semantic layer and can describe behavior semantics intuitively. From the perspective of reverse engineering, ApkRiskAnalyzer [11] uses the commercial disassembly tool IDA Pro [12] to collect disassembly information. It constructs a taint analysis engine and a constant analysis engine to detect privacy disclosure behavior and constant-use malicious behavior based on the collected information. Jian and Qing [13] use program slices of sensitive APIs to obtain the calling sequence of malicious code and then return a use case diagram of the whole malicious code. They restore the functional semantics of malicious behavior to some extent and help users locate malicious code quickly, but they do not consider the specific relationships between behaviors and cannot reflect the real intention of malware.

Android malware research mainly focuses on the determination of malicious behavior and the extraction of malicious features. Although the semantic relations between APIs are considered in [14–16] and certain results have been achieved, these works ultimately move toward malware detection and cannot provide a complete analysis and presentation of malware intention. AppContext [14] judges whether a behavior is benign or malicious according to the context of the sensitive API. It holds that the trigger of a security-sensitive behavior can be judged by context, such as the trigger events that lead to the sensitive behavior or external environmental conditions. DroidADDMiner [15] constructs the feature vectors required for machine learning by extracting data dependence paths between sensitive APIs and combines constant usage and context information, so that it can detect, classify, and characterize Android malware. DroidSIFT [16] constructs a weighted contextual API dependency graph based on semantics and assigns weights according to the risk of each sensitive API. After the data set of behavior graphs is constructed, graph similarity values are used as the elements of the machine learning feature vector, which completes the feature construction.

Research on intention recognition is widely used in network security defense, artificial intelligence, natural language understanding, and other fields. Intrusion intention recognition [17] analyzes a large amount of underlying alarm information to explain and determine what attackers want to achieve. It is essentially a process of producing a reasonable interpretation of a large amount of attack data. Identifying the attacker's intention can determine the true intention of attackers and predict their subsequent behavior. It is the premise and foundation of threat analysis and decision response and is an important component of network security situational awareness. Shirley and Evans [6] judge whether software behavior is malicious according to the degree of matching between the user's operation intention and the software behavior. In the field of artificial intelligence [18], the human intention identified by agents is given a formal representation. These studies provide a theoretical basis for our malware intention modeling.

3. Model of Malware Intention

This section focuses on the modeling process of malware intention. First, the formalized definition of software intention is given according to the abstraction of the problem and the related literature. Then, the key components in the definition of intention are defined and explained. Finally, the inference process of the intention model is proved mathematically.

3.1. Formalization of Software Intention. In the study of behavioral theory, Bratman [5] holds that intention should be a basic unit of behavior research and suggests that intention is a future-oriented behavior sequence; that is, the intention influences the behavior of the next step. After synthesizing and analyzing a large number of definitions of intention, we obtain the connotation of intention as a sequence of actions directed at a particular purpose.

To obtain a formalized definition of software intention, a program is regarded as a combination of data structures and algorithms. The problem can then be abstracted as follows. A function call or program operation of a sensitive API is abstracted into an action; a sequence of actions with specific relations can perform certain functions. These actions operate on a set of input data, including event messages, user input, privacy data, and constant arguments. The semantics of the input data are abstracted into an object description of the action. When the trigger condition is satisfied, the malware starts its process of operations. After a series of operations, this set of data (objects) eventually reaches a certain state (the final data state), and these final states reflect the purpose of the program's execution.

Definition 1. Software intention is a sequence of software behaviors directed at a particular purpose, formally represented as

\[
\overrightarrow{\omega}(\mathrm{Intention}) = \left( \langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle, \mathrm{Goal} \right). \tag{1}
\]

In formula (1), "$\overrightarrow{\omega}$" refers to directionality, which indicates that the behavior sequence $\langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle$ points to the purpose Goal. $\omega$ represents the temporal and spatial attributes of behavior: temporal attributes describe specific temporal relationships between actions, and spatial attributes indicate the impact of behavior on external objects.

3.2. Intention Elements. The elements of software intention include the software behavior sequence, behavior objects (behavior facts), and software goals, which are defined separately below. They are the theoretical basis for the derivation in Section 3.3, and the extraction method of behavior facts is described in Section 4.3.

Definition 2. A behavior object is an abstract description of the data or device objects that an Android API invocation can operate on. It is a three-tuple:

\[
\mathrm{Beh\_object} = (\mathrm{obj\_name}, \mathrm{attribute}, \mathrm{attri\_value}). \tag{2}
\]

In tuple (2), obj_name represents the object name, attribute refers to the object's property, and attri_value represents the property value. Different behaviors act on different objects, and these objects correspond to different entities in the program.

Behavior objects are usually presented in the form of class objects (system resource management, network connection objects, and network buffer objects), constants (SP number, advertising link URL, and file address), local executable files (system commands, local library (.so) files), and passed variables (parameter, data, path). The abstract semantics of these objects can be determined by certain mapping rules.

Definition 3. Software behavior is an abstract software operation that can operate on an object and change its property value, formally represented as

\[
\begin{aligned}
\mathrm{Behavior} &= (\mathrm{beh\_name}, \mathrm{Input}, \mathrm{Output}), \\
\mathrm{Input} &= \mathrm{Beh\_object}, \\
\mathrm{Output} &= \mathrm{Beh\_object}'.
\end{aligned} \tag{3}
\]

In formula (3), beh_name represents the name of a behavior, Input represents the input object descriptions of this behavior, and Output represents its output object descriptions. In this paper, behaviors are provisionally divided into two types according to the number of their inputs: B1 (single input, single output) and B2 (double input, double output).
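Definitions 2 and 3 map naturally onto small record types. The sketch below is one possible encoding, assuming Python named tuples; the class names BehObject and Behavior, the helper behavior_type, and the sample encrypt behavior are illustrative choices rather than part of the original system.

```python
from typing import NamedTuple, Tuple


class BehObject(NamedTuple):
    """Three-tuple of Definition 2."""
    obj_name: str
    attribute: str
    attri_value: str


class Behavior(NamedTuple):
    """Behavior of Definition 3; inputs and outputs hold one or two objects."""
    beh_name: str
    inputs: Tuple[BehObject, ...]
    outputs: Tuple[BehObject, ...]


def behavior_type(b: Behavior) -> str:
    """Classify a behavior as B1 (single input/output) or B2 (double input/output)."""
    kinds = {1: "B1", 2: "B2"}
    if len(b.inputs) != len(b.outputs) or len(b.inputs) not in kinds:
        raise ValueError("expected one or two inputs with the same number of outputs")
    return kinds[len(b.inputs)]


# Illustrative B1 behavior: encrypting the device ID changes one attribute value.
encrypt = Behavior(
    "encrypt",
    (BehObject("DeviceId", "is_encrypt", "No"),),
    (BehObject("DeviceId", "is_encrypt", "Yes"),),
)
print(behavior_type(encrypt))
```

Using immutable tuples keeps each fact hashable, which is convenient when facts are later collected into sets for rule matching.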

We have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and a survey of the related literature [15, 19–21]. We also refer to Android's official API documentation, and a database of sensitive API behavior is established based on the information collected. These APIs are controlled by 30 Android sensitive permissions and have detailed descriptions in the official documents.

We generate behavior facts for API patterns based on their internal program logic. Some APIs have similar semantics and can be classified as one class of behavior. We further study the extraction mechanism. Table 1 presents the three major kinds of logic that we used. (1) Sequential relation: APIs constitute a special workflow. For instance, connect() always happens before execute(), since the first provides the second with necessary inputs. In this case, we study the documentation of both APIs and extract behaviors from both of them. (2) A former object is retrieved for latter operations. For example, SmsMessage.createFromPdu(pdus) is always invoked prior to SmsMessage.getMessageBody(), because the former fetches the default SmsMessage object that the latter needs. We then extract behaviors only from the latter APIs. (3) Multilevel data is accessed through multiple levels of APIs. For example, when accessing location data, we first call getLastKnownLocation() to return a location object and then call getLongitude() and getLatitude() to get the longitude and latitude from this object. As the higher-level APIs are meaningful enough, we extract behaviors only from the former APIs.

Table 1: The rules of behavior semantic extraction.

Program structure               Location of behavior semantic
(1) Sequential relation         Extracting both of these APIs
(2) Prepared for the latter     Extracting the latter API
(3) Multilevel data access      Extracting the former API
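The three rules of Table 1 can be written as a small decision function. This is a sketch under the assumption that the relation between two APIs has already been identified; the enum Relation and the function apis_to_extract are hypothetical names introduced here.

```python
from enum import Enum


class Relation(Enum):
    SEQUENTIAL = 1         # both APIs form a workflow
    PREPARATION = 2        # the former only prepares an object for the latter
    MULTILEVEL_ACCESS = 3  # the latter only reads fields of the former's result


def apis_to_extract(former, latter, relation):
    """Return the APIs whose behavior semantics are extracted, following Table 1."""
    if relation is Relation.SEQUENTIAL:
        return [former, latter]
    if relation is Relation.PREPARATION:
        return [latter]
    if relation is Relation.MULTILEVEL_ACCESS:
        return [former]
    raise ValueError(f"unknown relation: {relation!r}")


print(apis_to_extract("connect", "execute", Relation.SEQUENTIAL))
print(apis_to_extract("createFromPdu", "getMessageBody", Relation.PREPARATION))
print(apis_to_extract("getLastKnownLocation", "getLongitude", Relation.MULTILEVEL_ACCESS))
```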

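According to the analysis of malware samples and the descriptions in related literature, we summarize a variety of malware behaviors and give the behavior set $E$:

\[
\begin{aligned}
E = \{ & \varphi_{\mathrm{right\_gain}}, \varphi_{\mathrm{access}}, \varphi_{\mathrm{store}}, \varphi_{\mathrm{encrypt}}, \varphi_{\mathrm{monitor}}, \varphi_{\mathrm{intercept}}, \\
& \varphi_{\mathrm{connect}}, \varphi_{\mathrm{transmit}}, \varphi_{\mathrm{send}}, \varphi_{\mathrm{dial}}, \varphi_{\mathrm{decode}}, \varphi_{\mathrm{install}}, \varphi_{\mathrm{popup}}, \\
& \varphi_{\mathrm{tamper}}, \varphi_{\mathrm{delete}}, \varphi_{\mathrm{hide}}, \varphi_{\mathrm{remote\_control}} \}.
\end{aligned} \tag{4}
\]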

Taking the monitoring and interception of broadcast messages as an example, we explain how a behavior is described. A behavior can be described as a mapping relationship.

(i) Broadcast message monitoring:

\[
\begin{aligned}
\varphi_{\mathrm{monitor}} &: \mathrm{Broadcast\_info} \to \mathrm{Broadcast\_info}', \\
\mathrm{Broadcast\_info} &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{No}), \\
\mathrm{Broadcast\_info}' &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{Yes}).
\end{aligned} \tag{5}
\]

(ii) Broadcast message interception:

\[
\begin{aligned}
\varphi_{\mathrm{intercept}} &: \mathrm{Broadcast\_info}' \to \mathrm{Broadcast\_info}'', \\
\mathrm{Broadcast\_info}' &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{Yes}), \\
\mathrm{Broadcast\_info}'' &= (\mathrm{Broadcast\_info}, \mathrm{is\_intercepted}, \mathrm{Yes}).
\end{aligned} \tag{6}
\]

Behaviors are not independent of one another: the outputs of one behavior often correspond to the inputs of another. One of the most common relationships is data dependence; that is, the execution input of the later behavior needs the execution output of the previous behavior. Another common relationship is control dependence: although two behaviors have no direct data flow relationship, the execution of the latter behavior needs the execution of the previous one as a trigger condition. In the former example, the relation between monitor and intercept is a data dependence relation and can be described as follows:

\[
\begin{aligned}
& \mathrm{Output}(\varphi_{\mathrm{monitor}}) \to \mathrm{Input}(\varphi_{\mathrm{intercept}}), \\
& \mathrm{Broadcast\_info} \to \mathrm{Broadcast\_info}' \to \mathrm{Broadcast\_info}''.
\end{aligned} \tag{7}
\]
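The data dependence between monitor and intercept can be checked mechanically. The sketch below encodes each behavior as a (name, input, output) tuple with the objects of formulas (5) and (6); the helper names data_dependent and run_chain are assumptions, and control dependence is not modeled.

```python
# Each behavior is (name, input_object, output_object); objects are
# (obj_name, attribute, attri_value) triples as in formulas (5) and (6).
monitor = (
    "monitor",
    ("Broadcast_info", "is_monitored", "No"),
    ("Broadcast_info", "is_monitored", "Yes"),
)
intercept = (
    "intercept",
    ("Broadcast_info", "is_monitored", "Yes"),
    ("Broadcast_info", "is_intercepted", "Yes"),
)


def data_dependent(first, second):
    """True when the output object of `first` is exactly the input object of `second`."""
    return first[2] == second[1]


def run_chain(behaviors, start):
    """Apply behaviors in order, checking each input against the current state."""
    state = start
    for name, inp, out in behaviors:
        if state != inp:
            raise ValueError(f"{name}: expected input {inp}, got {state}")
        state = out
    return state


final_state = run_chain([monitor, intercept], ("Broadcast_info", "is_monitored", "No"))
print(data_dependent(monitor, intercept), final_state)
```

Running the chain in the opposite order raises an error, which mirrors the fact that interception depends on monitoring and not the other way round.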

Definition 4. Software purpose is the final state of all input objects that act in an intention, formally represented as

\[
\mathrm{Goal} = (\mathrm{object}_1, \mathrm{attribute}, \mathrm{attri\_value}) \times \cdots \times (\mathrm{object}_n, \mathrm{attribute}, \mathrm{attri\_value}). \tag{8}
\]

In the purpose representation, $\mathrm{object}_1, \ldots, \mathrm{object}_n$ are the input objects involved in the intention. For example, after extracting the sensitive behaviors in a malware sample and giving them a formal semantic representation, we use the ontology reasoning engine to automate the reasoning. We eventually get a description of the relationship between behaviors and the final states of all the objects. For instance, (SMS, position, "1062568") indicates that the SMS is sent to "1062568", (URL, is_used, Yes) represents the fact that the URL is used in the related function, and (DeviceId, is_encrypt, Yes) indicates that the DeviceId has been encrypted.
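As a rough illustration of Definition 4, the goal can be computed by keeping the last recorded state of each object in a sequence of behavior outputs. The function derive_goal and the intermediate state (DeviceId, is_accessed, Yes) are hypothetical; the three final states are the ones quoted above.

```python
def derive_goal(outputs):
    """Keep the last (attribute, value) recorded for each object, in first-seen order."""
    final = {}
    for obj_name, attribute, value in outputs:
        final[obj_name] = (attribute, value)
    return [(name, attr, val) for name, (attr, val) in final.items()]


# Ordered behavior outputs; the first entry is a hypothetical intermediate state
# that is later overwritten by the encrypted state of the same object.
outputs = [
    ("DeviceId", "is_accessed", "Yes"),
    ("DeviceId", "is_encrypt", "Yes"),
    ("URL", "is_used", "Yes"),
    ("SMS", "position", "1062568"),
]
goal = derive_goal(outputs)
print(goal)
```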

3.3. Proof of Model. If intention is viewed as a system [22], its basic component is the behavior. After the inputs and outputs involved in these behaviors are defined, a series of basic mapping relationships is obtained, which reflects the influence of each basic behavior on external objects. The overall nature of an intention system can be determined by a set of mappings between a set of inputs and a set of outputs. Lemmas 5 and 6 are the mathematical basis of our behavior deduction.

Lemma 5 (function combination). The union of a mapping set $\{\varphi_i\}$ is a mapping $\varphi : A \to B$,

\[
\varphi = \bigcup_{i=1}^{n} \varphi_i, \qquad
A = \bigcup_{i=1}^{n} A_i, \qquad
B = \bigcup_{i=1}^{n} B_i, \qquad
\varphi_i : A_i \to B_i, \; 1 \le i \le n, \tag{9}
\]

when $(\forall A_i)(\forall A_j)(A_i \subseteq A \wedge A_j \subseteq A \wedge A_i \ne A_j \to A_i \cap A_j = \emptyset)$.
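For finite mappings, Lemma 5 can be illustrated with Python dictionaries. The sketch below is an assumption-laden toy: the function combine is a hypothetical name, the attribute is_accessed is invented for the example, and the check is slightly stricter than the lemma in that it requires all domains to be pairwise disjoint.

```python
def combine(*mappings):
    """Union of finite mappings (dicts); requires pairwise disjoint domains."""
    combined = {}
    for m in mappings:
        overlap = combined.keys() & m.keys()
        if overlap:
            raise ValueError(f"domains overlap on {sorted(overlap)}")
        combined.update(m)
    return combined


# Two behaviors acting on different objects, so their domains do not intersect.
phi_i = {("SMS", "is_accessed", "No"): ("SMS", "is_accessed", "Yes")}
phi_j = {("DeviceId", "is_accessed", "No"): ("DeviceId", "is_accessed", "Yes")}
phi = combine(phi_i, phi_j)
print(len(phi))
```

The disjointness check is what guarantees that the union is still single-valued on every input.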

Lemma 6 (function compound). A two-tuple $\varphi$ consisting of a set of mappings $\Phi$ and its compound relation $\xi_{\circ}$ is a mapping $\varphi : \mathrm{dom}(\varphi_1) \to \mathrm{ran}(\varphi_n)$,

\[
\begin{aligned}
\varphi &= (\Phi, \xi_{\circ}), \\
\Phi &= \{ \varphi_i \mid \varphi_i : A_i \to B_i, \; A_i = \mathrm{dom}(\varphi_i), \; B_i = \mathrm{ran}(\varphi_i), \; 1 \le i \le n \}, \\
\xi_{\circ} &= \{ \varphi_{i+1} \circ \varphi_i \mid \varphi_i, \varphi_{i+1} \in \Phi, \; 1 \le i \le n - 1 \},
\end{aligned} \tag{10}
\]

when $(\forall i)((1 \le i \le n - 1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$.

According to Lemmas 5 and 6, we prove Corollaries 7 and 8.

Corollary 7 (compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors involved in an intention. If the relation between these behaviors is $\mathrm{Output}(\varphi_i) = \mathrm{Input}(\varphi_{i+1})$, $1 \le i \le n - 1$, then the output $\mathrm{Output}(\varphi_n)$ of behavior $\varphi_n$ is the representation of the final goal. The behavior sequence is denoted as $\xi_i = (\varphi_1, \varphi_2, \ldots, \varphi_n)$.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_i$ and $\varphi_{i+1}$ satisfy the condition that the former's output corresponds to the latter's input; that is, the range of $\varphi_i$ is equal to the domain of $\varphi_{i+1}$, which satisfies $(\forall i)((1 \le i \le n - 1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$. According to Lemma 6, the corollary is proved.
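The compound deduction of Corollary 7 can likewise be sketched for finite mappings. The code below composes a chain of dictionaries after checking the range/domain condition of Lemma 6; compose_chain, the attribute name state, and its values are hypothetical and serve only as an illustration.

```python
def compose_chain(mappings):
    """Compose finite mappings (dicts) left to right, i.e. phi_n after ... after phi_1.

    Raises ValueError unless ran(phi_i) is a subset of dom(phi_{i+1}) for every i.
    """
    if not mappings:
        raise ValueError("need at least one mapping")
    for i, (cur, nxt) in enumerate(zip(mappings, mappings[1:]), start=1):
        if not set(cur.values()) <= set(nxt):
            raise ValueError(f"ran(phi_{i}) is not a subset of dom(phi_{i + 1})")
    result = {}
    for x in mappings[0]:
        y = x
        for m in mappings:
            y = m[y]
        result[x] = y
    return result


# Hypothetical three-step chain on one object: access, then encrypt, then transmit.
access = {("DeviceId", "state", "stored"): ("DeviceId", "state", "accessed")}
encrypt = {("DeviceId", "state", "accessed"): ("DeviceId", "state", "encrypted")}
transmit = {("DeviceId", "state", "encrypted"): ("DeviceId", "state", "transmitted")}
compound = compose_chain([access, encrypt, transmit])
print(compound)
```

The value reached at the end of the chain plays the role of Output(φ_n), that is, the final goal in the sense of Corollary 7.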

Corollary 8 (combination-compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors involved in an intention. If there is an input-to-output relation $\mathrm{Output}(\varphi_m) = \mathrm{Input}(\varphi_u)$, $1 \le m, u \le n$, and, at the same time, a parallel relationship $\mathrm{Output}(\varphi_i) \cup \mathrm{Output}(\varphi_j) = \mathrm{Input}(\varphi_k)$, $1 \le i, j, k \le n$, then the output $\mathrm{Output}(\varphi_u) \cup \mathrm{Output}(\varphi_k)$ of $\varphi_u$ and $\varphi_k$ is the final goal. Behavior sequences are denoted as $\xi_i = (\Phi, R)$, where $\Phi$ is the behavior set and $R$ is the relationship between these behaviors.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_m$ and $\varphi_u$ satisfy the condition that the former's output corresponds to the latter's input; that is, the range of $\varphi_m$ is equal to the domain of $\varphi_u$, so by Lemma 6 $\mathrm{ran}(\varphi_u)$ is the compound output. $\varphi_i$ and $\varphi_j$ both supply their outputs as the input of $\varphi_k$. Because the intersection of the inputs of $\varphi_i$ and $\varphi_j$ is empty ($\mathrm{dom}(\varphi_i) \cap \mathrm{dom}(\varphi_j) = \emptyset$), they satisfy Lemma 5: $\varphi_i$ and $\varphi_j$ satisfy the combination relation and can be combined into a new mapping

\[
\varphi' : \mathrm{dom}(\varphi_i) \cup \mathrm{dom}(\varphi_j) \to \mathrm{ran}(\varphi_i) \cup \mathrm{ran}(\varphi_j). \tag{11}
\]

Then $\varphi'$ and $\varphi_k$ satisfy the condition of Lemma 6, and $\mathrm{ran}(\varphi_k)$ is the compound output. The corollary is proved according to Lemma 6.

4. Ontology Inference System

This section elaborates the construction process of the ontology inference system. Section 4.1 gives the construction specification of the ontology model [10]. Section 4.2 gives the SWRL rules used in the inference engine. Section 4.3 describes the mapping methods from Data Source to Fact Base. Section 4.4 elaborates the framework of the inference system.

4.1. Ontology Model. The reasons for using an ontology in our inference system are as follows. (1) The conceptual system based on the ontology is computable, and the automatic deduction of intentions can be achieved by computing between concepts. The system accordingly reduces the workload of code writing. (2) The extensibility of the ontology enables us to extend the conceptual model at any time as new features of malware emerge. Once new malware knowledge appears, we only need to modify the ontology model based on that knowledge. In this way, we reduce the amount of code update and maintenance. (3) The ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]. It can provide a standardized representation for malware intention.

The ontology model of malware behavior intention uses the following knowledge: the definition and classification of behaviors and behavior objects, the intention model, and Corollaries 7 and 8. The concepts and their relations are shown in Figure 1. The definitions of each concept are given in Section 3. We use the Pellet reasoner to verify the consistency of the ontology.

The definitions of object attributes and data attributes are given in Table 2.

4.2. SWRL Inference Rules. This section uses the knowledge of Corollaries 7 and 8 and Definition 4 in Section 3. Before writing inference rules, we need to define the format of basic facts. A fact is a composite description of a basic behavior. There are two categories of basic facts in our research: B1, a behavior with a single input and a single output, and B2, a behavior with double input and double output. See Box 1.

(1) Inference Rules of the Relationship between Behaviors

Figure 1: Semantic ontology of behavior intention. [Diagram not reproduced. Concept labels: Malware domain, Android malware, behavior, beh_object, obj_name, attribute, attri_value. Behavior labels: right_gain, access, store, encrypt, monitor, intercept, connect, transmit, install_malware, popup_ad, tamper, dial, delete, hide_file, send, decode. Object labels: privacy, right, event, malware, parameter, config_file. Relation labels: is a, has, hasBehavior, hasInput, hasOutput, hasName, hasAttribute, hasAttrivalue.]

Figure 2: SWRL inference rules of the relationship between behaviors and final goal.

Premise 1. Let B11 and B12 be any two B1 behaviors. B11(in) denotes the input of B11 and B11(out) denotes its output. If the output object of B11 corresponds to the input object of B12, then Rule-1 is the inference rule for this condition. See Box 1.
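The condition tested by Rule-1 can be illustrated outside SWRL. This is a minimal Python sketch, not the authors' rule file: the type names `Obj` and `B1` and the function `has_compound_with` are hypothetical, and the two example behaviors are read off Figure 5. The check mirrors the three equality tests of Rule-1 in Box 1 (object name, attribute, and attribute value).

```python
from typing import NamedTuple


class Obj(NamedTuple):
    name: str
    attribute: str
    value: str


class B1(NamedTuple):
    behavior: str
    input: Obj
    output: Obj


def has_compound_with(first: B1, second: B1) -> bool:
    """Rule-1 sketch: the output object of `first` equals the input object of `second`."""
    # Tuple equality compares name, attribute, and attribute value together.
    return first.output == second.input


monitor = B1("monitor",
             Obj("Broadcast", "is_monitored", "No"),
             Obj("Broadcast", "is_monitored", "Yes"))
intercept = B1("intercept",
               Obj("Broadcast", "is_monitored", "Yes"),
               Obj("Broadcast", "is_intercepted", "Yes"))

print(has_compound_with(monitor, intercept))
print(has_compound_with(intercept, monitor))
```

The relation is directional: the check holds for (monitor, intercept) but not for the reverse order, which matches the reading of a compound relation as one behavior feeding the next.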

Premise 2. Let B11 and B12 be any two B1 behaviors and let B2 be a B2 behavior. B11(in) denotes the input of B11 and B11(out) denotes its output. B2(1in), B2(2in), B2(1out), and B2(2out) denote the first and second inputs and outputs of B2, respectively.

If the union of the outputs of two B1 behaviors is the input of a certain B2 behavior, and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior. There is also a combination relation between B11 and B12. This rule is B1-B2-Rule-1, shown in Figure 2; the remaining rules are similar.
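The condition of Premise 2 can be sketched as a set comparison. The code below is an illustration under assumptions, not the B1-B2-Rule-1 text of Figure 2: the names `B1`, `B2`, and `b1_b2_relations` are hypothetical, and the example inputs and outputs of connect, encrypt, and transmit are one reading of Figure 5.

```python
from typing import NamedTuple, Set, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)


class B1(NamedTuple):
    behavior: str
    input: Obj
    output: Obj


class B2(NamedTuple):
    behavior: str
    inputs: Tuple[Obj, Obj]
    outputs: Tuple[Obj, Obj]


def b1_b2_relations(a: B1, b: B1, c: B2) -> Set[Tuple[str, str, str]]:
    """Return the inferred (relation, subject, object) triples, or an empty set."""
    outputs_feed_c = {a.output, b.output} == set(c.inputs)  # union of outputs = inputs of c
    inputs_disjoint = a.input != b.input                    # input intersection is empty
    if not (outputs_feed_c and inputs_disjoint):
        return set()
    return {
        ("hasCompoundwith", a.behavior, c.behavior),
        ("hasCompoundwith", b.behavior, c.behavior),
        ("hasCombinationwith", a.behavior, b.behavior),
    }


connect = B1("connect", ("URL", "is_used", "No"), ("URL", "is_used", "Yes"))
encrypt = B1("encrypt", ("SmsMessage", "position", "inMemory"),
             ("SmsMessage", "is_plain", "No"))
transmit = B2("transmit",
              (("URL", "is_used", "Yes"), ("SmsMessage", "is_plain", "No")),
              (("URL", "is_used", "Yes"), ("SmsMessage", "position", "remoteServer")))

relations = b1_b2_relations(connect, encrypt, transmit)
```

With these inputs the sketch yields the same three relations that Table 5 lists for connect, encrypt, and transmit.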

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2 (for example, Goal-Rule-1). The outputs are descriptions of each object’s final state. Due to limited space, the other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts comprise B1 and B2 behaviors. The elements of a behavior that should be extracted from the Data Source are as follows:

(1) Behavior (name): we extract the behavior description in a program according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of the sensitive APIs and their definitions in the official documentation.

(3) Output (behavior object): after a behavior operates on an object, the object’s name and attribute do not change, while its attribute value is affected. Therefore, we can determine the change in attribute value from the behavior’s definition.
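Item (3) can be read as a lookup from a behavior's definition to the new attribute value. The sketch below assumes a hypothetical effect table `EFFECTS` and function `output_of`; the three entries are read off Figure 5 and are illustrative, not the authors' database.

```python
from typing import Dict, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)

# Hypothetical effect table: (behavior, attribute) -> attribute value after the behavior.
EFFECTS: Dict[Tuple[str, str], str] = {
    ("monitor", "is_monitored"): "Yes",
    ("connect", "is_used"): "Yes",
    ("access", "position"): "inMemory",
}


def output_of(behavior: str, obj: Obj) -> Obj:
    """Derive the output object: same name and attribute, updated attribute value."""
    name, attribute, _old_value = obj
    return (name, attribute, EFFECTS[(behavior, attribute)])


sms_after_access = output_of("access", ("SmsMessage", "position", "inDevice"))
```

Only the value slot changes, which is the property the text states for outputs; a lookup on an unknown (behavior, attribute) pair raises `KeyError` rather than guessing an effect.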

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second comprises object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and generate the call graph (CG) and control flow graph (CFG) of each sample. The leaf nodes of the call graph are traversed to find sensitive APIs, and the corresponding behavior is identified according to the mapping between behaviors and sensitive API calls. Partial mappings between behaviors, sensitive API calls, and object descriptions are shown in Table 3.
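The first stage can be sketched as a traversal over call-graph leaves followed by a table lookup. In this hedged example the mapping entries come from Table 3, while the call graph, the method names `handleSms`, `upload`, and `log`, and the helper names `leaf_nodes` and `recognize_behaviors` are hypothetical. A real pipeline would obtain the call graph from a decompiler rather than a hand-written dictionary.

```python
from typing import Dict, List, Set

# Partial API-to-behavior mapping, taken from Table 3.
API_TO_BEHAVIOR: Dict[str, str] = {
    "getDeviceId": "access",
    "createFromPdu": "access",
    "sendTextMessage": "send",
    "abortBroadcast": "intercept",
    "setEntity": "encrypt",
    "execute": "transmit",
}


def leaf_nodes(call_graph: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes that have no outgoing call edges."""
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    return {node for node in nodes if not call_graph.get(node)}


def recognize_behaviors(call_graph: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each sensitive leaf API to its behavior; other leaves are ignored."""
    return {
        api: API_TO_BEHAVIOR[api]
        for api in sorted(leaf_nodes(call_graph))
        if api in API_TO_BEHAVIOR
    }


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "onReceive": ["abortBroadcast", "handleSms"],
    "handleSms": ["createFromPdu", "upload"],
    "upload": ["setEntity", "execute", "log"],
}

behaviors = recognize_behaviors(call_graph)
```

The non-sensitive leaf `log` is dropped because it has no entry in the mapping, which is how unrelated software behavior stays out of the fact base in this sketch.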

Table 2: Specification of object attributes and data attributes.

| Attribute name | Specification |
|---|---|
| hasCombinationwith | Behaviors have a combination relation |
| hasCompoundwith | Behaviors have a compound relation |
| hasInput(x, y) | The behavior’s single input is “y” |
| hasOutput(x, z) | The behavior’s single output is “z” |
| hasFirstInput(x, y1) | The first input of a double input is “y1” |
| hasSecondInput(x, y2) | The second input of a double input is “y2” |
| hasFirstOutput(x, z1) | The first output of a double output is “z1” |
| hasSecondOutput(x, z2) | The second output of a double output is “z2” |
| hasBehavior(x, y) | The behavior belonging to the malware is “y” |
| hasObjectName(x, y) | The name of the object is “y” |
| hasAttribute(x, y) | The attribute of the object is “y” |
| hasAttributeValue(x, y) | The object’s attribute value is “y” |

Box 1

Rule-1: behavior B1(B11) ∧ behavior B1(B12) ∧ hasBehaviorName(B11, bn1) ∧ hasBehaviorName(B12, bn2) ∧ hasOutput(B11, op) ∧ hasInput(B12, ip) ∧ hasObjectname(op, on1) ∧ hasObjectname(ip, on2) ∧ swrlb:equal(on1, on2) ∧ hasAttribute(op, att1) ∧ hasAttribute(ip, att2) ∧ hasValue(att1, attri1) ∧ hasValue(att2, attri2) ∧ swrlb:equal(attri1, attri2) ∧ hasAttributeValue(op, attv1) ∧ hasAttributeValue(ip, attv2) ∧ hasValue(attv1, attriv1) ∧ hasValue(attv2, attriv2) ∧ swrlb:equal(attriv1, attriv2) → hasCompoundwith(bn1, bn2)

Second Stage. For each identified behavior, the behavior’s object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object from the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object from the parameter usage of the API and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the definition of intention and Corollaries 7 and 8 to construct SWRL rules (rules represented in the ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The fact extraction methods are described in Section 4.3. Finally, the Jess inference engine performs the reasoning over the Facts Base and the SWRL rules and produces the inference output (the description of malware intention).
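To make the flow from facts and rules to inferred relations concrete, here is a toy forward-chaining loop. It is a plain-Python stand-in and does not use or imitate the Jess API; the names `forward_chain` and `compound_rule` and the string encoding of objects are assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object)
Rule = Callable[[FrozenSet[Triple]], Iterable[Triple]]


def forward_chain(facts: Iterable[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until no new fact is derived."""
    known = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for rule in rule_list:
            # Each rule sees an immutable snapshot of the current facts.
            for derived in rule(frozenset(known)):
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


def compound_rule(facts: FrozenSet[Triple]) -> Iterator[Triple]:
    """If one behavior's output is another behavior's input, relate the two."""
    outputs = [(s, o) for (p, s, o) in facts if p == "hasOutput"]
    inputs = [(s, o) for (p, s, o) in facts if p == "hasInput"]
    for producer, out_obj in outputs:
        for consumer, in_obj in inputs:
            if producer != consumer and out_obj == in_obj:
                yield ("hasCompoundwith", producer, consumer)


facts_base = [
    ("hasOutput", "monitor", "Broadcast.is_monitored=Yes"),
    ("hasInput", "intercept", "Broadcast.is_monitored=Yes"),
]
inferred = forward_chain(facts_base, [compound_rule])
```

The loop stops at a fixed point, so adding further rules (for example, goal rules) only requires appending functions to the rule list.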

4.5. Motivation Example. We take Zitmo [16] as an example to show the facts extracted from a sample, as listed in Table 4. Among the extracted facts, access and transmit are B2 behaviors. The double input of a B2 behavior indicates that executing the behavior requires both key inputs to be satisfied.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intention of each malicious sample. The results for Zitmo’s behavior relations are shown in Figure 4. The intention reasoning results for the other samples are analyzed separately and show reasonable efficiency.

We extract the behavior relations and goal representation from Figure 4; they are shown in Table 5.

Zitmo’s intention includes a behavior sequence (consisting of a behavior set and behavior relations) and the representation of the goal (output object descriptions). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents an input or output object of a behavior, and an arrow indicates the relationship between an object and a behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together they represent the goal of the intention.
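The final states described above can be reproduced by folding the state changes of Figure 5 over the raw inputs. This is a sketch under assumptions: the list `TRANSITIONS`, the dictionary `INITIAL`, and the function `final_states` are hypothetical names, the ordering is one reading of Figure 5, and only the last value of each attribute is kept, whereas Table 5 also lists intermediate states.

```python
from typing import Dict, List, Tuple

# (behavior, object name, attribute, new value), read off Figure 5.
TRANSITIONS: List[Tuple[str, str, str, str]] = [
    ("monitor", "Broadcast", "is_monitored", "Yes"),
    ("intercept", "Broadcast", "is_intercepted", "Yes"),
    ("access", "SmsMessage", "position", "inMemory"),
    ("encrypt", "SmsMessage", "is_plain", "No"),
    ("connect", "URL", "is_used", "Yes"),
    ("transmit", "SmsMessage", "position", "remoteServer"),
]

# Raw input states of the three objects involved in the intention.
INITIAL: Dict[str, Dict[str, str]] = {
    "Broadcast": {"is_monitored": "No"},
    "SmsMessage": {"position": "inDevice"},
    "URL": {"is_used": "No"},
}


def final_states(initial: Dict[str, Dict[str, str]],
                 transitions: List[Tuple[str, str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Apply each behavior's effect in order and return the resulting object states."""
    state = {name: dict(attrs) for name, attrs in initial.items()}  # copy, do not mutate input
    for _behavior, name, attribute, value in transitions:
        state.setdefault(name, {})[attribute] = value
    return state


goal = final_states(INITIAL, TRANSITIONS)
```

The resulting dictionary reads as the goal in words: the broadcast is monitored and intercepted, the message is no longer plain and sits on the remote server, and the URL has been used.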

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 as a prototype for identifying and describing malware intention from behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built covers the behavior facts that exist in the malware samples and analyze the coverage rate. Second, to verify the effectiveness of manual (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, we report the effectiveness and reasoning performance of the ontology inference system. Finally, we examine the correctness, readability, and effectiveness of the intention reasoning results.

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
|---|---|---|
| right_gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |

Figure 3: The framework of the intention inference system. [Diagram not reproduced. Component labels: Malware, Data source, Facts base, Intention definition, Corollaries 7 and 8, Ontology, SWRL rules, Inference engine, Inference output (intention description).]

Figure 4: Inference results of Zitmo’s behavior relations.

Figure 5: Visualization of Zitmo’s inference results. [Diagram not reproduced. Legend: Beh_object, Input or Output, Behavior. State chains: Broadcast (is_monitored: No) → monitor → Broadcast (is_monitored: Yes) → intercept → Broadcast (is_intercepted: Yes); SmsMessage (position: inDevice) → access → SmsMessage (position: inMemory) → encrypt → SmsMessage (is_plain: No); URL (is_used: No) → connect → URL (is_used: Yes) → transmit → SmsMessage (position: remoteServer).]

Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
|---|---|---|
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast’ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast’, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL’, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
|---|---|
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |

We selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from the 1260 samples of the Genome dataset, and 65 DroidBench samples), as listed in Table 6. Of the 10 real-world samples, 9 were collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. The other samples in Genome belong to the same families, so the vulnerabilities and threats in those programs are similar. For efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as ground truth because they are open-source programs with clear semantics [20], which allows us to analyze them adequately. Static analysis in general often lacks the capability to extract runtime behaviors and can be evaded accordingly, but it also has distinct advantages over dynamic techniques. Nevertheless, this paper focuses on the reduction of software intention; that is, the emphasis is on the extraction and semantic mapping of malware behaviors, the extraction of relationships between behaviors, and the representation of the malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In total, we discovered 102 significant behaviors involving 115 sensitive APIs through sensitive API mining in Genome’s Android apps and a survey of the related literature [15, 19–21]. We also consulted Android’s official API documentation and, based on the information collected, established a database of sensitive API behaviors. These APIs are controlled by 30 sensitive Android permissions and are described in detail in the official documentation.

Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
|---|---|---|---|---|
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right_gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right_gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right_gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |

Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar (%) |
|---|---|---|---|---|
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90 |
| FS access | Open, Write | 2 | 2 | 100 |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100 |
| Send SMS | SMS | 1 | 1 | 100 |
| Execute external file | Generic, Priv Esc, Shell, Inst APK | 6 | 8 | 75 |
| Encrypt info | File, info | 0 | 2 | 0 |
| Intercept SMS | SMS | 0 | 2 | 0 |
| Store info | Privacy info | 0 | 1 | 0 |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc | Correct | T |
|---|---|---|---|---|
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our behavior facts extraction with CopperDroid’s [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS-specific and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods with a twofold goal. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we want to know whether the behavior descriptions generated by our methods are readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid’s public reports [24]. We collected 150 of CopperDroid’s public malware sample analysis reports and analyzed them statistically. As shown in Table 6, the third column (Ours (Copper)) gives the number of behaviors extracted by our methods and, in parentheses, the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).
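The totals quoted above can be checked directly from the third column of Table 6. The snippet below is a small verification sketch; the dictionary name `COUNTS` is hypothetical, and the per-family values are copied from the table.

```python
# (ours, CopperDroid) behavior counts per sample family, from Table 6.
COUNTS = {
    "Zitmo": (6, 2), "GoldDream": (9, 4), "DroidDream": (5, 3),
    "DroidDeluxe": (3, 2), "HippoSMS": (4, 2), "Geinimi": (6, 3),
    "RogueSPPush": (6, 2), "GGTracker": (9, 4), "DroidKungFu-Update": (4, 2),
    "Lovebuckleword": (4, 1), "Aliasing": (2, 2), "AndroidSpecific": (3, 2),
    "ArraysAndLists": (2, 2), "Callbacks": (2, 2), "EmulatorDetection": (3, 2),
    "FieldAndObjectSensitivity": (3, 2), "GeneralJava": (3, 2),
    "ImplicitFlows": (2, 2), "InterAppCommunication": (2, 2),
    "Lifecycle": (4, 2), "Reflection": (2, 2), "Threading": (3, 2),
}

ours_total = sum(ours for ours, _ in COUNTS.values())
copper_total = sum(copper for _, copper in COUNTS.values())

# Relative gain of our extraction over CopperDroid, in percent.
relative_gain = (ours_total - copper_total) / copper_total * 100

print(f"ours={ours_total}, copper={copper_total}, gain={relative_gain:.1f}%")
```

The counts are summed per family row, not weighted by the number of samples in each family, which is the reading that reproduces the 87 and 49 reported in the text.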

We further study whether the behavior descriptions generated by our methods are readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for the 75 samples. The results in Table 7 show that our descriptions are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100%. CopperDroid failed to extract some of the behaviors we extract, such as encrypting, intercepting, and storing personal information. Thus, we argue that the behaviors we extract effectively reflect common program semantics and programming conventions.

Note that although generating behavior fact descriptions may take manual effort, behavior extraction is a technology that has already been realized. We draw on previous work on extracting malware behaviors, although automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be used within our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment shows that the workload of writing code manually is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity on the 75 samples. Preparing the behavior facts (adding ontology instances) dominates the runtime, while generating the intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis of a majority (85%) of apps completes within 5 seconds. Validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphical display of the intention results is highly readable. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate correctness, we produce intention descriptions for the 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automation tools in extracting the behavior facts from these samples. The extraction mechanism first constructs the environment dependence graph and then uses the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated with the tools provided by Protégé 3.4.7, and instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts; the SWRL rules are imported into the inference engine at the same time. The inference results give the behavior sequence and the final goal representation (see the definition of malware intention). The framework is shown in Figure 3.

Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report: Genome | Appscan report: DroidBench | Appscan report: All | Intention description: Genome | Intention description: DroidBench | Intention description: All |
|---|---|---|---|---|---|---|
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that our reasoning rests on the assumption that the sensitive API database we built can fully map the sensitive behaviors in malware. Although the database is not perfect, it fully covers the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, it is not precise enough to handle data flow propagation that cannot be resolved by existing static analysis technology (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical problem rather than a deficiency of the theory.
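The 89.3% figure follows from the counts in Table 8. The snippet below recomputes it as the share of correct results among all samples; the names `RESULTS` and `correct_rate` are hypothetical, and the per-set rates are derived values that the paper does not report.

```python
# Outcome counts per sample set, from Table 8.
RESULTS = {
    "Genome": {"false_status": 1, "missing_desc": 1, "correct": 8},
    "DroidBench": {"false_status": 4, "missing_desc": 2, "correct": 59},
}


def correct_rate(rows) -> float:
    """Share of samples whose inferred intention was judged correct."""
    rows = list(rows)
    correct = sum(row["correct"] for row in rows)
    total = sum(sum(row.values()) for row in rows)
    return correct / total


overall = correct_rate(RESULTS.values())
per_set = {name: correct_rate([row]) for name, row in RESULTS.items()}

print(f"overall: {overall:.1%}")
for name, rate in per_set.items():
    print(f"{name}: {rate:.1%}")
```

Splitting the rate by sample set shows that the real-world Genome samples account for a proportionally larger share of the misses than the DroidBench samples, consistent with the second reason given above.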

In fact, static analysis in general lacks the capability to extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we want to know whether the generated intentions are readable to an average audience. Second, we want to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on security scans of Android apps. After scanning, the system provides a scan report that describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes’ Appscan platform and obtained 75 security reports for these samples. We also collated the ontology inference results and presented them in formal and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities reported by 360 microscopes’ Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we perform the user study with reference to the official intent descriptions of the 75 apps. We use these official descriptions as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants had experience with Android malware analysis and understood basic smartphone events such as “intercept SMS” or “access GPS location.” A rating is provided for each sample report (Appscan and intention) with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes’ Appscan platform provides a coarser-grained description, and users can judge the true intent of only 4 samples from the Appscan reports, whereas users can judge the malice of 67 samples using the intention description reports. This indicates that our intention reports are readable, even compared with the reports created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are important, while Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly from the Appscan reports. On further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized in threat analysis.

Threat Analysis. Using the intention description, analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, data transfer relations, and changes in object characteristics). This is of great practical significance for users seeking to understand the implementation mechanism of malware, evaluate the losses it may cause, and devise preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. To standardize the new conceptual system and automate intention derivation, we create an ontology of malware intention. As the ontology is computable, we import the SWRL rules represented in the ontology language and the Fact Base into the Jess engine, and the outputs of Jess are the descriptions of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of producing the desired intention descriptions, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods for mapping from the Data Source to facts take two forms: manual extraction and automatic extraction. We mainly use manual fact extraction. There are many opportunities for future work, such as implementing automatic fact extraction and automatically visualizing the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), the Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and the National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention,” in Intentions in communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.


(security-relevant API invocation, usually controlled by sensitive permission) as a clue. But what is the purpose? How can the purpose of malware be represented to users? We use one of our examples to illustrate the problem. This malware involves sensitive API calls such as getMessageBody(), getDeviceId(), setEntity(), connect(), and execute(). It first accesses the user’s short messages and device ID number; after encrypting them, it connects to a remote server and transmits the information to the server. We use access, encrypt, connect, and transmit to represent these behaviors, but the semantics of access (a behavior) depends on its object (SMS, DeviceId); thus we use these objects to represent behaviors. After these APIs execute, the states of the objects change; we use the final states of these objects to describe the purpose of the intention. The key components in the definition of intention have now been illustrated. We use this idea as a motivating example to put forward our theory.

The aims of this research are as follows: first, to extract security-sensitive semantic-layer behaviors depending only on the semantic information (sensitive APIs) of the program; second, to propose a derivation and description method for a security-centric behavior chain that disregards unrelated software behavior; third, to present the final intention results as a combination of natural language and formal representation. The advantage of formalization is that automatic behavior reasoning can be implemented, while natural language descriptions of behaviors help users understand the semantics of the program. The reasoning results can easily be represented graphically, so that end users get concise and intuitive software intention information.

We propose a new description and derivation model of malware intention using existing reverse analysis results [2–4] for Android malware and the theory of intention [5, 6]. On this basis, we complete the deduction of behavioral intention at the semantic level with the help of ontology reasoning technology [7] and finally realize the reverse reduction from low-level Android malicious code to high-level intention. The main work of this paper is divided into two aspects. First, the definition and formalized model of malware intention are given, and the rationality of this model is proved. Second, we use ontology to model the semantic relationship between behaviors and objects and automate the process of intention derivation using SWRL [8] rules and the Jess engine [9]. SWRL is a semantic web rule language that combines OWL and RuleML and presents rules in a semantic way. Jess is a rule engine for the Java platform, which can be used to complete many complex logical reasoning tasks.

The reasons for introducing ontology are as follows: (1) the conceptual system based on ontology is computable, and the automatic deduction of intentions can be achieved by computation between concepts; (2) the ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]; (3) the extensibility of ontology enables us to extend the conceptual model at any time as new features of malware emerge.

The rest of the paper is organized as follows. Section 2 introduces related work on malware detection, reverse analysis, and the theory of intention. Section 3 elaborates the modeling process of the intention. Section 4 defines the ontology model and SWRL rules and describes how the Fact Base is established. Section 5 shows the experimental results on our representative malware samples. Finally, Section 6 concludes the paper.

2. Related Works

The essence of malware intention reduction is behavior analysis, and software intention is divorced from specific API function calls. It is a behavior abstraction at the semantic layer and can describe behavior semantics intuitively. From the perspective of reverse engineering, ApkRiskAnalyzer [11] uses the commercial disassembly tool IDA Pro [12] to collect disassembly information. Based on the collected information, it constructs a taint analysis engine and a constant analysis engine to detect privacy disclosure behavior and malicious behavior involving constant use. Jian and Qing [13] use program slices of sensitive APIs to obtain the calling sequence of malicious code and then return the use case diagram of the whole malicious code. They restore the functional semantics of malicious behavior to some extent and help users locate malicious code quickly, but they do not consider the specific relationship between behaviors and cannot reflect the real intention of malware.

Android malware research mainly focuses on the determination of malicious behavior and the extraction of malicious features. Although the semantic relations between APIs are considered in [14–16] and certain results have been achieved, these works ultimately target malware detection and cannot provide a complete analysis and presentation of malware intention. AppContext [14] judges whether a behavior is benign or malicious according to the context of the sensitive API. It holds that the trigger of a security-sensitive behavior can be judged by context, such as the trigger events that lead to the sensitive behavior or external environmental conditions. DroidADDMiner [15] constructs the feature vectors required for machine learning by extracting data dependence paths between sensitive APIs and combines constant usage and context information, so that it can detect, classify, and characterize Android malware. DroidSIFT [16] constructs a weighted contextual API dependency graph based on semantics and assigns weights according to the risk of each sensitive API. After the data set of behavior graphs is constructed, the similarity value of graphs is used as an element of the machine learning feature vector, which completes the feature construction.

The research of intention recognition is widely used in network security defense, artificial intelligence, natural language understanding, and so on. Intrusion intention recognition [17] analyzes a large amount of underlying alarm information to explain and determine what attackers want to achieve. It is essentially a process of giving a reasonable interpretation of a large amount of attack data. Identifying the attacker's intention can determine the true intention of attackers and predict their subsequent behavior. It is the premise and foundation of threat analysis and decision response and is an important component of network security situational awareness. Shirley and Evans [6] judge whether software behavior is malicious according to the degree of matching between the user's operation intention and the software behavior. In the field of artificial intelligence [18], the human intention identified by agents is given a formal representation. These studies provide a theoretical basis for our malware intention modeling.

3. Model of Malware Intention

This section focuses on the modeling process of malware intention. First, the formalized definition of software intention is given according to the abstraction of the problem and the related literature. Then, the key components in the definition of intention are defined and explained. Finally, the inference process of the intention model is proved using mathematical lemmas.

3.1. Formalization of Software Intention. In the study of behavioral theory, Bratman [5] holds that intention should be a basic unit of behavior research and suggests that intention is a behavior sequence with a future orientation; that is, the intention influences the behavior of the next step. After synthesizing and analyzing a large number of definitions of intention, we obtain the connotation of intention as a sequence of actions directed at a particular purpose.

To obtain the formalized definition of software intention, a program is regarded as a combination of data structures and algorithms. The problem can then be abstracted as follows. A function call or program operation of a sensitive API is abstracted into an action, and a sequence of actions that contains specific relations can perform certain functions. These actions operate on a set of input data, including event messages, user input, privacy data, and constant arguments. The semantics of the input data are abstracted into an object description of the action. When the trigger condition is satisfied, the malware starts its process of operations. This set of data (objects) eventually reaches a certain state (the final state of the data) after a series of operations, and these final states reflect the purpose of the program's execution.

Definition 1. Software intention is a sequence of software behaviors directed at a particular purpose, formally represented as

$$\overrightarrow{\omega}(\mathrm{Intention}) = \left(\langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle, \mathrm{Goal}\right). \tag{1}$$

In formula (1), "$\overrightarrow{\omega}$" refers to directionality, which indicates that the behavior sequence $\langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle$ points to the purpose Goal. $\omega$ represents the temporal and spatial attributes of behavior: temporal attributes describe specific temporal relationships between actions, and spatial attributes indicate the impact of behavior on external objects.

3.2. Intention Elements. The elements of software intention include the software behavior sequence, behavior objects (behavior facts), and software goals, which are defined separately below. They are the theoretical basis for the derivation in Section 3.3, and the extraction method for behavior facts is described in Section 4.3.

Definition 2. A behavior object is an abstract description of the data or device objects that an Android API invocation can operate on. It is a three-tuple:

$$\mathrm{Beh\_object} = (\mathrm{obj\_name}, \mathrm{attribute}, \mathrm{attri\_value}). \tag{2}$$

In tuple (2), obj_name represents the object name, attribute refers to the object's property, and attri_value represents the property value. Different behaviors act on different objects, and these objects correspond to different entities in the program.

Behavior objects are usually presented in the form of class objects (system resource management, network connection objects, and network buffer objects), constants (SP number, advertising link URL, and file address), local executable files (system commands and local library (.so) files), and passed variables (parameters and data paths). The abstract semantics of these objects can be determined by certain mapping rules.

Definition 3. Software behavior is an abstract software operation that can operate on an object and change its property value, formally represented as

$$\begin{aligned} \mathrm{Behavior} &= (\mathrm{beh\_name}, \mathrm{Input}, \mathrm{Output}), \\ \mathrm{Input} &= \mathrm{Beh\_object}, \\ \mathrm{Output} &= \mathrm{Beh\_object}'. \end{aligned} \tag{3}$$

In formula (3), beh_name represents the name of a behavior, Input represents the input object descriptions of this behavior, and Output represents its output object descriptions. In this paper, behaviors are provisionally divided into two types according to the number of their inputs: B1 (single input, single output) and B2 (double input, double output).
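Definitions 2 and 3 can be illustrated with a small data model. The following is a minimal sketch, not the authors' implementation: the class names `BehObject` and `Behavior` and the helper `monitor` are hypothetical, and the example object is the broadcast message used elsewhere in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BehObject:
    """Three-tuple of Definition 2: (obj_name, attribute, attri_value)."""
    obj_name: str
    attribute: str
    attri_value: str


@dataclass(frozen=True)
class Behavior:
    """Triple of Definition 3: (beh_name, Input, Output) for a B1 behavior."""
    beh_name: str
    input_obj: BehObject
    output_obj: BehObject


def monitor(obj: BehObject) -> Behavior:
    """Hypothetical B1 behavior: keeps name and attribute, changes the value."""
    return Behavior("monitor", obj, replace(obj, attri_value="Yes"))


broadcast = BehObject("Broadcast_info", "is_monitored", "No")
fact = monitor(broadcast)
print(fact.output_obj)
```

Frozen dataclasses are used so that a behavior produces a new object description instead of mutating its input, which mirrors the Input/Output pair in formula (3).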

We have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and a summary of the related literature [15, 19–21]. We also refer to Android's official API documentation, and a database of sensitive API behavior is established based on the information collected. These APIs are controlled by 30 Android sensitive permissions and have detailed descriptions in the official documents.

We generate behavior facts for API patterns based on their internal program logic. Some APIs have similar semantics and can be classified as one class of behavior. We further study the extraction mechanism. Table 1 presents the three major logics that we use. (1) Sequential relation: APIs constitute a special workflow. For instance, connect() always happens before execute(), since the first provides the second with necessary inputs. In this case, we study the documentation of both APIs and extract behaviors from both of them. (2) A former object is retrieved for latter operations. For example, SmsMessage.createFromPdu(pdus) is always invoked prior to SmsMessage.getMessageBody(), because the former fetches the default SmsMessage object that the latter needs. We then extract behaviors only from the latter API. (3) Multilevel data is accessed through multiple levels of APIs. For example, when accessing location data, we first call getLastKnownLocation() to return a location object and then call getLongitude() and getLatitude() to get the longitude and latitude from this object. As the higher-level API is meaningful enough, we extract behaviors only from the former API.

Table 1: The rules of behavior semantic extraction.

| Program structure | Location of behavior semantic |
|---|---|
| (1) Sequential relation | Extracting both of these APIs |
| (2) Prepared for the latter | Extracting the latter API |
| (3) Multilevel data access | Extracting the former API |
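The three extraction rules of Table 1 amount to a small decision function. The sketch below is illustrative only: the enum `Relation` and the function `apis_to_extract` are hypothetical names, and the relation between two APIs is assumed to be known in advance (for example, from the API documentation).

```python
from enum import Enum


class Relation(Enum):
    SEQUENTIAL = 1           # rule (1): both APIs form a workflow
    PREPARED_FOR_LATTER = 2  # rule (2): the former only fetches an object
    MULTILEVEL_ACCESS = 3    # rule (3): the latter only reads fields


def apis_to_extract(former: str, latter: str, relation: Relation) -> list:
    """Return the APIs from which behavior semantics are extracted."""
    if relation is Relation.SEQUENTIAL:
        return [former, latter]
    if relation is Relation.PREPARED_FOR_LATTER:
        return [latter]
    if relation is Relation.MULTILEVEL_ACCESS:
        return [former]
    raise ValueError(f"unknown relation: {relation}")


print(apis_to_extract("connect()", "execute()", Relation.SEQUENTIAL))
print(apis_to_extract("createFromPdu()", "getMessageBody()",
                      Relation.PREPARED_FOR_LATTER))
print(apis_to_extract("getLastKnownLocation()", "getLongitude()",
                      Relation.MULTILEVEL_ACCESS))
```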

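According to the analysis of malware samples and the descriptions in the related literature, we summarize a variety of malware behaviors and give the behavior set $E$:

$$\begin{aligned} E = \{ & \varphi_{\mathrm{right\_gain}}, \varphi_{\mathrm{access}}, \varphi_{\mathrm{store}}, \varphi_{\mathrm{encrypt}}, \varphi_{\mathrm{monitor}}, \varphi_{\mathrm{intercept}}, \\ & \varphi_{\mathrm{connect}}, \varphi_{\mathrm{transmit}}, \varphi_{\mathrm{send}}, \varphi_{\mathrm{dial}}, \varphi_{\mathrm{decode}}, \varphi_{\mathrm{install}}, \varphi_{\mathrm{popup}}, \\ & \varphi_{\mathrm{tamper}}, \varphi_{\mathrm{delete}}, \varphi_{\mathrm{hide}}, \varphi_{\mathrm{remote\_control}} \}. \end{aligned} \tag{4}$$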

Taking the monitoring and interception of broadcast messages as an example, we explain how a behavior is described. A behavior can be described as a mapping relationship.

(i) Broadcast message monitoring:

$$\begin{aligned} \varphi_{\mathrm{monitor}} &: \mathrm{Broadcast\_info} \to \mathrm{Broadcast\_info}', \\ \mathrm{Broadcast\_info} &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{No}), \\ \mathrm{Broadcast\_info}' &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{Yes}). \end{aligned} \tag{5}$$

(ii) Broadcast message interception:

$$\begin{aligned} \varphi_{\mathrm{intercept}} &: \mathrm{Broadcast\_info}' \to \mathrm{Broadcast\_info}'', \\ \mathrm{Broadcast\_info}' &= (\mathrm{Broadcast\_info}, \mathrm{is\_monitored}, \mathrm{Yes}), \\ \mathrm{Broadcast\_info}'' &= (\mathrm{Broadcast\_info}, \mathrm{is\_intercepted}, \mathrm{Yes}). \end{aligned} \tag{6}$$

Behaviors are not independent of one another: there is always a relation in which the outputs of one behavior correspond to the inputs of another. One of the most common relationships is data dependence; that is, the execution input of the later behavior needs the execution output of the previous behavior. Another common relationship is control dependence: although two behaviors have no direct data flow relationship, the execution of the latter behavior needs the execution of the previous one as a trigger condition. In the former example, the relation between monitor and intercept is a data dependence relation and can be described as follows:

$$\begin{aligned} & \mathrm{Output}(\varphi_{\mathrm{monitor}}) \to \mathrm{Input}(\varphi_{\mathrm{intercept}}), \\ & \mathrm{Broadcast\_info} \to \mathrm{Broadcast\_info}' \to \mathrm{Broadcast\_info}''. \end{aligned} \tag{7}$$
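The data dependence in (7) can be checked mechanically by comparing the output of one behavior with the input of another. The following is a minimal sketch under the assumption that each B1 fact is stored as an (input, output) pair of object triples; the function name `data_dependences` is hypothetical.

```python
# Each fact maps a behavior name to (input object, output object);
# objects are (obj_name, attribute, attri_value) triples as in tuple (2).
facts = {
    "monitor": (("Broadcast_info", "is_monitored", "No"),
                ("Broadcast_info", "is_monitored", "Yes")),
    "intercept": (("Broadcast_info", "is_monitored", "Yes"),
                  ("Broadcast_info", "is_intercepted", "Yes")),
}


def data_dependences(facts):
    """Return (a, b) pairs where the output of a equals the input of b."""
    pairs = []
    for name_a, (_, out_a) in facts.items():
        for name_b, (in_b, _) in facts.items():
            if name_a != name_b and out_a == in_b:
                pairs.append((name_a, name_b))
    return pairs


print(data_dependences(facts))
```

Control dependence is not covered by this sketch, since it cannot be read off from object states alone.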

Definition 4. Software purpose is the final state of all input objects that act in an intention, formally represented as

$$\mathrm{Goal} = (\mathrm{object}_1, \mathrm{attribute}, \mathrm{attri\_value}) \times \cdots \times (\mathrm{object}_n, \mathrm{attribute}, \mathrm{attri\_value}). \tag{8}$$

In the purpose representation, the input objects $\mathrm{object}_1, \ldots, \mathrm{object}_n$ represent the objects involved in the intention. For example, after extracting the sensitive behaviors in a malware sample and giving them a formal semantic representation, we use the ontology reasoning engine to automate the reasoning. We eventually get the description of the relationships between behaviors and the final states of all the objects. For instance, we use (SMS, position, "1062568") to illustrate that the SMS is sent to "1062568"; (URL, is_used, Yes) represents the fact that the URL is used in the related function; and (DeviceId, is_encrypt, Yes) illustrates that the DeviceId has been encrypted.
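One way to read Definition 4 operationally is to replay the object states produced by the behaviors in execution order and keep the last value of each (object, attribute) pair. The sketch below rests on that last-write-wins assumption, which is an interpretation and is not stated in the definition; the function name `final_goal` and the order of the example states are hypothetical.

```python
def final_goal(states):
    """Collapse a sequence of object states into their final values.

    `states` is an iterable of (obj_name, attribute, attri_value) triples
    in execution order; later triples overwrite earlier ones for the same
    (obj_name, attribute) pair.
    """
    latest = {}
    for obj_name, attribute, value in states:
        latest[(obj_name, attribute)] = value
    return [(obj, attr, val) for (obj, attr), val in latest.items()]


observed = [
    ("URL", "is_used", "No"),
    ("DeviceId", "is_encrypt", "Yes"),
    ("URL", "is_used", "Yes"),
    ("SMS", "position", "1062568"),
]
goal = final_goal(observed)
print(goal)
```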

3.3. Proof of Model. If intention is viewed as a system [22], its basic component is the behavior. After the inputs and outputs involved in these behaviors are defined, a series of basic mapping relationships is obtained, which reflects the influence of each basic behavior on external objects. The overall nature of an intention system can be determined by a set of mappings between a set of inputs and a set of outputs. Lemmas 5 and 6 are the basic mathematical theory of our behavior deduction.

Lemma 5 (function combination). The union of the mapping set $\{\varphi_i\}$ is a mapping $\varphi: A \to B$,

$$\varphi = \bigcup_{i=1}^{n} \varphi_i, \qquad A = \bigcup_{i=1}^{n} A_i, \qquad B = \bigcup_{i=1}^{n} B_i, \qquad \varphi_i: A_i \to B_i, \quad 1 \le i \le n, \tag{9}$$

when $(\forall A_i)(\forall A_j)(A_i \subseteq A \wedge A_j \subseteq A \wedge A_i \ne A_j \to A_i \cap A_j = \emptyset)$.
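For finite mappings, Lemma 5 can be illustrated with dictionaries. The sketch below is an illustration, not part of the proof: the function name `combine` and the string encoding of object states are hypothetical, and the check is slightly stricter than the lemma because it rejects any overlap between domains, including identical domains.

```python
def combine(*mappings):
    """Union of finite mappings (dicts) with pairwise disjoint domains."""
    combined = {}
    for mapping in mappings:
        overlap = combined.keys() & mapping.keys()
        if overlap:
            raise ValueError(f"domains overlap on {sorted(overlap)}")
        combined.update(mapping)
    return combined


# Two behaviors acting on different objects (states encoded as strings).
phi_connect = {"URL:is_used=No": "URL:is_used=Yes"}
phi_encrypt = {"SmsMessage:position=inMemory": "SmsMessage:is_plain=No"}

phi = combine(phi_connect, phi_encrypt)
print(phi)
```

The disjointness check is what guarantees that the union is still a function: no input can be sent to two different outputs.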

Lemma 6 (function compound). The two-tuple $\varphi$ consisting of a set of mappings $\Phi$ and its compound relation $\xi_\circ$ is a mapping $\varphi: \mathrm{dom}(\varphi_1) \to \mathrm{ran}(\varphi_n)$,

$$\begin{aligned} \varphi &= (\Phi, \xi_\circ), \\ \Phi &= \{\varphi_i \mid \varphi_i: A_i \to B_i,\ A_i = \mathrm{dom}(\varphi_i),\ B_i = \mathrm{ran}(\varphi_i),\ 1 \le i \le n\}, \\ \xi_\circ &= \{\varphi_{i+1} \circ \varphi_i \mid \varphi_i, \varphi_{i+1} \in \Phi,\ 1 \le i \le n-1\}, \end{aligned} \tag{10}$$

when $(\forall i)((1 \le i \le n-1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$.
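The condition of Lemma 6 can likewise be checked on finite mappings before composing them. This is a minimal sketch with hypothetical names (`compose`, the string-encoded states); it composes the mappings in the given order and raises an error when the range of one mapping is not contained in the domain of the next.

```python
def compose(mappings):
    """Compose finite mappings (dicts) left to right, as in Lemma 6."""
    for former, latter in zip(mappings, mappings[1:]):
        if not set(former.values()) <= set(latter.keys()):
            raise ValueError("ran(former) is not a subset of dom(latter)")

    def phi(x):
        for mapping in mappings:
            x = mapping[x]
        return x

    return phi


phi_monitor = {"is_monitored=No": "is_monitored=Yes"}
phi_intercept = {"is_monitored=Yes": "is_intercepted=Yes"}

phi = compose([phi_monitor, phi_intercept])
result = phi("is_monitored=No")
print(result)
```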

According to Lemmas 5 and 6, we prove Corollaries 7 and 8.

Corollary 7 (compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represents a set of behaviors that are involved in an intention. If the relation between these behaviors is $\mathrm{Output}(\varphi_i) = \mathrm{Input}(\varphi_{i+1})$, $1 \le i \le n-1$, then the output $\mathrm{Output}(\varphi_n)$ of behavior $\varphi_n$ is the representation of the final goal. The behavior sequence is denoted as $\xi_i = (\varphi_1, \varphi_2, \ldots, \varphi_n)$.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_i$ and $\varphi_{i+1}$ satisfy the condition that the former's output corresponds to the latter's input; that is, the range of $\varphi_i$ is equal to the domain of $\varphi_{i+1}$, which satisfies $(\forall i)((1 \le i \le n-1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$. According to Lemma 6, the corollary is proved.

Corollary 8 (combination-compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represents a set of behaviors that are involved in an intention. If there is an input-to-output relation $\mathrm{Output}(\varphi_m) = \mathrm{Input}(\varphi_u)$, $1 \le m, u \le n$, and at the same time a parallel relationship $\mathrm{Output}(\varphi_i) \cup \mathrm{Output}(\varphi_j) = \mathrm{Input}(\varphi_k)$, $1 \le i, j, k \le n$, then the output $\mathrm{Output}(\varphi_u) \cup \mathrm{Output}(\varphi_k)$ of $\varphi_u$ and $\varphi_k$ is the final goal. The behavior sequence is denoted as $\xi_i = (\Phi, R)$, where $\Phi$ is the behavior set and $R$ is the relationship between these behaviors.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ is known as a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_m$ and $\varphi_u$ satisfy the condition that the former's output corresponds to the latter's input; that is, the range of $\varphi_m$ is equal to the domain of $\varphi_u$, and $\mathrm{ran}(\varphi_u)$ is the compound output according to Lemma 6. $\varphi_i$ and $\varphi_j$ both provide their outputs as the input of $\varphi_k$. Because the intersection of the inputs of $\varphi_i$ and $\varphi_j$ is empty ($\mathrm{dom}(\varphi_i) \cap \mathrm{dom}(\varphi_j) = \emptyset$), they satisfy Lemma 5; that is, $\varphi_i$ and $\varphi_j$ satisfy the combination relation and can be combined into a new mapping

$$\varphi': \mathrm{dom}(\varphi_i) \cup \mathrm{dom}(\varphi_j) \to \mathrm{ran}(\varphi_i) \cup \mathrm{ran}(\varphi_j). \tag{11}$$

Then $\varphi'$ and $\varphi_k$ satisfy the condition of Lemma 6, and $\mathrm{ran}(\varphi_k)$ is the compound output. The corollary is proved according to Lemma 6.

4. Ontology Inference System

This section elaborates the construction of the ontology inference system. Section 4.1 gives the construction specification of the ontology model [10]. Section 4.2 gives the SWRL rules used in the inference engine. Section 4.3 describes the mapping methods from Data Source to Fact Base. Section 4.4 elaborates the framework of the inference system.

4.1. Ontology Model. The reasons for using ontology in our inference system are as follows. (1) The conceptual system based on ontology is computable, and the automatic deduction of intentions can be achieved by computation between concepts. The system accordingly reduces the amount of code that must be written. (2) The extensibility of ontology enables us to extend the conceptual model at any time as new features of malware emerge. Once new malware knowledge appears, we only need to modify the ontology model based on that knowledge. In this way, we reduce the amount of code update and maintenance. (3) The ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]. It can provide a standardized representation for malware intention.

The ontology model of malware behavior intention uses the following knowledge: the definition and classification of behaviors and behavior objects, the intention model, and Corollaries 7 and 8. The concepts and their relations are shown in Figure 1. The definition of each concept is given in Section 3. We use the Pellet reasoner to verify the consistency of the ontology.

The definitions of object attributes and data attributes are given in Table 2.

4.2. SWRL Inference Rules. This section uses Corollaries 7 and 8 and Definition 4 from Section 3. Before writing inference rules, we need to define the format of basic facts. A fact is a composite description of a basic behavior. There are two categories of basic facts in our research: B1, a behavior with single input and single output, and B2, a behavior with double input and double output. See Box 1.

(1) Inference Rules of the Relationship between Behaviors

Premise 1. Let B11 and B12 be any two B1 behaviors; B11(in) represents the input of B11, and B11(out) represents its output. Suppose the output object of B11 corresponds to the input object of B12. Rule-1 is the inference rule corresponding to this condition. See Box 1.

Figure 1: Semantic ontology of behavior intention. Labels in the figure: Malware domain, Android malware, behavior, beh_object, obj_name, attribute, attri_value; right_gain, access, store, encrypt, monitor, intercept, connect, transmit, send, dial, decode, install_malware, popup_ad, tamper, delete, hide_file; privacy, right, event, malware, parameter, config_file; is a, has, hasBehavior, hasInput, hasOutput, hasName, hasAttribute, hasAttrivalue.

Figure 2: SWRL inference rules of the relationship between behaviors and final goal.

Premise 2. Let B11 and B12 be any two B1 behaviors and let B2 be a B2 behavior. B11(in) represents the input of B11, and B11(out) represents its output; B2(1in), B2(2in), B2(1out), and B2(2out) represent the inputs and outputs of B2, respectively.

If the union of the outputs of any two B1 behaviors is the input of a certain B2 behavior and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior. There is also a combination relation between B11 and B12. The rule is B1-B2-Rule-1, shown in Figure 2, and the rest of the rules are similar.
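The condition of Premise 2 can be sketched outside the rule engine as a plain set comparison. The code below is an illustration in Python, not the SWRL rule B1-B2-Rule-1 itself: the function name `b1_b2_rule` and the fact layout are hypothetical, and the example objects follow the Zitmo behaviors connect, encrypt, and transmit.

```python
def b1_b2_rule(b1_facts, b2_facts):
    """Derive compound and combination relations as stated in Premise 2.

    b1_facts: name -> (input object, output object)
    b2_facts: name -> ((input 1, input 2), (output 1, output 2))
    """
    relations = set()
    names = sorted(b1_facts)
    for idx, first in enumerate(names):
        for second in names[idx + 1:]:
            in_first, out_first = b1_facts[first]
            in_second, out_second = b1_facts[second]
            if in_first == in_second:  # inputs must not intersect
                continue
            for b2_name, (b2_inputs, _outputs) in b2_facts.items():
                if {out_first, out_second} == set(b2_inputs):
                    relations.add(("hasCompoundwith", first, b2_name))
                    relations.add(("hasCompoundwith", second, b2_name))
                    relations.add(("hasCombinationwith", first, second))
    return relations


b1_facts = {
    "connect": (("URL", "is_used", "No"), ("URL", "is_used", "Yes")),
    "encrypt": (("SmsMessage", "position", "inMemory"),
                ("SmsMessage", "is_plain", "No")),
}
b2_facts = {
    "transmit": ((("URL", "is_used", "Yes"), ("SmsMessage", "is_plain", "No")),
                 (("URL", "is_used", "Yes"),
                  ("SmsMessage", "position", "remoteServer"))),
}
relations = b1_b2_rule(b1_facts, b2_facts)
print(sorted(relations))
```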

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2, for example, Goal-Rule-1. The outputs are the descriptions of the objects' final states. Due to limited space, other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts include B1 and B2 behaviors, and the behavior elements that should be extracted from the Data Source are as follows:

(1) Behavior (name): the behavior description in a program is extracted according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of sensitive APIs and the definitions in the official documentation.

(3) Output (behavior object): after a behavior operates on an object, the object's name and attribute do not change, while its attribute value is affected. Therefore, we can determine the changes in attribute values based on the behavior's definition.

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second includes object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and then generate the call graph (CG) and control flow graph (CFG). The leaf nodes of the call graph are traversed to find sensitive APIs and identify the corresponding behaviors according to the mapping relation between behaviors and sensitive API calls. Partial mapping relations between behaviors, sensitive API calls, and object descriptions are shown in Table 3.

Second Stage. For each identified behavior, the behavior's object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object based on the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object based on the parameter usage and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

Table 2: Specification of object attributes and data attributes.

| Attribute name | Specification |
|---|---|
| hasCombinationwith | Behaviors have combination relation |
| hasCompoundwith | Behaviors have compound relation |
| hasInput(x, y) | The behavior's single input is "y" |
| hasOutput(x, z) | The behavior's single output is "z" |
| hasFirstInput(x, y1) | The first input of double input is "y1" |
| hasSecondInput(x, y2) | The second input of double input is "y2" |
| hasFirstOutput(x, z1) | The first output of double output is "z1" |
| hasSecondOutput(x, z2) | The second output of double output is "z2" |
| hasBehavior(x, y) | The behavior belonging to malware is "y" |
| hasObjectName(x, y) | The name of the object is "y" |
| hasAttribute(x, y) | The attribute of an object is "y" |
| hasAttributeValue(x, y) | The object's attribute value is "y" |

Box 1

Rule-1: behavior_B1(B11) ∧ behavior_B1(B12) ∧ hasBehaviorName(B11, bn1) ∧ hasBehaviorName(B12, bn2) ∧ hasOutput(B11, op) ∧ hasInput(B12, ip) ∧ hasObjectname(op, on1) ∧ hasObjectname(ip, on2) ∧ swrlb:equal(on1, on2) ∧ hasAttribute(op, att1) ∧ hasAttribute(ip, att2) ∧ hasValue(att1, attri1) ∧ hasValue(att2, attri2) ∧ swrlb:equal(attri1, attri2) ∧ hasAttributeValue(op, attv1) ∧ hasAttributeValue(ip, attv2) ∧ hasValue(attv1, attriv1) ∧ hasValue(attv2, attriv2) ∧ swrlb:equal(attriv1, attriv2) → hasCompoundwith(bn1, bn2)
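The behavior recognition of the first stage (finding sensitive APIs among the leaf nodes of the call graph) can be sketched as follows. This is a simplified illustration, not the authors' tool chain: the call graph is assumed to be available as a dictionary from caller to callees, the method names `upload()` and `log()` are hypothetical, and the mapping contains only a few entries taken from Table 3.

```python
# Partial mapping from sensitive API to behavior, following Table 3.
SENSITIVE_API_TO_BEHAVIOR = {
    "getDeviceId()": "access",
    "sendTextMessage()": "send",
    "abortBroadcast()": "intercept",
    "execute()": "transmit",
}


def leaf_nodes(call_graph):
    """Return the nodes of the call graph that call nothing themselves."""
    callees = {c for targets in call_graph.values() for c in targets}
    return {n for n in set(call_graph) | callees if not call_graph.get(n)}


def recognize_behaviors(call_graph):
    """Map sensitive APIs found at the leaves to behavior names."""
    found = {
        SENSITIVE_API_TO_BEHAVIOR[node]
        for node in leaf_nodes(call_graph)
        if node in SENSITIVE_API_TO_BEHAVIOR
    }
    return sorted(found)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "onReceive()": ["abortBroadcast()", "upload()"],
    "upload()": ["getDeviceId()", "execute()", "log()"],
}
behaviors = recognize_behaviors(call_graph)
print(behaviors)
```

Object identification and data dependence analysis (the second stage) are not modeled here; the paper delegates the latter to FlowDroid.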

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the definition of intention and Corollaries 7 and 8 to construct SWRL rules (the rules are represented in ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The extraction methods for facts are described in Section 4.3. The Jess inference engine completes the reasoning process using the Facts Base and SWRL rules and finally gives the inference output (the description of malware intention).

4.5. Motivation Example. We take Zitmo [16] as an example to show the facts extracted from a sample, as shown in Table 4. Among these extracted facts, access and transmit are B2 behaviors. The double input of a B2 behavior indicates that the execution of the behavior needs both key inputs to be satisfied.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intentions of each malicious sample. The results for Zitmo's behavior relations are shown in Figure 4. Other results of intention reasoning are analyzed separately and show reasonable efficiency.

We extract the behavior relations and goal representation from Figure 4, which are shown in Table 5.

Zitmo's intention includes a behavior sequence (consisting of the behavior set and behavior relations) and the representation of the goal (output object descriptions). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents the input or output object of a behavior, and an arrow indicates the relationship between an object and a behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together they represent the goal of the intention.

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 as a prototype inference system for identifying and describing malware intention from behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built can cover the behavior facts that exist in malware samples and give a coverage rate analysis. Second, to verify the effectiveness of the manually assisted (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, the effectiveness and reasoning performance of the ontology inference system are shown. Finally, the correctness, readability, and effectiveness of the results of intention reasoning are examined.

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
|---|---|---|
| right_gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short Message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |

Figure 3: The framework of intention inference system. Components shown: Malware, Data source, Facts base, Intention definition, Corollaries 7 and 8, Ontology, SWRL rules, Inference engine, Inference output (intention description).

Figure 4: Inference results of Zitmo's behavior relations.

Figure 5: Visualization of Zitmo's inference results. Chains shown: Broadcast (is_monitored, No) → monitor → Broadcast (is_monitored, Yes) → intercept → Broadcast (is_intercepted, Yes); SmsMessage (position, inDevice) → access → SmsMessage (position, inMemory) → encrypt → SmsMessage (is_plain, No); URL (is_used, No) → connect → URL (is_used, Yes) → transmit → SmsMessage (position, remoteServer).

Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
|---|---|---|
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast' |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast', SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL', SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
|---|---|
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |

We have selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from Genome's 1260 samples, and 65 DroidBench samples), which are shown in Table 6. Among the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in the programs are similar. For efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as ground truth because they are open-source programs with clear semantics [20]; thus we can analyze these samples adequately. Static analysis in general often lacks the capability of extracting runtime behaviors and can be evaded accordingly, but it also has its own advantages over dynamic techniques. Nevertheless, this paper focuses on software intention reduction; that is, the emphasis is on the extraction and semantic mapping methods of malware behavior, the extraction of relationships between behaviors, and the representation of the malware's final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and a summary of the related literature [15, 19–21]. We also refer to Android's official API documentation, and based on the information collected, a database of sensitive API behavior is established. These APIs are controlled by 30 Android sensitive permissions and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our extraction with CopperDroid's [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS- and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods with a twofold goal. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we want to know whether the behavior descriptions generated by our methods are readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid's Public Reports [24]. We collected 150 of CopperDroid's public malware sample analysis reports and analyzed them statistically. As

10 Journal of Electrical and Computer Engineering

Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
|---|---|---|---|---|
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |


Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar (%) |
|---|---|---|---|---|
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90 |
| FS access | Open, Write | 2 | 2 | 100 |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100 |
| Send SMS | SMS | 1 | 1 | 100 |
| Execute external file | Generic, Priv Esc, Shell, Inst APK | 6 | 8 | 75 |
| Encrypt info | File info | 0 | 2 | 0 |
| Intercept SMS | SMS | 0 | 2 | 0 |
| Store info | Privacy info | 0 | 1 | 0 |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc | Correct | T |
|---|---|---|---|---|
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |

shown in Table 6, the third column (Ours (Copper)) gives the number of behaviors extracted by our methods and, in parentheses, the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).
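The two totals and the 77.6% figure can be re-derived from the per-family counts in Table 6. The short Python check below is only an illustrative sketch: the variable names are ours, and each pair is the Ours (Copper) value read from one row of the table.

```python
# (ours, copper) behavior counts per sample family, as read from Table 6.
genome_counts = [(6, 2), (9, 4), (5, 3), (3, 2), (4, 2),
                 (6, 3), (6, 2), (9, 4), (4, 2), (4, 1)]
droidbench_counts = [(2, 2), (3, 2), (2, 2), (2, 2), (3, 2), (3, 2),
                     (3, 2), (2, 2), (2, 2), (4, 2), (2, 2), (3, 2)]

all_counts = genome_counts + droidbench_counts
ours_total = sum(ours for ours, _ in all_counts)
copper_total = sum(copper for _, copper in all_counts)

# Relative increase of our count over CopperDroid's, in percent.
increase_pct = 100 * (ours_total - copper_total) / copper_total

print(ours_total, copper_total, round(increase_pct, 1))  # 87 49 77.6
```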

We further study whether the behavior descriptions generated by our methods are readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for the 75 samples. The results in Table 7 show that the descriptions derived from our methods are richer than CopperDroid's, while the ratio of behavior similarity ranges from 75% to 100% for the behavior classes that both systems cover. CopperDroid failed to extract some of the behaviors we extract, such as encrypting, intercepting, and storing personal information. Thus, we argue that the behaviors we extract effectively reflect common program semantics and programming conventions.

Note that, although generating behavior fact descriptions may take manual effort, behavior extraction is a technology that has already been realized. We build on earlier work on extracting malware behaviors, even though automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be used in our framework to produce the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment shows that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples. Preparation of behavior facts (adding ontology instances) dominates the runtime, while generation of the intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis of a majority (85%) of apps completes within 5 seconds. Validation of the reasoning results indicates that 89.3% of the samples' reasoning results are correct, and the graphical display of the intention results shows high readability. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate correctness, we produce intention descriptions for the 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods that simulate automated tools to extract the behavior facts in these samples. The extraction mechanism first constructs the environment dependence graph and then uses the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated in the tools provided by Protégé 3.4.7, and instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts, with the SWRL rules imported into the inference engine at the same time. The inference results give the behavior sequence and the representation of the final goal (see the definition of malware intention). The framework is shown in Figure 3.
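The flow described above (facts in, rules applied, derived facts out) can be illustrated with a toy forward-chaining loop. This is a minimal sketch in plain Python, not Jess or SWRL syntax; the fact triples, the `has_behavior` attribute, and the single rule are hypothetical and chosen only to mirror the monitor/intercept example of Section 3.2.

```python
# Toy forward chaining over (object, attribute, value) facts.
# The rule and facts are hypothetical; the paper's rules are SWRL rules run by Jess.

def rule_intercept_after_monitor(facts):
    """If an object is monitored and an intercept behavior targets it,
    derive that the object is intercepted."""
    derived = set()
    for obj, attr, value in facts:
        if attr == "is_monitored" and value == "Yes":
            if (obj, "has_behavior", "intercept") in facts:
                derived.add((obj, "is_intercepted", "Yes"))
    return derived


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(known) - known
            if new_facts:
                known |= new_facts
                changed = True
    return known


fact_base = {
    ("Broadcast_info", "is_monitored", "Yes"),
    ("Broadcast_info", "has_behavior", "intercept"),
}
result = forward_chain(fact_base, [rule_intercept_after_monitor])
print(("Broadcast_info", "is_intercepted", "Yes") in result)  # True
```

A real rule engine adds pattern matching, conflict resolution, and ontology-aware typing on top of this loop; the sketch only shows the fixed-point idea.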

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that our reasoning rests on the assumption that the sensitive API database we built can fully map the sensitive behaviors in malware. Although the database is not perfect, the sensitive behaviors can be


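Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report: Genome | Appscan report: DroidBench | Appscan report: All | Intention description: Genome | Intention description: DroidBench | Intention description: All |
|---|---|---|---|---|---|---|
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |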

fully covered in the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, this is not precise enough to handle data flow propagation that existing static analysis technology cannot resolve (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical problem rather than a shortcoming of the theory.
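The 89.3% rate follows directly from the counts in Table 8. The snippet below is a small arithmetic check under that reading; the dictionary layout is our own.

```python
# Counts per sample set as reported in Table 8:
# (false status, missing description, correct).
table8 = {"Genome": (1, 1, 8), "DroidBench": (4, 2, 59)}

correct = sum(row[2] for row in table8.values())
total = sum(sum(row) for row in table8.values())
rate_pct = round(100 * correct / total, 1)

print(correct, total, rate_pct)  # 67 75 89.3
```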

In fact, static analysis in general lacks the capability to extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we want to know whether the generated intentions are readable to an average audience. Second, we want to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes' Appscan platform [25] is an online security scanning system that focuses on scanning Android apps. After scanning, the system provides a report that describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes' Appscan platform and obtained 75 security reports. We also collated the ontology inference results and presented them in formal and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and we compare the number of vulnerabilities reported by 360 microscopes' Appscan platform with the intention description. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps, which we use as the reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants had experience with Android malware analysis and understood basic smartphone events such as “intercept SMS” or “access GPS location.” A rating is given for each sample report (Appscan and intention) with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes' Appscan platform provides a coarser-grained description, and users can judge the true intent of only 4 samples from the Appscan reports, whereas they can judge the malice of 67 samples from the intention description reports. This indicates that our intention report is readable, even compared with the Appscan report created by 360 microscopes' Appscan platform. The results also reveal that the readability and granularity of the intention description matter, while the Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly from the Appscan report. On further investigation, we notice that the Appscan report provides a coarse description of the sample's intent, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.

Threat Analysis. Analysts can understand the relationships between malware behaviors (including the trigger process of behaviors, data transfer relations, and changes in object characteristics) from the intention description. This is of great practical significance for users who need to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, we propose a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, to restore the intention of malware and automate the intention reasoning process. To standardize the new conceptual system and automate intention derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in the ontology language, together with the Fact Base, into the Jess engine, whose outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system achieves the desired intention descriptions, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts take two forms: manual extraction and automatic extraction. We mainly use manual facts extraction. There are many opportunities for future work, such as implementing automatic facts extraction methods and automatically visualizing the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), the Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and the National Natural Science Foundation of China (nos. 61370065 and 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention,” in Intentions in communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.


determine the true intention of attackers and predict the subsequent behavior of attackers. It is the premise and foundation of threat analysis and decision response and is an important component of network security situational awareness. Shirley and Evans [6] judge whether software behavior is malicious according to the degree to which the software behavior matches the user's operation intention. In the field of artificial intelligence [18], the human intention identified by agents is given a formal representation. These studies provide a theoretical basis for our malware intention modeling.

3. Model of Malware Intention

This section focuses on the modeling process of malware intention. First, a formal definition of software intention is given according to an abstraction of the problem and the related literature. Then, the key components in the definition of intention are defined and explained. Finally, the inference process of the intention model is proved using mathematical theorems.

3.1. Formalization of Software Intention. In the study of behavioral theory, Bratman [5] holds that intention should be a basic unit of behavior research and suggests that intention is a future-oriented behavior sequence; that is, the intention influences the next step of behavior. After synthesizing and analyzing a large number of definitions of intention, we obtain the connotation of intention as a sequence of actions directed at a particular purpose.

To obtain a formal definition of software intention, a program is regarded as a combination of data structures and algorithms. The problem can then be abstracted as follows. A function call or program operation on a sensitive API is abstracted into an action, and a sequence of actions with specific relations can perform certain functions. The actions operate on a set of input data, including event messages, user input, privacy data, and constant arguments. The semantics of the input data are abstracted into an object description of the action. When the trigger condition is satisfied, the malware starts its process of operations. After a series of operations, this set of data (objects) eventually reaches a certain state (the final data), and these final states reflect the purpose of the program's execution.

Definition 1. Software intention is a sequence of software behaviors directed at a particular purpose, formally represented as

$$\overrightarrow{\omega}(\text{Intention}) = \left(\langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle, \text{Goal}\right). \tag{1}$$

In formula (1), “$\overrightarrow{\omega}$” refers to directionality, which indicates that the behavior sequence $\langle \omega_1(P_1), \omega_2(P_2), \ldots, \omega_n(P_n) \rangle$ points to the purpose Goal. $\omega$ represents the temporal and spatial attributes of behavior: temporal attributes describe specific temporal relationships between actions, and spatial attributes indicate the impact of behavior on external objects.

3.2. Intention Elements. The elements of software intention include the software behavior sequence, behavior objects (behavior facts), and software goals, which are defined separately below. They are the theoretical basis for the derivation in Section 3.3, and the extraction method of behavior facts is described in Section 4.3.

Definition 2. A behavior object is an abstract description of the data or device objects that an Android API invocation can operate on. It is a three-tuple:

$$\text{Beh\_object} = (\text{obj\_name}, \text{attribute}, \text{attri\_value}). \tag{2}$$

In tuple (2), obj_name represents the object name, attribute refers to the object's property, and attri_value represents the property value. Different behaviors act on different objects, and these objects correspond to different entities in the program.

Behavior objects are usually presented in the form of class objects (system resource management, network connection objects, and network buffer objects), constants (SP number, advertising link URL, and file address), local executable files (system commands and local library (.so) files), and passed variables (parameter data and paths). The abstract semantics of these objects can be determined by certain mapping rules.

Definition 3. Software behavior is an abstract software operation that can operate on an object and change its property value, formally represented as

$$\begin{aligned} \text{Behavior} &= (\text{beh\_name}, \text{Input}, \text{Output}), \\ \text{Input} &= \text{Beh\_object}, \\ \text{Output} &= \text{Beh\_object}'. \end{aligned} \tag{3}$$

In formula (3), beh_name represents the name of a behavior, Input represents the input object descriptions of this behavior, and Output represents the output object descriptions of this behavior. In this paper, behaviors are provisionally divided into two types according to the number of their inputs: B1 (single input, single output) and B2 (double input, double output).
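Definitions 2 and 3 translate naturally into two small record types. The Python sketch below is one possible encoding, not the authors' implementation: the class names, the field names `input_obj` and `output_obj`, and the helper `make_b1_behavior` are our own assumptions, and only the single-input, single-output (B1) case is shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BehObject:
    """Behavior object of tuple (2): (obj_name, attribute, attri_value)."""
    obj_name: str
    attribute: str
    attri_value: str


@dataclass(frozen=True)
class Behavior:
    """Behavior of formula (3): a name plus input and output object descriptions."""
    beh_name: str
    input_obj: BehObject
    output_obj: BehObject


def make_b1_behavior(beh_name, obj, new_attribute, new_value):
    """Build a B1 behavior whose output is the input with one property rewritten."""
    changed = replace(obj, attribute=new_attribute, attri_value=new_value)
    return Behavior(beh_name, obj, changed)


sms = BehObject("SMS", "is_monitored", "No")
monitor = make_b1_behavior("monitor", sms, "is_monitored", "Yes")
print(monitor.output_obj)
```

Frozen dataclasses are used so that an object description is never mutated in place; a behavior always yields a new description, which matches the Input/Output split in formula (3).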

We have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and a summary of the related literature [15, 19–21]. We also referred to Android's official API documentation and established a database of sensitive API behaviors based on the information collected. These APIs are controlled by 30 sensitive Android permissions and are described in detail in the official documentation.

We generate behavior facts for API patterns based on their internal program logic. Some APIs have similar semantics and can be classified as one class of behavior. We further study the extraction mechanism. Table 1 presents the three major logics that we used. (1) Sequential relation: the APIs constitute a special workflow. For instance, connect() always happens before execute(), since the first provides the


Table 1: The rules of behavior semantic extraction.

| Program structure | Location of behavior semantic |
|---|---|
| (1) Sequential relation | Extracting both of these APIs |
| (2) Prepared for the latter | Extracting the latter API |
| (3) Multilevel data access | Extracting the former API |

second with necessary inputs. In this case, we study the documentation of both APIs and extract behaviors from both of them. (2) A former object is retrieved for latter operations. For example, SmsMessage.createFromPdu(pdus) is always invoked prior to SmsMessage.getMessageBody(), because the former fetches the SmsMessage object that the latter needs. We then extract behaviors only from the latter API. (3) Multilevel data is accessed through multiple levels of APIs. For example, when accessing location data, we first call getLastKnownLocation() to return a location object and then call getLongitude() and getLatitude() to get the longitude and latitude from this object. As the higher-level API is meaningful enough, we extract behaviors only from the former API.
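The three rules of Table 1 amount to a lookup from program structure to the API (or APIs) that carry the behavior semantics. The sketch below encodes that lookup in Python; the structure labels and the function name `select_apis` are hypothetical, and the API strings are plain labels taken from the examples above rather than real calls.

```python
# For an ordered API pair (former, latter), pick the API(s) from which
# behavior facts are extracted, following the three rules of Table 1.
EXTRACTION_RULES = {
    "sequential": lambda former, latter: [former, latter],      # rule (1)
    "prepared_for_latter": lambda former, latter: [latter],     # rule (2)
    "multilevel_access": lambda former, latter: [former],       # rule (3)
}


def select_apis(structure, former, latter):
    """Return the API names that carry the behavior semantics."""
    return EXTRACTION_RULES[structure](former, latter)


examples = [
    ("sequential", "connect()", "execute()"),
    ("prepared_for_latter",
     "SmsMessage.createFromPdu()", "SmsMessage.getMessageBody()"),
    ("multilevel_access", "getLastKnownLocation()", "getLongitude()"),
]
for structure, former, latter in examples:
    print(structure, "->", select_apis(structure, former, latter))
```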

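According to the example analysis of malware samples and the descriptions in the related literature, we summarize a variety of malware behaviors and give the behavior set $E$:

$$E = \{\varphi_{\text{right gain}}, \varphi_{\text{access}}, \varphi_{\text{store}}, \varphi_{\text{encrypt}}, \varphi_{\text{monitor}}, \varphi_{\text{intercept}}, \varphi_{\text{connect}}, \varphi_{\text{transmit}}, \varphi_{\text{send}}, \varphi_{\text{dial}}, \varphi_{\text{decode}}, \varphi_{\text{install}}, \varphi_{\text{popup}}, \varphi_{\text{tamper}}, \varphi_{\text{delete}}, \varphi_{\text{hide}}, \varphi_{\text{remote control}}\}. \tag{4}$$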

Taking the monitoring and interception of a broadcast message as an example, we explain how a behavior is described. A behavior can be described as a mapping relationship.

(i) Broadcast message monitoring:

$$\begin{aligned} &\varphi_{\text{monitor}} : \text{Broadcast\_info} \longrightarrow \text{Broadcast\_info}', \\ &\text{Broadcast\_info} = (\text{Broadcast\_info}, \text{is\_monitored}, \text{No}), \\ &\text{Broadcast\_info}' = (\text{Broadcast\_info}, \text{is\_monitored}, \text{Yes}). \end{aligned} \tag{5}$$

(ii) Broadcast message interception:

$$\begin{aligned} &\varphi_{\text{intercept}} : \text{Broadcast\_info}' \longrightarrow \text{Broadcast\_info}'', \\ &\text{Broadcast\_info}' = (\text{Broadcast\_info}, \text{is\_monitored}, \text{Yes}), \\ &\text{Broadcast\_info}'' = (\text{Broadcast\_info}, \text{is\_intercepted}, \text{Yes}). \end{aligned} \tag{6}$$

Behaviors are not independent of one another: the outputs of one behavior correspond to the inputs of another. One of the most common relationships is data dependence; that is, the input of the later behavior requires the output of the previous behavior. Another common relationship is control dependence: although two behaviors have no direct data flow relationship, the execution of the latter behavior requires the execution of the previous one as a trigger condition. In the example above, the relation between monitor and intercept is a data dependence relation and can be described as follows:

$$\begin{aligned} &\text{Output}(\varphi_{\text{monitor}}) \longrightarrow \text{Input}(\varphi_{\text{intercept}}), \\ &\text{Broadcast\_info} \longrightarrow \text{Broadcast\_info}' \longrightarrow \text{Broadcast\_info}''. \end{aligned} \tag{7}$$
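Data dependence between two behaviors can be checked mechanically: the later behavior must consume exactly the object state that the earlier one produces. The Python sketch below encodes the monitor and intercept behaviors as dictionaries of object-state triples; this representation and the function name `is_data_dependent` are our own assumptions for illustration.

```python
# Each behavior maps an input object state to an output object state.
# States are (obj_name, attribute, attri_value) triples, as in tuple (2).
monitor = {
    "name": "monitor",
    "input": ("Broadcast_info", "is_monitored", "No"),
    "output": ("Broadcast_info", "is_monitored", "Yes"),
}
intercept = {
    "name": "intercept",
    "input": ("Broadcast_info", "is_monitored", "Yes"),
    "output": ("Broadcast_info", "is_intercepted", "Yes"),
}


def is_data_dependent(earlier, later):
    """True when the later behavior consumes exactly what the earlier one produces."""
    return earlier["output"] == later["input"]


print(is_data_dependent(monitor, intercept))  # True
print(is_data_dependent(intercept, monitor))  # False
```

Control dependence would need extra information (a trigger condition rather than a shared object state), so it is deliberately left out of this sketch.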

Definition 4. Software purpose is the final state of all input objects that are acted on in an intention, formally represented as

$$\text{Goal} = (\text{object}_1, \text{attribute}, \text{attri\_value}) \times \cdots \times (\text{object}_n, \text{attribute}, \text{attri\_value}). \tag{8}$$

In the purpose representation, $\text{object}_1, \ldots, \text{object}_n$ represent all the input objects involved in the intention. For example, after the sensitive behaviors in a malware sample are extracted and given a formal semantic representation, we use the ontology reasoning engine to automate the reasoning. We eventually obtain a description of the relationships between behaviors and the final states of all the objects. For instance, (SMS, position, “1062568”) indicates that the SMS is sent to “1062568”, (URL, is_used, Yes) indicates that the URL is used in the related function, and (DeviceId, is_encrypt, Yes) indicates that the DeviceId has been encrypted.
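One way to read Definition 4 operationally is to replay the outputs of the behaviors in execution order and keep the last value written for each object property. The sketch below does this in Python; keying the final state on the (object, attribute) pair is our assumption, and the intermediate `("DeviceId", "is_encrypt", "No")` state is invented to show an overwrite.

```python
def derive_goal(behavior_outputs):
    """Keep, for each (object, attribute) pair, the last value written to it."""
    final_state = {}
    for obj_name, attribute, attri_value in behavior_outputs:
        final_state[(obj_name, attribute)] = attri_value
    return [(obj, attr, value) for (obj, attr), value in final_state.items()]


# Output object states of successive behaviors, in execution order.
outputs_in_order = [
    ("DeviceId", "is_encrypt", "No"),    # hypothetical earlier state
    ("URL", "is_used", "Yes"),
    ("DeviceId", "is_encrypt", "Yes"),
    ("SMS", "position", "1062568"),
]
goal = derive_goal(outputs_in_order)
print(goal)
```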

3.3. Proof of Model. If intention is viewed as a system [22], its basic component is the behavior. Once the inputs and outputs involved in these behaviors are defined, a series of basic mapping relationships is obtained, which reflects the influence of the basic behaviors on external objects. The overall nature of an intention system can be determined by a set of mappings between a set of inputs and a set of outputs. Lemmas 5 and 6 are the basic mathematical theory of our behavior deduction.

Lemma 5 (function combination). The union of the mapping set $\{\varphi_i\}$ is a mapping $\varphi : A \to B$,

$$\begin{aligned} \varphi &= \bigcup_{i=1}^{n} \varphi_i, \\ A &= \bigcup_{i=1}^{n} A_i, \\ B &= \bigcup_{i=1}^{n} B_i, \\ \varphi_i &: A_i \to B_i, \quad 1 \le i \le n, \end{aligned} \tag{9}$$

when $(\forall A_i)(\forall A_j)(A_i \subseteq A \wedge A_j \subseteq A \wedge A_i \neq A_j \to A_i \cap A_j = \emptyset)$.
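For finite mappings, Lemma 5 can be illustrated with Python dictionaries: the union of several dictionaries is again a well-defined mapping as long as no key appears in two of them. The sketch below enforces the slightly stronger condition that all domains are pairwise disjoint; the function name `combine` and the sample keys are hypothetical.

```python
def combine(mappings):
    """Union of finite mappings (dicts); requires pairwise disjoint domains."""
    combined = {}
    for mapping in mappings:
        overlap = combined.keys() & mapping.keys()
        if overlap:
            raise ValueError(f"domains overlap on {sorted(overlap)}")
        combined.update(mapping)
    return combined


phi_1 = {"a1": "b1"}
phi_2 = {"a2": "b2", "a3": "b2"}
phi = combine([phi_1, phi_2])
print(phi)  # {'a1': 'b1', 'a2': 'b2', 'a3': 'b2'}
```

Without the disjointness check, two mappings that disagree on a shared input would silently overwrite each other, and the union would no longer be a function of its inputs.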

Lemma 6 (function compound). The two-tuple $\varphi$ consisting of a set of mappings $\Phi$ and its compound relation $\xi_{\circ}$ is a mapping $\varphi : \operatorname{dom}(\varphi_1) \to \operatorname{ran}(\varphi_n)$,

$$\begin{aligned} \varphi &= (\Phi, \xi_{\circ}), \\ \Phi &= \{\varphi_i \mid \varphi_i : A_i \to B_i,\ A_i = \operatorname{dom}(\varphi_i),\ B_i = \operatorname{ran}(\varphi_i),\ 1 \le i \le n\}, \\ \xi_{\circ} &= \{\varphi_{i+1} \circ \varphi_i \mid \varphi_i, \varphi_{i+1} \in \Phi,\ 1 \le i \le n-1\}, \end{aligned} \tag{10}$$

when $(\forall i)((1 \le i \le n-1) \to \operatorname{ran}(\varphi_i) \subseteq \operatorname{dom}(\varphi_{i+1}))$.

According to Lemmas 5 and 6, we prove Corollaries 7 and 8.

Corollary 7 (compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represents a set of behaviors that are involved in an intention. If the relation between these behaviors is $\text{Output}(\varphi_i) = \text{Input}(\varphi_{i+1})$, $1 \le i \le n-1$, then the output $\text{Output}(\varphi_n)$ of behavior $\varphi_n$ is the representation of the final goal, and the behavior sequence is denoted as $\xi_i = (\varphi_1, \varphi_2, \ldots, \varphi_n)$.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_i$ and $\varphi_{i+1}$ satisfy the condition that the former's output corresponds to the latter's input; that is, the range of $\varphi_i$ is equal to the domain of $\varphi_{i+1}$, which satisfies $(\forall i)((1 \le i \le n-1) \to \operatorname{ran}(\varphi_i) \subseteq \operatorname{dom}(\varphi_{i+1}))$. According to Lemma 6, the corollary is proved.
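The compound deduction can be illustrated for finite mappings by composing dictionaries in sequence after checking the range/domain condition of Lemma 6. This is a sketch under our own encoding (dictionaries keyed by object-state labels); the function name `compose_chain` is hypothetical.

```python
def compose_chain(mappings):
    """Compose finite mappings phi_1, ..., phi_n (dicts) in order,
    checking that ran(phi_i) is contained in dom(phi_{i+1}) at each step."""
    for earlier, later in zip(mappings, mappings[1:]):
        if not set(earlier.values()).issubset(later.keys()):
            raise ValueError("ran(phi_i) is not contained in dom(phi_{i+1})")
    composed = {}
    for start in mappings[0]:
        value = start
        for mapping in mappings:
            value = mapping[value]
        composed[start] = value
    return composed


# Monitor then intercept, using state labels from formulas (5) and (6).
phi_monitor = {"Broadcast_info": "Broadcast_info'"}
phi_intercept = {"Broadcast_info'": "Broadcast_info''"}
sequence = [phi_monitor, phi_intercept]
print(compose_chain(sequence))
```

The values of the composed dictionary play the role of $\text{Output}(\varphi_n)$, that is, the final goal of the behavior sequence.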

Corollary 8 (combination-compound intention deduction)1205931 1205932 120593119899 represents a set of behaviors that are involvedin an intention If there is an input to output relation119874119906119905119901119906119905(120593119898) = 119868119899119901119906119905(120593119906) 1 le 119898 119906 le 119899 and parallelrelationship 119874119906119905119901119906119905(120593119894) cup 119874119906119905119901119906119905(120593119895) = 119868119899119901119906119905(120593119896) 1 le119894 119895 119896 le 119899 at the same time then the output 119874119906119905119901119906119905(120593119906) cup119874119906119905119901119906119905(120593119896) of 120593119906 and 120593119896 is the final goal Behavior sequencesare denoted as 120585119894 = (Φ 119877) Φ is behavior set 119877 is therelationship between these behaviors

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ is known as a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_m$ and $\varphi_u$ satisfy the former’s output corresponding to the latter’s input; that is, the range of $\varphi_m$ is equal to the domain of $\varphi_u$. $\operatorname{dom}(\varphi_u)$ is the compound output according to Lemma 6. $\varphi_i$ and $\varphi_j$ satisfy both of their outputs as the input of $\varphi_k$; because the intersection of the inputs of $\varphi_i$ and $\varphi_j$ is empty ($\operatorname{ran}(\varphi_i) \cup \operatorname{ran}(\varphi_j) = \emptyset$), they satisfy Lemma 5. $\varphi_i$ and $\varphi_j$ satisfy the combination relation and can be combined into a new mapping:

$$\varphi' : \operatorname{ran}(\varphi_i) \cup \operatorname{ran}(\varphi_j) \longrightarrow \operatorname{dom}(\varphi_i) \cup \operatorname{dom}(\varphi_j). \tag{11}$$

Then $\varphi'$ and $\varphi_k$ satisfy the condition of Lemma 6. $\operatorname{dom}(\varphi_k)$ is the compound output. The corollary is proved according to Lemma 6.
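As a rough illustration of the combination step in Corollary 8, the sketch below merges two parallel behaviors with disjoint inputs and checks that the merged outputs feed a third behavior. It treats the combined behavior as taking the joint inputs to the joint outputs, which is an interpretive assumption; the names `Behavior`, `combine`, and `compounds_with` and the sample triples are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)


class Behavior(NamedTuple):
    name: str
    inputs: FrozenSet[Obj]
    outputs: FrozenSet[Obj]


def combine(a: Behavior, b: Behavior) -> Optional[Behavior]:
    """Merge two parallel behaviors into one mapping when their inputs are disjoint."""
    if a.inputs & b.inputs:
        return None
    return Behavior(f"{a.name}+{b.name}", a.inputs | b.inputs, a.outputs | b.outputs)


def compounds_with(first: Behavior, second: Behavior) -> bool:
    """True when the outputs of `first` are exactly the inputs of `second`."""
    return first.outputs == second.inputs


# Assumed example objects, loosely modeled on the Zitmo case discussed later.
connect = Behavior("connect",
                   frozenset({("URL", "is_used", "No")}),
                   frozenset({("URL", "is_used", "Yes")}))
encrypt = Behavior("encrypt",
                   frozenset({("SmsMessage", "position", "inMemory")}),
                   frozenset({("SmsMessage", "is_plain", "No")}))
transmit = Behavior("transmit",
                    connect.outputs | encrypt.outputs,
                    frozenset({("SmsMessage", "position", "remoteServer")}))

merged = combine(connect, encrypt)
goal = transmit.outputs if merged and compounds_with(merged, transmit) else None
```

Here the goal is reported only when both conditions hold: the two behaviors can be combined, and the combined behavior compounds with the consuming behavior.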

4. Ontology Inference System

This section elaborates the construction of the ontology inference system. Section 4.1 gives the construction specification of the ontology model [10]. Section 4.2 gives the SWRL rules used in the inference engine. Section 4.3 describes the mapping methods from Data Source to Fact Base. Section 4.4 elaborates the framework of the inference system. Section 4.5 walks through a motivating example.

4.1. Ontology Model. The reasons for using an ontology in our inference system are as follows. (1) A conceptual system based on ontology is computable, and the automatic deduction of intentions can be achieved by computing over concepts. This reduces the amount of code that has to be written. (2) The extensibility of ontology enables us to extend the conceptual model at any time as new malware features emerge. Once new malware knowledge appears, we only need to modify the ontology model accordingly, which reduces the effort of code updates and maintenance. (3) The ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]. It provides a standardized representation for malware intention.

The ontology model of malware behavior intention uses the following knowledge: the definition and classification of behavior and behavior object, the intention model, and Corollaries 7 and 8. The concepts and their relations are shown in Figure 1. The definitions of each concept are given in Section 3. We use the Pellet reasoner to verify the consistency of the ontology.

The definitions of object attributes and data attributes are illustrated in Table 2.

4.2. SWRL Inference Rules. This section uses the knowledge of Corollaries 7 and 8 and Definition 4 in Section 3. Before writing inference rules, we need to define the format of basic facts. A fact is a composite description of a basic behavior. There are two categories of basic facts in our research: B1, a behavior with a single input and a single output, and B2, a behavior with double input and double output. See Box 1.

(1) Inference Rules of the Relationship between Behaviors

Figure 1: Semantic ontology of behavior intention. (Diagram not reproduced. It relates the concepts Android malware, behavior, beh_object, obj_name, attribute, and attri_value through the relations hasBehavior, hasInput, hasOutput, hasName, hasAttribute, and hasAttrivalue. Behavior subclasses shown: right_gain, access, store, encrypt, decode, monitor, intercept, connect, transmit, send, install_malware, popup_ad, tamper, dial, delete, and hide_file. Object subclasses shown: privacy, right, event, malware, parameter, and config_file.)

Figure 2: SWRL inference rules of the relationship between behaviors and final goal.

Premise 1. Any two B1 behaviors are B11 and B12; B11(in) represents the input of B11, and B11(out) represents its output. If any two B1 behaviors B11 and B12 are related such that B11’s output object corresponds to B12’s input object, then Rule-1 is the inference rule corresponding to this condition. See Box 1.

Premise 2. Any two B1 behaviors are B11 and B12, and one B2 behavior is B2. B11(in) represents the input to B11, and B11(out) represents its output. B2(1in), B2(2in), B2(1out), and B2(2out) represent the inputs and outputs of B2, respectively.

If the union of the outputs of any two B1 behaviors is the input of a certain B2 behavior and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior. There is also a combination relation between B11 and B12. The rule is B1-B2-Rule-1, shown in Figure 2; the rest of the rules are similar.

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2, for example, Goal-Rule-1. The outputs are the descriptions of the object’s final state. Due to limited space, the other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts include B1 and B2 behaviors. The behavior elements that should be extracted from the Data Source are as follows:

(1) Behavior (name): the behavior description in a program is extracted according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of sensitive APIs and the official document definition.

(3) Output (behavior object): after a behavior operates on an object, the object name and attribute do not change, while the attribute value is affected. Therefore, we can determine the changes in attribute values based on the behavior’s definition.
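The three elements listed above can be pictured as a small record type. The following is only a sketch under assumed names (`BehaviorObject`, `BehaviorFact`, `output_of`); the paper stores these facts as ontology instances, not as Python objects.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class BehaviorObject:
    name: str       # object name, e.g. "Broadcast"
    attribute: str  # attribute, e.g. "is_monitored"
    value: str      # attribute value, e.g. "No"


@dataclass(frozen=True)
class BehaviorFact:
    behavior: str
    inputs: Tuple[BehaviorObject, ...]
    outputs: Tuple[BehaviorObject, ...]

    @property
    def category(self) -> str:
        """B1: single input and output; B2: double input and output."""
        sizes = (len(self.inputs), len(self.outputs))
        if sizes == (1, 1):
            return "B1"
        if sizes == (2, 2):
            return "B2"
        return "other"


def output_of(obj: BehaviorObject, new_value: str) -> BehaviorObject:
    """Item (3): name and attribute stay fixed; only the attribute value changes."""
    return replace(obj, value=new_value)


# Assumed example: monitoring flips the is_monitored attribute of a broadcast.
before = BehaviorObject("Broadcast", "is_monitored", "No")
monitor = BehaviorFact("monitor", (before,), (output_of(before, "Yes"),))
```

Keeping the objects immutable makes it easy to compare an output of one behavior with an input of another by plain equality.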

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second includes object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and then generate the call graph (CG) and control flow graph (CFG) of each sample. The leaf nodes of the call graph are traversed to find sensitive APIs, and the corresponding behavior is identified according to the mapping relation between behaviors and sensitive API calls. Partial mapping relations between behaviors, sensitive API calls, and object descriptions are shown in Table 3.
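The first stage amounts to a lookup over call-graph leaves. A minimal sketch follows; the dictionary holds the partial mapping of Table 3, while the dictionary layout, the function `recognize_behaviors`, the adjacency-list call graph format, and the method names in the toy graph are assumptions for illustration.

```python
# Partial mapping taken from Table 3; the dictionary layout itself is an assumption.
SENSITIVE_API_TO_BEHAVIOR = {
    "Runtime.exec": ("right_gain", "Root permission"),
    "getDeviceId": ("access", "Device ID"),
    "getNetworkOperatorName": ("access", "Carrier name"),
    "getCellLocation": ("access", "Phone position"),
    "createFromPdu": ("access", "Short message"),
    "sendTextMessage": ("send", "Number and contents"),
    "abortBroadcast": ("intercept", "Broadcast information"),
    "URLConnection.connect": ("connect", "URL-parameter"),
    "execute": ("transmit", "Parameter"),
    "setEntity": ("encrypt", "Parameter"),
    "writeRec": ("store", "Parameter"),
}


def recognize_behaviors(call_graph):
    """Map leaf nodes of a call graph (dict: caller -> list of callees) to behaviors."""
    callees = {callee for targets in call_graph.values() for callee in targets}
    nodes = set(call_graph) | callees
    found = []
    for node in sorted(nodes):
        is_leaf = not call_graph.get(node)  # no outgoing calls recorded
        if is_leaf and node in SENSITIVE_API_TO_BEHAVIOR:
            behavior, obj = SENSITIVE_API_TO_BEHAVIOR[node]
            found.append((behavior, node, obj))
    return found


# Toy call graph with hypothetical application method names.
toy_call_graph = {
    "SmsReceiver.onReceive": ["abortBroadcast", "createFromPdu", "Helper.upload"],
    "Helper.upload": ["setEntity", "execute"],
}
behaviors = recognize_behaviors(toy_call_graph)
```

A real extractor would work on decompiled code and fully qualified signatures; the string keys here only stand in for that matching step.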

Second Stage. For each identified behavior, the behavior’s object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object based on the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object based on the parameter usage of the API and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

Table 2: Specification of object attributes and data attributes.

| Attribute name | Specification |
| hasCombinationwith | Behaviors have a combination relation |
| hasCompoundwith | Behaviors have a compound relation |
| hasInput(x, y) | The behavior’s single input is “y” |
| hasOutput(x, z) | The behavior’s single output is “z” |
| hasFirstInput(x, y1) | The first input of the double input is “y1” |
| hasSecondInput(x, y2) | The second input of the double input is “y2” |
| hasFirstOutput(x, z1) | The first output of the double output is “z1” |
| hasSecondOutput(x, z2) | The second output of the double output is “z2” |
| hasBehavior(x, y) | The behavior belonging to the malware is “y” |
| hasObjectName(x, y) | The name of the object is “y” |
| hasAttribute(x, y) | The attribute of the object is “y” |
| hasAttributeValue(x, y) | The object’s attribute value is “y” |

Box 1.

Rule-1: behavior_B1(?B11) ∧ behavior_B1(?B12) ∧ hasBehaviorName(?B11, ?bn1) ∧ hasBehaviorName(?B12, ?bn2) ∧ hasOutput(?B11, ?op) ∧ hasInput(?B12, ?ip) ∧ hasObjectname(?op, ?on1) ∧ hasObjectname(?ip, ?on2) ∧ swrlb:equal(?on1, ?on2) ∧ hasAttribute(?op, ?att1) ∧ hasAttribute(?ip, ?att2) ∧ hasValue(?att1, ?attri1) ∧ hasValue(?att2, ?attri2) ∧ swrlb:equal(?attri1, ?attri2) ∧ hasAttributeValue(?op, ?attv1) ∧ hasAttributeValue(?ip, ?attv2) ∧ hasValue(?attv1, ?attriv1) ∧ hasValue(?attv2, ?attriv2) ∧ swrlb:equal(?attriv1, ?attriv2) → hasCompoundwith(?bn1, ?bn2)

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the knowledge of the intention definition and Corollaries 7 and 8 to construct SWRL rules (the rules are represented in ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The extraction methods of facts are illustrated in Section 4.3. The Jess inference engine completes the reasoning process using the Facts Base and the SWRL rules and finally gives the inference output (the description of malware intention).

4.5. Motivation Example. Take Zitmo [16] as an example to show the facts extracted from a sample, as shown in Table 4. Among these extracted facts, access and transmit are B2 behaviors. A B2 behavior’s double input indicates that the execution of the behavior requires both key inputs.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intentions of each malicious sample. The results of Zitmo’s behavior relations are shown in Figure 4. Other results of intention reasoning are analyzed separately and show reasonable efficiency.

We extract behavior relations and goal representation from Figure 4, which are shown in Table 5.

Zitmo’s intention includes the behavior sequence (consisting of behavior set and behavior relation) and the representation of the goal (output object description). To visually demonstrate the reasoning results, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents the input or output object of a behavior, and an arrow indicates the relationship between object and behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together they represent the goal of the intention.
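To make the Zitmo example concrete, the sketch below re-derives the behavior relations of Table 5 with plain set operations instead of SWRL and Jess. The input and output triples are an assumed reading of Table 4 and Figure 5 (for example, that the Broadcast’ object consumed by access is the monitored broadcast), and the function names are hypothetical.

```python
from itertools import combinations, permutations

# behavior -> (inputs, outputs); each object is (name, attribute, value).
# These triples are an assumed reading of Table 4 and Figure 5.
FACTS = {
    "monitor": ({("Broadcast", "is_monitored", "No")},
                {("Broadcast", "is_monitored", "Yes")}),
    "intercept": ({("Broadcast", "is_monitored", "Yes")},
                  {("Broadcast", "is_intercepted", "Yes")}),
    "access": ({("Broadcast", "is_monitored", "Yes"),
                ("SmsMessage", "position", "inDevice")},
               {("SmsMessage", "position", "inMemory")}),
    "encrypt": ({("SmsMessage", "position", "inMemory")},
                {("SmsMessage", "is_plain", "No")}),
    "connect": ({("URL", "is_used", "No")},
                {("URL", "is_used", "Yes")}),
    "transmit": ({("URL", "is_used", "Yes"), ("SmsMessage", "is_plain", "No")},
                 {("SmsMessage", "position", "remoteServer")}),
}


def compound_pairs(facts):
    """Pairs (a, b) where some output object of a is an input object of b."""
    return {(a, b) for a, b in permutations(facts, 2)
            if facts[a][1] & facts[b][0]}


def combination_pairs(facts):
    """Pairs with disjoint inputs whose joint outputs are the inputs of a third behavior."""
    result = set()
    for a, b in combinations(sorted(facts), 2):
        if facts[a][0] & facts[b][0]:
            continue  # inputs overlap, so the behaviors are not parallel
        joint = facts[a][1] | facts[b][1]
        if any(joint == facts[c][0] for c in facts if c not in (a, b)):
            result.add((a, b))
    return result


compound = compound_pairs(FACTS)
combination = combination_pairs(FACTS)
```

With these assumed facts, the five compound pairs and the single combination pair coincide with the relations listed in Table 5.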

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
| right_gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |

Figure 3: The framework of intention inference system. (Diagram not reproduced. Components: malware, data source, facts base, intention definition, Corollaries 7 and 8, ontology, SWRL rules, inference engine, and inference output (intention description).)

Figure 4: Inference results of Zitmo’s behavior relations.

Figure 5: Visualization of Zitmo’s inference results. (Diagram not reproduced. Legend: behavior, beh_object, input or output. Chains shown: Broadcast (is_monitored: No) → monitor → Broadcast (is_monitored: Yes) → intercept → Broadcast (is_intercepted: Yes); SmsMessage (position: inDevice) → access → SmsMessage (position: inMemory) → encrypt → SmsMessage (is_plain: No); URL (is_used: No) → connect → URL (is_used: Yes) → transmit → SmsMessage (position: remoteServer).)

Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast’ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast’, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL’, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 in a prototype inference system for identifying and describing malware intention in behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built covers the behavior facts that exist in malware samples and give a coverage rate analysis. Second, to verify the effectiveness of manual (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, the effectiveness and reasoning performance of the ontology inference system are shown. Finally, the correctness, readability, and effectiveness of the results of intention reasoning are addressed.

We have selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from Genome’s 1260 samples, and 65 DroidBench samples), which are shown in Table 6. Among the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in those programs are similar. For efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truths because they have open-sourced programs with clear semantics [20]. Thus, we can make an adequate analysis of these samples. Static analysis in general often lacks the capability of extracting runtime behaviors and can be evaded accordingly, although it also has distinct advantages over dynamic techniques. Nevertheless, this paper focuses on software intention restoration; that is, the emphasis is on the extraction and semantic mapping methods of malware behavior, the extraction of relationships between behaviors, and the representation of the malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome’s Android apps and a summary of the related literature [15, 19–21]. We also refer to Android’s official API documentation, and, based on the information collected, a database of sensitive API behaviors is established. These APIs are controlled by 30 sensitive Android permissions and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our behavior facts extraction with CopperDroid’s [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS- and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods; the goal is twofold. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we hope to know whether the behavior description generated by our methods is readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid’s Public Reports [24]. We have collected 150 copies of CopperDroid’s public malware sample analysis reports and made a statistical analysis. As shown in Table 6, the third column, Ours (Copper), indicates the number of behaviors extracted using our methods and the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).

Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right_gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right_gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install_mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right_gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |

Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar (%) |
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90 |
| FS access | Open, Write | 2 | 2 | 100 |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100 |
| Send SMS | SMS | 1 | 1 | 100 |
| Execute external file | Generic, Priv. Esc., Shell, Inst. APK | 6 | 8 | 75 |
| Encrypt info | File info | 0 | 2 | 0 |
| Intercept SMS | SMS | 0 | 2 | 0 |
| Store info | Privacy info | 0 | 1 | 0 |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc. | Correct | T |
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |
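The comparison of extracted behavior counts can be re-derived from the Ours (Copper) column of Table 6. The short check below is only a sketch; the list literals are the per-row counts as read from that table, and the variable names are illustrative.

```python
# (ours, CopperDroid) behavior counts per row of Table 6.
real_world = [(6, 2), (9, 4), (5, 3), (3, 2), (4, 2),
              (6, 3), (6, 2), (9, 4), (4, 2), (4, 1)]
droidbench = [(2, 2), (3, 2), (2, 2), (2, 2), (3, 2), (3, 2),
              (3, 2), (2, 2), (2, 2), (4, 2), (2, 2), (3, 2)]

rows = real_world + droidbench
ours = sum(o for o, _ in rows)
copper = sum(c for _, c in rows)
increase_pct = (ours - copper) / copper * 100  # relative increase over CopperDroid

print(ours, copper, round(increase_pct, 1))  # 87 49 77.6
```

The totals are taken per table row (per sample family), not weighted by the number of samples in each family.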

We further study whether the behavior description generated by our methods is readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for the 75 samples. The result illustrated in Table 7 shows that the descriptions derived from our method are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100%. CopperDroid failed to extract some of the behaviors we extract, such as encrypt, intercept, and store personal info. Thus, we argue that the behaviors we extract can reflect the common program semantics and programming conventions effectively.

Notice that, though it may take manual effort to generate the behavior facts description, behavior extraction is a technology that has already been realized. We build on previous work on extracting malware behaviors, even though automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, whether static or dynamic, can be utilized in our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and Pellet reasoning verifies that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples: preparation of the behavior facts (adding ontology instances) dominates the runtime, while the generation of intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, while the analysis for a majority (85%) of apps can be completed within 5 seconds. The validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphically displayed intention results show high readability. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate the correctness, we produce intention descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automation tools and extract the behavior facts in these samples. The extraction mechanism is to construct the environment dependence graph first and then use the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated in the tools provided by Protégé 3.4.7, and the instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts; the SWRL rules are imported into the inference engine at the same time. The inference results give the output of the behavior sequence and the final goal representation (see the definition of malware intention). The framework is shown in Figure 3.

Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report: Geno | Appscan report: Droid | Appscan report: All | Intention description: Geno | Intention description: Droid | Intention description: All |
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that the basis of our reasoning is that the sensitive API database we built can map the sensitive behaviors in malware. Although it is not perfect, it fully covers the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, this is not precise enough to handle data flow propagation that cannot be solved by existing static analysis technology (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical limitation rather than a gap in the theory.
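The 89.3% rate quoted above follows directly from the counts in Table 8. A minimal check, with the dictionary layout chosen only for illustration:

```python
# Counts from Table 8: (false status, missing description, correct) per sample set.
results = {"Genome": (1, 1, 8), "DroidBench": (4, 2, 59)}

correct = sum(row[2] for row in results.values())
total = sum(sum(row) for row in results.values())
rate = correct / total  # share of samples whose derived intention was judged correct
```

The same 67 correct samples reappear as the "Effective" total for intention descriptions in Table 9.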

In fact, static analysis in general lacks the capability of extracting runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we hope to know whether the generated intentions are readable to an average audience. Second, we expect to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on the security scanning of Android apps. The system provides a scan report after scanning and describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes’ Appscan platform and obtained 75 security reports for these samples. We also collate the ontology inference results and show them in formalized and graphical form; intention reports for the 75 samples are produced. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities reported by 360 microscopes’ Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we perform the user study based on the official intent descriptions of the 75 apps. We use these official descriptions as a reference standard.

We recruited participants directly from our laboratory group and required that participants be smartphone users. We also made sure that participants have experience with Android malware analysis and understand basic smartphone events, such as “intercept SMS” or “access GPS location.” We provide a rating for each sample report (Appscan and intention) with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes’ Appscan platform provides a coarser-grained description, and users can judge the true intents of only 4 samples according to the Appscan report, whereas users can judge the malice of 67 samples using the intention description report. This indicates that our intention report is readable, even compared to the Appscan report created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are important, while Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly through the Appscan report. In a further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.

Threat Analysis. Analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics) according to the intention description. This is of great practical significance for users to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the process of intention reasoning. In order to standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in ontology language and the Fact Base into the Jess engine, and Jess’s outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts are divided into two forms: manual extraction and automatic extraction. We mainly use manual fact extraction. There are many opportunities for future work, such as the implementation of automatic fact extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM SIGPLAN Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in Communication, P. R. Cohen, Ed., pp. 15–32, The MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the Jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System Science and Engineering Research, Shanghai Science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.



4 Journal of Electrical and Computer Engineering

Table 1: The rules of behavior semantic extraction.

Program structure | Location of behavior semantic
(1) Sequential relation | Extracting both of these APIs
(2) Prepared for the latter | Extracting the latter API
(3) Multilevel data access | Extracting the former API

second with necessary inputs. In this case, we study the documentation of both APIs and extract behaviors from both of them. (2) A former object is retrieved for latter operations. For example, SmsMessage.createFromPdu(pdus) is always invoked prior to SmsMessage.getMessageBody() because the former fetches the SmsMessage object that the latter needs. We then extract behaviors only from the latter API. (3) Multilevel data is accessed through multiple levels of APIs. For example, when accessing location data, we first call getLastKnownLocation() to return a location object and then call getLongitude() and getLatitude() to get the longitude and latitude from this object. As the higher-level API is meaningful enough, we extract behaviors only from the former API.
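The three rules of Table 1 can be sketched as a small selection function. This is only an illustrative sketch: the relation labels and the helper `select_apis` are hypothetical names introduced here, not part of the authors' tooling, and the classification of an API pair into one of the three program structures is assumed to have been done beforehand.

```python
# Hypothetical labels for the three program structures listed in Table 1.
SEQUENTIAL = "sequential"            # rule (1): extract both APIs
PREPARED = "prepared_for_latter"     # rule (2): extract the latter API
MULTILEVEL = "multilevel_access"     # rule (3): extract the former API


def select_apis(former, latter, relation):
    """Return the APIs from which behavior semantics are extracted."""
    if relation == SEQUENTIAL:
        return [former, latter]
    if relation == PREPARED:
        return [latter]
    if relation == MULTILEVEL:
        return [former]
    raise ValueError(f"unknown relation: {relation}")


# The two examples given in the text.
print(select_apis("createFromPdu", "getMessageBody", PREPARED))
# -> ['getMessageBody']
print(select_apis("getLastKnownLocation", "getLongitude", MULTILEVEL))
# -> ['getLastKnownLocation']
```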

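According to the example analysis of malware samples and the description of related literature, we summarize a variety of malware behaviors and give the behavior set $E$:

\[
\begin{aligned}
E = \{ &\varphi_{\mathrm{right\_gain}}, \varphi_{\mathrm{access}}, \varphi_{\mathrm{store}}, \varphi_{\mathrm{encrypt}}, \varphi_{\mathrm{monitor}}, \varphi_{\mathrm{intercept}}, \\
&\varphi_{\mathrm{connect}}, \varphi_{\mathrm{transmit}}, \varphi_{\mathrm{send}}, \varphi_{\mathrm{dial}}, \varphi_{\mathrm{decode}}, \varphi_{\mathrm{install}}, \varphi_{\mathrm{popup}}, \\
&\varphi_{\mathrm{tamper}}, \varphi_{\mathrm{delete}}, \varphi_{\mathrm{hide}}, \varphi_{\mathrm{remote\_control}} \}
\end{aligned} \tag{4}
\]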

We take the monitoring and interception of a broadcast message as an example to explain how a behavior is described. A behavior can be described as a mapping relationship.

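(i) Broadcast message monitoring:

\[
\begin{aligned}
&\varphi_{\mathrm{monitor}} : \mathit{Broadcast\_info} \longrightarrow \mathit{Broadcast\_info}' \\
&\mathit{Broadcast\_info} = (\mathit{Broadcast\_info}, \mathit{is\_monitored}, \mathit{No}) \\
&\mathit{Broadcast\_info}' = (\mathit{Broadcast\_info}, \mathit{is\_monitored}, \mathit{Yes})
\end{aligned} \tag{5}
\]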

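(ii) Broadcast message interception:

\[
\begin{aligned}
&\varphi_{\mathrm{intercept}} : \mathit{Broadcast\_info}' \longrightarrow \mathit{Broadcast\_info}'' \\
&\mathit{Broadcast\_info}' = (\mathit{Broadcast\_info}, \mathit{is\_monitored}, \mathit{Yes}) \\
&\mathit{Broadcast\_info}'' = (\mathit{Broadcast\_info}, \mathit{is\_intercepted}, \mathit{Yes})
\end{aligned} \tag{6}
\]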

Behaviors are not independent of one another: the outputs of one behavior often correspond to the inputs of another. One of the most common relationships is data dependence; that is, the execution input of the later behavior needs the execution output of the previous behavior. Another common relationship is control dependence: although two behaviors have no direct data flow relationship, the execution of the latter behavior needs the execution of the previous one as a trigger condition. In the former example, the relation between monitor and intercept is a data dependence relation and can be described as follows:

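\[
\begin{aligned}
&\mathit{Output}(\varphi_{\mathrm{monitor}}) \longrightarrow \mathit{Input}(\varphi_{\mathrm{intercept}}) \\
&\mathit{Broadcast\_info} \longrightarrow \mathit{Broadcast\_info}' \longrightarrow \mathit{Broadcast\_info}''
\end{aligned} \tag{7}
\]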
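The mappings (5) to (7) can be illustrated with a short Python sketch in which an object state is an (object name, attribute, attribute value) triple and each behavior is a function on such triples. The type `Obj` and the two functions are hypothetical names chosen for this illustration; they are an assumed encoding, not the authors' implementation.

```python
from typing import NamedTuple


class Obj(NamedTuple):
    """An object state as an (object name, attribute, attribute value) triple."""
    name: str
    attribute: str
    value: str


def monitor(obj: Obj) -> Obj:
    # Mapping (5): an unmonitored broadcast becomes a monitored one.
    if (obj.attribute, obj.value) != ("is_monitored", "No"):
        raise ValueError("monitor expects an unmonitored object")
    return Obj(obj.name, "is_monitored", "Yes")


def intercept(obj: Obj) -> Obj:
    # Mapping (6): only an already monitored broadcast can be intercepted.
    if (obj.attribute, obj.value) != ("is_monitored", "Yes"):
        raise ValueError("intercept expects a monitored object")
    return Obj(obj.name, "is_intercepted", "Yes")


# Data dependence (7): the output of monitor is the input of intercept.
start = Obj("Broadcast_info", "is_monitored", "No")
final = intercept(monitor(start))
print(final)
# -> Obj(name='Broadcast_info', attribute='is_intercepted', value='Yes')
```

Calling `intercept(start)` directly raises `ValueError`, which mirrors the requirement that the later behavior needs the output of the earlier one.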

Definition 4. Software purpose is the final state of all input objects that act in an intention, formally represented as

\[
\mathit{Goal} = (\mathit{object}_1, \mathit{attribute}, \mathit{attri\_value}) \times \cdots \times (\mathit{object}_n, \mathit{attribute}, \mathit{attri\_value}) \tag{8}
\]

In the purpose representation, the input objects $\mathit{object}_1, \ldots, \mathit{object}_n$ represent the objects involved in the intention. For example, after extracting the sensitive behaviors of a malware sample and giving them a formal semantic representation, we use the ontology reasoning engine to automate the reasoning. We eventually obtain the description of the relationship between behaviors and the final states of all the objects. For instance, we use (SMS, position, "1062568") to illustrate that the SMS is sent to "1062568"; (URL, is_used, Yes) represents the fact that the URL is used in the related function; and (DeviceId, is_encrypt, Yes) illustrates that the DeviceId has been encrypted.

3.3. Proof of Model. If intention is viewed as a system [22], its basic component is the behavior. After the inputs and outputs involved in these behaviors are defined, a series of basic mapping relationships is obtained, which reflects the influence of each basic behavior on external objects. The overall nature of an intention system can be determined by a set of mappings between a set of inputs and a set of outputs. Lemmas 5 and 6 are the basic mathematical theory of our behavior deduction.

Lemma 5 (function combination). The union of the mapping set $\{\varphi_i\}$ is a mapping $\varphi : A \to B$,

\[
\varphi = \bigcup_{i=1}^{n} \varphi_i, \qquad
A = \bigcup_{i=1}^{n} A_i, \qquad
B = \bigcup_{i=1}^{n} B_i, \qquad
\varphi_i : A_i \to B_i, \quad 1 \le i \le n, \tag{9}
\]

when $(\forall A_i)(\forall A_j)(A_i \subseteq A \wedge A_j \subseteq A \wedge A_i \ne A_j \to A_i \cap A_j = \emptyset)$.
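For finite mappings, Lemma 5 can be illustrated by representing each $\varphi_i$ as a Python dictionary and forming the union only when the domains are pairwise disjoint. This is a minimal sketch under that assumption; the function name `combine` and the example states (taken from the Zitmo example later in the paper) are chosen here for illustration.

```python
def combine(mappings):
    """Union of finite mappings (dicts) whose domains are pairwise disjoint."""
    combined = {}
    for mapping in mappings:
        overlap = combined.keys() & mapping.keys()
        if overlap:
            # The disjointness condition of Lemma 5 is violated.
            raise ValueError(f"domains overlap on {sorted(overlap)}")
        combined.update(mapping)
    return combined


# Two behaviors acting on different objects.
connect = {("URL", "is_used", "No"): ("URL", "is_used", "Yes")}
encrypt = {("SmsMessage", "position", "inMemory"): ("SmsMessage", "is_plain", "No")}

phi = combine([connect, encrypt])
print(len(phi))  # -> 2
```

The domain of `phi` is the union of the two input sets and its range is the union of the two output sets, as in (9).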

Lemma 6 (function compound). The two-tuple $\varphi$ consisting of a set of mappings $\Phi$ and its compound relation $\xi_\circ$ is a mapping $\varphi : \operatorname{dom}(\varphi_1) \to \operatorname{ran}(\varphi_n)$,

\[
\begin{aligned}
\varphi &= (\Phi, \xi_\circ), \\
\Phi &= \{\varphi_i \mid \varphi_i : A_i \to B_i,\ A_i = \operatorname{dom}(\varphi_i),\ B_i = \operatorname{ran}(\varphi_i),\ 1 \le i \le n\}, \\
\xi_\circ &= \{\varphi_{i+1} \circ \varphi_i \mid \varphi_i, \varphi_{i+1} \in \Phi,\ 1 \le i \le n-1\},
\end{aligned} \tag{10}
\]

when $(\forall i)((1 \le i \le n-1) \to \operatorname{ran}(\varphi_i) \subseteq \operatorname{dom}(\varphi_{i+1}))$.
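The composition condition of Lemma 6 can likewise be sketched for finite mappings stored as dictionaries. The helper `compose` is a hypothetical name; it checks that the range of each mapping is contained in the domain of the next and then chains them, which is an assumed reading of the lemma rather than code from the paper.

```python
def compose(mappings):
    """Compose phi_n after ... after phi_1 for finite mappings given as dicts."""
    for earlier, later in zip(mappings, mappings[1:]):
        # Condition of Lemma 6: ran(phi_i) must be a subset of dom(phi_{i+1}).
        if not set(earlier.values()) <= set(later.keys()):
            raise ValueError("ran(phi_i) is not a subset of dom(phi_{i+1})")
    result = {}
    for x in mappings[0]:
        y = x
        for mapping in mappings:
            y = mapping[y]
        result[x] = y
    return result


monitor = {("Broadcast_info", "is_monitored", "No"):
           ("Broadcast_info", "is_monitored", "Yes")}
intercept = {("Broadcast_info", "is_monitored", "Yes"):
             ("Broadcast_info", "is_intercepted", "Yes")}

chain = compose([monitor, intercept])
print(chain[("Broadcast_info", "is_monitored", "No")])
# -> ('Broadcast_info', 'is_intercepted', 'Yes')
```

Reversing the order, `compose([intercept, monitor])`, fails the subset check, so the two behaviors compose in only one direction.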

According to Lemmas 5 and 6, we prove Corollaries 7 and 8.

Corollary 7 (compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors that are involved in an intention. If the relation between these behaviors is $\mathit{Output}(\varphi_i) = \mathit{Input}(\varphi_{i+1})$, $1 \le i \le n-1$, then the output $\mathit{Output}(\varphi_n)$ of behavior $\varphi_n$ is the representation of the final goal. The behavior sequence is denoted as $\xi_i = (\varphi_1, \varphi_2, \ldots, \varphi_n)$.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_i$ and $\varphi_{i+1}$ satisfy the former's output corresponding to the latter's input; that is, the range of $\varphi_i$ is equal to the domain of $\varphi_{i+1}$, which satisfies $(\forall i)((1 \le i \le n-1) \to \operatorname{ran}(\varphi_i) \subseteq \operatorname{dom}(\varphi_{i+1}))$. According to Lemma 6, the corollary is proved.

Corollary 8 (combination-compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors that are involved in an intention. If there is an input-to-output relation $\mathit{Output}(\varphi_m) = \mathit{Input}(\varphi_u)$, $1 \le m, u \le n$, and a parallel relationship $\mathit{Output}(\varphi_i) \cup \mathit{Output}(\varphi_j) = \mathit{Input}(\varphi_k)$, $1 \le i, j, k \le n$, at the same time, then the output $\mathit{Output}(\varphi_u) \cup \mathit{Output}(\varphi_k)$ of $\varphi_u$ and $\varphi_k$ is the final goal. Behavior sequences are denoted as $\xi_i = (\Phi, R)$, where $\Phi$ is the behavior set and $R$ is the relationship between these behaviors.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ is known as a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_m$ and $\varphi_u$ satisfy the former's output corresponding to the latter's input; that is, the range of $\varphi_m$ is equal to the domain of $\varphi_u$, and $\operatorname{dom}(\varphi_u)$ is the compound output according to Lemma 6. $\varphi_i$ and $\varphi_j$ satisfy both of their outputs as the input of $\varphi_k$. Because the intersection of the inputs of $\varphi_i$ and $\varphi_j$ is empty ($\operatorname{dom}(\varphi_i) \cap \operatorname{dom}(\varphi_j) = \emptyset$), they satisfy Lemma 5; $\varphi_i$ and $\varphi_j$ satisfy the combination relation and can be combined into a new mapping:

\[
\varphi' : \operatorname{dom}(\varphi_i) \cup \operatorname{dom}(\varphi_j) \longrightarrow \operatorname{ran}(\varphi_i) \cup \operatorname{ran}(\varphi_j) \tag{11}
\]

Then $\varphi'$ and $\varphi_k$ satisfy the condition of Lemma 6, and $\operatorname{dom}(\varphi_k)$ is the compound output. The corollary is proved according to Lemma 6.

4. Ontology Inference System

This section elaborates the construction process of the ontology inference system. Section 4.1 gives the construction specification of the ontology model [10]. Section 4.2 gives the SWRL rules used in the inference engine. Section 4.3 describes the mapping methods from Data Source to Fact Base. Section 4.4 elaborates the framework of the inference system.

4.1. Ontology Model. The reasons for using ontology in our inference system are as follows. (1) The conceptual system based on ontology is computable, and the automatic deduction of intentions can be achieved by computing between concepts. This reduces the workload of code writing accordingly. (2) The extensibility of ontology enables us to extend the conceptual model at any time according to the emergence of new features of malware. Once new malware knowledge appears, we only need to modify the ontology model based on that knowledge. In this way, we reduce the amount of code update and maintenance. (3) The ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]. It can provide a standardized representation for malware intention.

The ontology model of malware behavior intention uses the following knowledge: the definition and classification of behavior and behavior object, the intention model, and Corollaries 7 and 8. The concepts and their relations are shown in Figure 1. The definitions of each concept are given in Section 3. We use Pellet engine reasoning to verify the consistency of the ontology.

The definitions of object attributes and data attributes are illustrated in Table 2.

4.2. SWRL Inference Rules. This section uses the knowledge of Corollaries 7 and 8 and Definition 4 in Section 3. Before writing inference rules, we need to define the format of basic facts. A fact is a composite description of a basic behavior. There are two categories of basic facts in our research: B1, the behavior with single input and single output, and B2, the behavior with double input and double output. See Box 1.

(1) Inference Rules of the Relationship between Behaviors

Premise 1. Any two B1 behaviors are B11 and B12; B11(in) represents the input of B11 and B11(out) represents its output. If B11 and B12 have the relation that B11's output object corresponds to B12's input object, Rule-1 is the inference rule corresponding to this condition. See Box 1.

Figure 1: Semantic ontology of behavior intention. [Diagram not reproduced. Concept labels shown: Malware domain, Android malware, behavior, beh_object, obj_name, attribute, attri_value; right_gain, access, store, encrypt, monitor, intercept, connect, transmit, send, dial, decode, install_malware, popup_ad, tamper, delete, hide_file; privacy, right, event, malware, parameter, config_file. Edge labels shown: is a, has, hasBehavior, hasInput, hasOutput, hasName, hasAttribute, hasAttrivalue.]

Figure 2: SWRL inference rules of the relationship between behaviors and final goal. [Image not reproduced.]

Premise 2. Any two B1 behaviors are B11 and B12, and one B2 behavior is B2. B11(in) represents the input to B11 and B11(out) represents its output. B2(1in), B2(2in), B2(1out), and B2(2out) represent the inputs and outputs of B2, respectively.

If the union of the outputs of any two B1 behaviors is the input of a certain B2 behavior, and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior. There is also a combination relation between B11 and B12. The rule is B1-B2-Rule-1, shown in Figure 2; the rest of the rules are similar.
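The condition of Premise 2 can be sketched in Python rather than SWRL. The classes `B1` and `B2`, the function `b1_b2_rule`, and the concrete Zitmo object states below are assumptions made for this illustration (the states are read off the Zitmo example in Section 4.5); the outputs of the B2 behavior are omitted because the premise does not use them.

```python
from typing import NamedTuple, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)


class B1(NamedTuple):
    name: str
    inp: Obj
    out: Obj


class B2(NamedTuple):
    name: str
    first_in: Obj
    second_in: Obj


def b1_b2_rule(a: B1, b: B1, c: B2):
    """Derive relations when the outputs of a and b together feed c."""
    relations = set()
    if a.inp != b.inp and {a.out, b.out} == {c.first_in, c.second_in}:
        relations.add(("hasCompoundwith", a.name, c.name))
        relations.add(("hasCompoundwith", b.name, c.name))
        relations.add(("hasCombinationwith", a.name, b.name))
    return relations


connect = B1("connect", ("URL", "is_used", "No"), ("URL", "is_used", "Yes"))
encrypt = B1("encrypt", ("SmsMessage", "position", "inMemory"),
             ("SmsMessage", "is_plain", "No"))
transmit = B2("transmit", ("URL", "is_used", "Yes"),
              ("SmsMessage", "is_plain", "No"))

for relation in sorted(b1_b2_rule(connect, encrypt, transmit)):
    print(relation)
# -> ('hasCombinationwith', 'connect', 'encrypt')
# -> ('hasCompoundwith', 'connect', 'transmit')
# -> ('hasCompoundwith', 'encrypt', 'transmit')
```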

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2, such as Goal-Rule-1. The outputs are the descriptions of the objects' final states. Due to limited space, other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts include B1 and B2 behaviors, and the behavior elements which should be extracted from the Data Source are as follows:

(1) Behavior (name): extracting the behavior description in a program according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of sensitive APIs and the official documentation.

(3) Output (behavior object): after a behavior operates on an object, its object name and attribute will not change, while its attribute value will be affected. Therefore, we can determine the changes in attribute values based on this behavior's definition.

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second includes object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and then generate the call graph (CG) and control flow graph (CFG). The leaf nodes of the call graph are traversed to find sensitive APIs and identify the corresponding behaviors according to the mapping relation between behaviors and sensitive API calls. Partial mapping relations between behaviors, sensitive API calls, and object descriptions are shown in Table 3.
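The first stage can be sketched as a traversal of a call graph whose leaf nodes are looked up in a sensitive API table. The table below is a small excerpt of Table 3; the call graph, its method names, and the function `recognize_behaviors` are hypothetical and stand in for the output of a real decompiler, which is not shown here.

```python
# Excerpt of the behavior mapping in Table 3 (sensitive API -> behavior).
SENSITIVE_API_TO_BEHAVIOR = {
    "Runtime.exec": "right_gain",
    "getDeviceId": "access",
    "sendTextMessage": "send",
    "abortBroadcast": "intercept",
    "URLConnection.connect": "connect",
}


def recognize_behaviors(call_graph, entry):
    """Visit the call graph from `entry` and map sensitive leaf calls to behaviors."""
    found = []
    seen = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        callees = call_graph.get(node, [])
        if not callees:  # leaf node: a candidate API call
            behavior = SENSITIVE_API_TO_BEHAVIOR.get(node)
            if behavior is not None:
                found.append((node, behavior))
        stack.extend(callees)
    return sorted(found)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "onReceive": ["handleSms", "abortBroadcast"],
    "handleSms": ["getDeviceId", "sendTextMessage", "log"],
}

print(recognize_behaviors(call_graph, "onReceive"))
# -> [('abortBroadcast', 'intercept'), ('getDeviceId', 'access'), ('sendTextMessage', 'send')]
```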

Second Stage. For each identified behavior, the behavior's object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object based on the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object based on the parameter usage of the API and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

Table 2: Specification of object attributes and data attributes.

Attribute name | Specification
hasCombinationwith | Behaviors have combination relation
hasCompoundwith | Behaviors have compound relation
hasInput(x, y) | The behavior's single input is "y"
hasOutput(x, z) | The behavior's single output is "z"
hasFirstInput(x, y1) | The first input of double input is "y1"
hasSecondInput(x, y2) | The second input of double input is "y2"
hasFirstOutput(x, z1) | The first output of double output is "z1"
hasSecondOutput(x, z2) | The second output of double output is "z2"
hasBehavior(x, y) | The behavior belonging to malware is "y"
hasObjectName(x, y) | The name of the object is "y"
hasAttribute(x, y) | The attribute of an object is "y"
hasAttributeValue(x, y) | The object's attribute value is "y"

Rule-1: behavior_B1(B11) ∧ behavior_B1(B12) ∧ hasBehaviorName(B11, bn1) ∧ hasBehaviorName(B12, bn2) ∧ hasOutput(B11, op) ∧ hasInput(B12, ip) ∧ hasObjectname(op, on1) ∧ hasObjectname(ip, on2) ∧ swrlb:equal(on1, on2) ∧ hasAttribute(op, att1) ∧ hasAttribute(ip, att2) ∧ hasValue(att1, attri1) ∧ hasValue(att2, attri2) ∧ swrlb:equal(attri1, attri2) ∧ hasAttributeValue(op, attv1) ∧ hasAttributeValue(ip, attv2) ∧ hasValue(attv1, attriv1) ∧ hasValue(attv2, attriv2) ∧ swrlb:equal(attriv1, attriv2) → hasCompoundwith(bn1, bn2)

Box 1
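The logic of Rule-1 in Box 1 can be paraphrased in Python: two B1 behaviors are in a compound relation when the output object of the first has the same object name, attribute, and attribute value as the input object of the second. The sketch below is not SWRL and does not call Jess; `Fact`, `rule_1`, and `infer` are hypothetical names used only to show the comparison that the three swrlb:equal atoms perform.

```python
from itertools import permutations
from typing import NamedTuple, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)


class Fact(NamedTuple):
    """A B1 behavior fact: one input object and one output object."""
    name: str
    inp: Obj
    out: Obj


def rule_1(b11: Fact, b12: Fact):
    """Return a hasCompoundwith relation if b11's output matches b12's input."""
    same_name = b11.out[0] == b12.inp[0]
    same_attribute = b11.out[1] == b12.inp[1]
    same_value = b11.out[2] == b12.inp[2]
    if same_name and same_attribute and same_value:
        return ("hasCompoundwith", b11.name, b12.name)
    return None


def infer(facts):
    """Apply rule_1 to every ordered pair of distinct facts."""
    derived = []
    for first, second in permutations(facts, 2):
        relation = rule_1(first, second)
        if relation is not None:
            derived.append(relation)
    return derived


monitor = Fact("monitor", ("Broadcast", "is_monitored", "No"),
               ("Broadcast", "is_monitored", "Yes"))
intercept = Fact("intercept", ("Broadcast", "is_monitored", "Yes"),
                 ("Broadcast", "is_intercepted", "Yes"))

print(infer([monitor, intercept]))
# -> [('hasCompoundwith', 'monitor', 'intercept')]
```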

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the knowledge of the intention definition and Corollaries 7 and 8 to construct SWRL rules (the rules are represented in ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The extraction methods of facts are illustrated in Section 4.3. The Jess inference engine completes the reasoning process using the Facts Base and the SWRL rules and finally gives the inference output (the description of malware intention).

4.5. Motivation Example. Take Zitmo [16] as an example to show the facts extracted from a sample, as shown in Table 4. Among these extracted facts, access and transmit are B2 behaviors. B2's double input indicates that the execution of the behavior needs to meet the condition of the two key inputs.

After the rules and facts are imported to the Jess engine, the inference results are exported; they are the formalized descriptions of the intentions of each malicious sample. The results of Zitmo's behavior relations are shown in Figure 4. Other results of intention reasoning are analyzed separately and show reasonable efficiency.

We extract the behavior relations and goal representation from Figure 4, which are shown in Table 5.

Zitmo's intention includes the behavior sequence (consisting of behavior set and behavior relation) and the representation of the goal (output object description). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents the input or output object of a behavior, and an arrow indicates the relationship between object and behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention: broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together they represent the goal of the intention.

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 in a prototype inference system for identifying and describing malware intention in behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built can cover the behavior facts that exist in malware samples and give the coverage rate analysis. Second, to verify the effectiveness of manual (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, the effectiveness and reasoning performance of the ontology inference system are shown. Finally, the correctness, readability, and effectiveness of the results of intention reasoning are evaluated.

Table 3: Partial mappings between behavior and sensitive APIs.

Behavior | Behavior object | Sensitive API
right_gain | Root permission | Runtime.exec()
access | Device ID | getDeviceId()
access | Carrier name | getNetworkOperatorName()
access | Phone position | getCellLocation()
access | Short Message | createFromPdu()
send | Number and contents | sendTextMessage()
intercept | Broadcast information | abortBroadcast()
connect | URL-parameter | URLConnection.connect()
transmit | Parameter | execute()
encrypt | Parameter | setEntity()
store | Parameter | writeRec()

Figure 3: The framework of intention inference system. [Diagram not reproduced. Components shown: Malware, Data source, Facts base, Intention definition, Corollaries 7 and 8, Ontology, SWRL rules, Inference engine, Inference output (intention description).]

Figure 4: Inference results of Zitmo's behavior relations. [Image not reproduced.]

Figure 5: Visualization of Zitmo's inference results. [Diagram not reproduced. Legend: Behavior, Beh_object, Input or Output. Chains shown: Broadcast (is_monitored, No), monitor, Broadcast (is_monitored, Yes), intercept, Broadcast (is_intercepted, Yes); SmsMessage (position, inDevice), access, SmsMessage (position, inMemory), encrypt, SmsMessage (is_plain, No); URL (is_used, No), connect, URL (is_used, Yes), transmit, SmsMessage (position, remoteServer).]

Table 4: Behavior facts extracted from Zitmo.

Behavior | Code segment or function | Behavior object
monitor | onReceive() | Broadcast
intercept | abortBroadcast() | Broadcast'
access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast', SmsMessage
connect | new HttpPost(URL) | URL
encrypt | setEntity() | SmsMessage
transmit | execute() | URL', SmsMessage

Table 5: The inference results of Zitmo.

Behavior relationship | Final goal representation
hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes)
hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory)
hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No)
hasCombinationwith(connect, encrypt) | No
hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes)
hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer)

We have selected 75 typical malware samples (1 real-world ransomware sample, 9 real-world samples from Genome's 1260 samples, and 65 DroidBench samples) for evaluation, which are shown in Table 6. Among the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in the programs are similar. For efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truths because they have open-sourced programs with clear semantics [20]. Thus, we can make an adequate analysis of these samples. In fact, static analysis in general often lacks the capability of extracting runtime behaviors and can be evaded accordingly, although it also has its own advantages over dynamic analysis. Nevertheless, this paper focuses on the research of software intention reduction; that is, the emphasis is to study the extraction and semantic mapping methods of malware's behavior, the extraction of relationships between behaviors, and the representation of malware's final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and the summary of related literature [15, 19–21]. We also refer to Android's official API documentation and, based on the information collected, establish a database of sensitive API behaviors. These APIs are controlled by 30 Android sensitive permissions and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our behavior facts extraction with CopperDroid's [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS- and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods with a twofold goal. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we hope to know whether the behavior description generated by our methods is readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid's public reports [24]. We have collected 150 copies of CopperDroid's public malware sample analysis reports and made a statistical analysis. As shown in Table 6, the third column (Ours (Copper)) indicates the number of behaviors extracted using our methods and the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).

Table 6: Malware samples and their behavior information.

Sample family | Number | Ours (Copper) | Major behavior | Intention classification
Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing
GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing
DroidDream | 1 | 5 (3) | right_gain, access, connect, transmit | Privacy stealing
DroidDeluxe | 1 | 3 (2) | monitor, right_gain, access, connect, transmit | Privacy stealing
HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption
Geinimi | 1 | 6 (3) | remote_control, send, connect, transmit | Tariff consumption
RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback
GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback
DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install_mal, popup | Malware propagation
Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right_gain | Extortion user
Aliasing | 1 | 2 (2) | access, send | Privacy leak
AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak
ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak
Callbacks | 4 | 2 (2) | access, send | Privacy leak
EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak
FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak
GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak
ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak
InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak
Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak
Reflection | 4 | 2 (2) | access, send | Privacy leak
Threading | 2 | 3 (2) | access, logging, send | Privacy leak

Table 7: Behavior semantics between ours and CopperDroid.

Behavior class | Objects | Copper | Ours | Similar (%)
Access personal info | SMS, phone, accounts, location | 4 | 18 | 90
FS access | Open, Write | 2 | 2 | 100
Network access | Generic, HTTP, DNS | 3 | 3 | 100
Send SMS | SMS | 1 | 1 | 100
Execute external file | Generic, Priv Esc, Shell, Inst APK | 6 | 8 | 75
Encrypt info | File info | 0 | 2 | 0
Intercept SMS | SMS | 0 | 2 | 0
Store info | Privacy info | 0 | 1 | 0

Table 8: Intention generation results for samples.

Sample set | False status | Missing desc | Correct | T
Genome | 1 | 1 | 8 | 10
DroidBench | 4 | 2 | 59 | 65
Total | 5 | 3 | 67 | 75
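The totals of 87 and 49 behaviors and the 77.6% figure can be re-derived from the per-family "Ours (Copper)" counts in Table 6. The sketch below simply sums that column; the dictionary is a transcription of the table, and the variable names are chosen here for illustration.

```python
# Per-family behavior counts from Table 6: family -> (ours, CopperDroid).
table6 = {
    "Zitmo": (6, 2), "GoldDream": (9, 4), "DroidDream": (5, 3),
    "DroidDeluxe": (3, 2), "HippoSMS": (4, 2), "Geinimi": (6, 3),
    "RogueSPPush": (6, 2), "GGTracker": (9, 4),
    "DroidKungFu-Update": (4, 2), "Lovebuckleword": (4, 1),
    "Aliasing": (2, 2), "AndroidSpecific": (3, 2), "ArraysAndLists": (2, 2),
    "Callbacks": (2, 2), "EmulatorDetection": (3, 2),
    "FieldAndObjectSensitivity": (3, 2), "GeneralJava": (3, 2),
    "ImplicitFlows": (2, 2), "InterAppCommunication": (2, 2),
    "Lifecycle": (4, 2), "Reflection": (2, 2), "Threading": (3, 2),
}

ours_total = sum(ours for ours, _ in table6.values())
copper_total = sum(copper for _, copper in table6.values())
increase = (ours_total - copper_total) / copper_total * 100

print(ours_total, copper_total, round(increase, 1))
# -> 87 49 77.6
```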

We further study whether the behavior description generated by our methods is readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for these 75 samples. The result illustrated in Table 7 shows that the descriptions derived from our methods are richer than CopperDroid's, while the ratio of behavior similarity ranges from 75% to 100%. CopperDroid failed to extract some of the behaviors we extract, such as encrypt, intercept, and store personal info. Thus, we argue that the behaviors we extract can reflect the common program semantics and programming conventions effectively.

Note that, though it may take manual effort to generate behavior facts descriptions, behavior extraction is a technology that has already been realized. We refer to the work of predecessors on extracting malware behaviors, though automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be utilized in our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and Pellet engine reasoning verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for 75 samples. Preparation of behavior facts (adding ontology instances) dominates the runtime, while the intention results generation is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, while the analysis for a majority (85%) of apps can be completed within 5 seconds. The validation of the reasoning results indicates that 89.3% of the samples' reasoning results are correct, and the graphical display of the intention results shows high readability. Table 9 shows the readability ratings of 75 apps.
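The 89.3% figure follows from the counts in Table 8 (67 correct results out of 75 samples). A short check, with the table transcribed into a dictionary whose key names are chosen here for illustration:

```python
# Counts from Table 8: sample set -> outcome counts.
table8 = {
    "Genome": {"false_status": 1, "missing_desc": 1, "correct": 8},
    "DroidBench": {"false_status": 4, "missing_desc": 2, "correct": 59},
}

correct = sum(row["correct"] for row in table8.values())
total = sum(sum(row.values()) for row in table8.values())
rate = correct / total * 100

print(correct, total, round(rate, 1))
# -> 67 75 89.3
```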

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate the correctness, we produce intention descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automation tools to extract the behavior facts in these samples. The extraction mechanism is to construct the environment dependence graph first and then use the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated in the tools provided by Protégé 3.4.7, and then the instances that are obviously wrong are filtered. Finally, we use the Jess inference engine for reasoning; the behavior facts and the SWRL rules are imported to the inference engine at the same time. The inference results give the output of behavior sequence and final goal representation (see the definition of malware intention). The framework is shown in Figure 3.

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that the basis of our reasoning is that the sensitive API database we built can fully map sensitive behaviors in malware. Although it is not perfect, the sensitive behaviors in the 75 selected samples are fully covered. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, it is not precise enough to handle data flow propagation that cannot be solved by existing static analysis technology (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical problem rather than a deficiency of the theory.

Table 9: Precision results for Appscan and intention reports.

Judgment | Appscan report: Genome | Appscan report: DroidBench | Appscan report: All | Intention description: Genome | Intention description: DroidBench | Intention description: All
(1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3
(2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5
(3) Effective | 4 | 0 | 4 | 8 | 59 | 67
Total | 10 | 65 | 75 | 10 | 65 | 75

In fact, static analysis in general lacks the capability to extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we hope to know whether the generated intentions are readable to an average audience. Second, we expect to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes' Appscan platform [25] is an online security scanning system that focuses on the security scanning of Android apps. After scanning, the system provides a scan report that describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes' Appscan platform and obtained 75 security reports. We also collated the ontology inference results and presented them in formalized and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities reported by 360 microscopes' Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps and use these official descriptions as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants have experience with Android malware analysis and understand basic smartphone events such as “intercept SMS” or “access GPS location.” Each sample report (Appscan and intention) is rated with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes' Appscan platform provides a coarser-grained description, and users can judge the true intents of only 4 samples from the Appscan reports, whereas they can judge the malice of 67 samples from the intention description reports. This indicates that our intention report is readable, even compared with the Appscan report created by 360 microscopes' Appscan platform. The results also reveal that the readability and granularity of the intention description matter, while Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly through the Appscan report. On further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.

Threat Analysis. From the intention description, analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics). This is of great practical significance for users who need to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. To standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in the ontology language, together with the Fact Base, into the Jess engine, and Jess's outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts take two forms: artificial extraction and automatic extraction. We mainly use artificial facts extraction. There are many opportunities for future work, such as the implementation of automatic facts extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW '08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE '15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS '14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php

[25] http://appscan.360.cn


$$B = \bigcup_{i=1}^{n} B_i, \qquad \varphi_i : A_i \to B_i, \quad 1 \le i \le n, \tag{9}$$

when $(\forall A_i)(\forall A_j)(A_i \subseteq A \land A_j \subseteq A \land A_i \neq A_j \to A_i \cap A_j = \varnothing)$.

Lemma 6 (function compound). A two-tuple $\varphi$ consisting of a set of mappings $\Phi$ and its compound relation $\xi_{\circ}$ is a mapping $\varphi : \mathrm{dom}(\varphi_1) \to \mathrm{ran}(\varphi_n)$:

$$
\begin{aligned}
\varphi &= (\Phi, \xi_{\circ}), \\
\Phi &= \{\varphi_i \mid \varphi_i : A_i \to B_i,\ A_i = \mathrm{dom}(\varphi_i),\ B_i = \mathrm{ran}(\varphi_i),\ 1 \le i \le n\}, \\
\xi_{\circ} &= \{\varphi_{i+1} \circ \varphi_i \mid \varphi_i, \varphi_{i+1} \in \Phi,\ 1 \le i \le n - 1\},
\end{aligned}
\tag{10}
$$

when $(\forall i)((1 \le i \le n - 1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$.

According to Lemmas 5 and 6, we prove Corollaries 7 and 8.

Corollary 7 (compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors that are involved in an intention. If the relation between these behaviors is $\mathit{Output}(\varphi_i) = \mathit{Input}(\varphi_{i+1})$, $1 \le i \le n - 1$, then the output $\mathit{Output}(\varphi_n)$ of behavior $\varphi_n$ is the representation of the final goal, and the behavior sequence is denoted as $\xi_i = (\varphi_1, \varphi_2, \ldots, \varphi_n)$.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_i$ and $\varphi_{i+1}$ are such that the former's output corresponds to the latter's input; that is, the range of $\varphi_i$ is equal to the domain of $\varphi_{i+1}$, which satisfies $(\forall i)((1 \le i \le n - 1) \to \mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1}))$. According to Lemma 6, the corollary is proved.
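The chaining condition of Lemma 6 and Corollary 7 can be illustrated with a minimal sketch in which each behavior is modeled as a finite mapping (a Python dict) from input object states to output object states. The state labels such as `sms@device` are hypothetical and are not taken from the paper; the function only checks $\mathrm{ran}(\varphi_i) \subseteq \mathrm{dom}(\varphi_{i+1})$ and then composes the mappings.

```python
from typing import Dict, Hashable, List

Mapping = Dict[Hashable, Hashable]


def compose_chain(behaviors: List[Mapping]) -> Mapping:
    """Compose phi_n o ... o phi_1 after checking ran(phi_i) <= dom(phi_{i+1})."""
    if not behaviors:
        raise ValueError("need at least one behavior")
    for i in range(len(behaviors) - 1):
        ran_i = set(behaviors[i].values())
        dom_next = set(behaviors[i + 1].keys())
        if not ran_i <= dom_next:
            raise ValueError(
                f"ran(phi_{i + 1}) is not a subset of dom(phi_{i + 2})"
            )
    composed = dict(behaviors[0])
    for phi in behaviors[1:]:
        # Feed every current output through the next behavior.
        composed = {x: phi[y] for x, y in composed.items()}
    return composed


# Hypothetical object states, used for illustration only.
access = {"sms@device": "sms@memory"}
encrypt = {"sms@memory": "sms@encrypted"}
transmit = {"sms@encrypted": "sms@remote_server"}

goal = compose_chain([access, encrypt, transmit])
print(goal)  # {'sms@device': 'sms@remote_server'}
```

In this toy chain the value of `goal` plays the role of $\mathit{Output}(\varphi_n)$, the final goal representation; a chain whose ranges and domains do not line up is rejected instead of being silently composed.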

Corollary 8 (combination-compound intention deduction). $\varphi_1, \varphi_2, \ldots, \varphi_n$ represent a set of behaviors that are involved in an intention. If there is an input-to-output relation $\mathit{Output}(\varphi_m) = \mathit{Input}(\varphi_u)$, $1 \le m, u \le n$, and a parallel relationship $\mathit{Output}(\varphi_i) \cup \mathit{Output}(\varphi_j) = \mathit{Input}(\varphi_k)$, $1 \le i, j, k \le n$, at the same time, then the output $\mathit{Output}(\varphi_u) \cup \mathit{Output}(\varphi_k)$ of $\varphi_u$ and $\varphi_k$ is the final goal. Behavior sequences are denoted as $\xi_i = (\Phi, R)$, where $\Phi$ is the behavior set and $R$ is the relationship between these behaviors.

Proof. $\varphi_1, \varphi_2, \ldots, \varphi_n$ are known to be a set of behaviors. By the definition of $\varphi$, a behavior satisfies the mapping relation between objects and can be regarded as a function mapping. $\varphi_m$ and $\varphi_u$ are such that the former's output corresponds to the latter's input; that is, the range of $\varphi_m$ is equal to the domain of $\varphi_u$, so $\mathrm{ran}(\varphi_u)$ is the compound output according to Lemma 6. $\varphi_i$ and $\varphi_j$ both supply their outputs as the input of $\varphi_k$. Because the intersection of the inputs of $\varphi_i$ and $\varphi_j$ is empty ($\mathrm{dom}(\varphi_i) \cap \mathrm{dom}(\varphi_j) = \varnothing$), they satisfy Lemma 5; that is, $\varphi_i$ and $\varphi_j$ satisfy the combination relation and can be combined into a new mapping

$$\varphi' : \mathrm{dom}(\varphi_i) \cup \mathrm{dom}(\varphi_j) \to \mathrm{ran}(\varphi_i) \cup \mathrm{ran}(\varphi_j). \tag{11}$$

Then $\varphi'$ and $\varphi_k$ satisfy the condition of Lemma 6, and $\mathrm{ran}(\varphi_k)$ is the compound output. The corollary is proved according to Lemma 6.
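The combination step used in this proof can be sketched in the same dict-based style. This is a minimal illustration under the assumption that two behaviors with disjoint inputs are merged by taking the union of their mappings; the helper names `combine` and `feeds` and the state labels are hypothetical.

```python
from typing import Dict, Hashable, Set

Mapping = Dict[Hashable, Hashable]


def combine(phi_i: Mapping, phi_j: Mapping) -> Mapping:
    """Lemma 5-style combination: union of two mappings with disjoint domains."""
    if set(phi_i) & set(phi_j):
        raise ValueError("inputs overlap, so the combination is undefined")
    return {**phi_i, **phi_j}


def feeds(phi_prime: Mapping, phi_k_inputs: Set[Hashable]) -> bool:
    """True when the combined outputs are exactly the inputs phi_k expects."""
    return set(phi_prime.values()) == phi_k_inputs


# Hypothetical object states, used for illustration only.
connect = {"url:unused": "url:used"}
encrypt = {"sms:plain": "sms:encrypted"}
transmit_inputs = {"url:used", "sms:encrypted"}

phi_prime = combine(connect, encrypt)
print(feeds(phi_prime, transmit_inputs))  # True
```

Once `feeds` holds, the combined mapping and the double-input behavior can be chained exactly as in the compound case, which is the second half of the argument above.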

4. Ontology Inference System

This section elaborates the construction process of the ontology inference system. Section 4.1 gives the construction specification of the ontology model [10]. Section 4.2 gives the SWRL rules used in the inference engine. Section 4.3 describes the mapping methods from Data Source to Fact Base. Section 4.4 elaborates the framework of the inference system.

4.1. Ontology Model. The reasons for using ontology in our inference system are as follows. (1) A conceptual system based on ontology is computable, and the automatic deduction of intentions can be achieved by computing over concepts. This reduces the amount of code that has to be written. (2) The extensibility of ontology enables us to extend the conceptual model at any time as new features of malware emerge. Once new malware knowledge appears, we only need to modify the ontology model accordingly, which reduces the amount of code update and maintenance. (3) An ontology model of the elements and their relationships in the malware behavior intention domain can standardize domain knowledge and make it shareable [10]. It can provide a standardized representation for malware intention.

The ontology model of malware behavior intention uses the following knowledge: the definition and classification of behavior and behavior object, the intention model, and Corollaries 7 and 8. The concepts and their relations are shown in Figure 1. The definitions of each concept are given in Section 3. We use Pellet engine reasoning to verify the consistency of the ontology.

The definitions of object attributes and data attributes are illustrated in Table 2.

4.2. SWRL Inference Rules. This section uses the knowledge of Corollaries 7 and 8 and Definition 4 in Section 3. Before writing inference rules, we need to define the format of basic facts. A fact is a composite description of a basic behavior. There are two categories of basic facts in our research: B1, a behavior with single input and single output, and B2, a behavior with double input and double output. See Box 1.

Figure 1: Semantic ontology of behavior intention. (Node and edge labels recoverable from the diagram: Malware domain, Android malware, behavior, beh_object, obj_name, attribute, attri_value; right_gain, access, store, encrypt, decode, monitor, intercept, connect, transmit, send, dial, delete, tamper, hide_file, popup_ad, install_malware; privacy, right, event, malware, parameter, config_file; is a, has, hasBehavior, hasInput, hasOutput, hasName, hasAttribute, hasAttrivalue.)

Figure 2: SWRL inference rules of the relationship between behaviors and final goal.

(1) Inference Rules of the Relationship between Behaviors

Premise 1. Let B11 and B12 be any two B1 behaviors. B11(in) represents the input of B11, and B11(out) represents its output. If B11's output object corresponds to B12's input object, then Rule-1 is the inference rule for this condition. See Box 1.

Premise 2. Let B11 and B12 be any two B1 behaviors, and let B2 be a B2 behavior. B11(in) represents the input of B11, and B11(out) represents its output. B2(1in), B2(2in), B2(1out), and B2(2out) represent the inputs and outputs of B2, respectively.

If the union of the outputs of the two B1 behaviors is the input of a certain B2 behavior and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior. There is also a combination relation between B11 and B12. The rule is B1-B2-Rule-1, shown in Figure 2; the rest of the rules are similar.

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2, for example, Goal-Rule-1. The outputs are descriptions of the object's final state. Due to limited space, the other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts include B1 and B2 behaviors, and the elements of a behavior that should be extracted from Data Source are as follows:

(1) Behavior (name): the behavior description in a program is extracted according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of the sensitive APIs and the definitions in the official documentation.

(3) Output (behavior object): after a behavior operates on an object, the object's name and attribute do not change, while its attribute value is affected. Therefore, we can determine the changes in attribute values based on the behavior's definition.

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second includes object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and then generate the call graph (CG) and control flow graph (CFG). The leaf nodes of the call graph are traversed to find sensitive APIs, and the corresponding behavior is identified according to the mapping relation between behaviors and sensitive API calls. Partial mapping relations between behaviors, sensitive API calls, and object descriptions are shown in Table 3.
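The first stage can be sketched as a lookup over call graph leaves. The snippet below is a minimal illustration, not the authors' tool: the API-to-behavior entries are a subset transcribed from Table 3, while the call graph, its method names, and the helper functions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Partial mapping transcribed from Table 3:
# sensitive API -> (behavior, behavior object).
SENSITIVE_API_DB: Dict[str, Tuple[str, str]] = {
    "Runtime.exec()": ("right_gain", "Root permission"),
    "getDeviceId()": ("access", "Device ID"),
    "createFromPdu()": ("access", "Short message"),
    "sendTextMessage()": ("send", "Number and contents"),
    "abortBroadcast()": ("intercept", "Broadcast information"),
    "URLConnection.connect()": ("connect", "URL-parameter"),
}


def leaf_nodes(call_graph: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes that have no outgoing call edges."""
    callees = {callee for targets in call_graph.values() for callee in targets}
    nodes = set(call_graph) | callees
    return {node for node in nodes if not call_graph.get(node)}


def recognize_behaviors(
    call_graph: Dict[str, List[str]]
) -> List[Tuple[str, str, str]]:
    """Map each sensitive leaf API to a (behavior, API, object) triple."""
    found = []
    for api in sorted(leaf_nodes(call_graph)):
        if api in SENSITIVE_API_DB:
            behavior, obj = SENSITIVE_API_DB[api]
            found.append((behavior, api, obj))
    return found


# Hypothetical call graph: caller -> list of callees.
toy_call_graph = {
    "SmsReceiver.onReceive()": [
        "createFromPdu()",
        "abortBroadcast()",
        "Uploader.post()",
    ],
    "Uploader.post()": ["URLConnection.connect()"],
}

behaviors = recognize_behaviors(toy_call_graph)
for triple in behaviors:
    print(triple)
```

A real pipeline would obtain the call graph from a decompiler or static analysis framework and would match fully qualified method signatures rather than the short names used here.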

Second Stage. For each identified behavior, the behavior's object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object based on the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object based on the parameter usage of the API and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

Table 2: Specification of object attributes and data attributes.

| Attribute name | Specification |
|---|---|
| hasCombinationwith | Behaviors have combination relation |
| hasCompoundwith | Behaviors have compound relation |
| hasInput(x, y) | The behavior's single input is “y” |
| hasOutput(x, z) | The behavior's single output is “z” |
| hasFirstInput(x, y1) | The first input of double input is “y1” |
| hasSecondInput(x, y2) | The second input of double input is “y2” |
| hasFirstOutput(x, z1) | The first output of double output is “z1” |
| hasSecondOutput(x, z2) | The second output of double output is “z2” |
| hasBehavior(x, y) | The behavior belonging to malware is “y” |
| hasObjectName(x, y) | The name of the object is “y” |
| hasAttribute(x, y) | The attribute of an object is “y” |
| hasAttributeValue(x, y) | The object's attribute value is “y” |

Box 1

Rule-1: behavior_B1(?B11) ∧ behavior_B1(?B12) ∧ hasBehaviorName(?B11, ?bn1) ∧ hasBehaviorName(?B12, ?bn2) ∧ hasOutput(?B11, ?op) ∧ hasInput(?B12, ?ip) ∧ hasObjectname(?op, ?on1) ∧ hasObjectname(?ip, ?on2) ∧ swrlb:equal(?on1, ?on2) ∧ hasAttribute(?op, ?att1) ∧ hasAttribute(?ip, ?att2) ∧ hasValue(?att1, ?attri1) ∧ hasValue(?att2, ?attri2) ∧ swrlb:equal(?attri1, ?attri2) ∧ hasAttributeValue(?op, ?attv1) ∧ hasAttributeValue(?ip, ?attv2) ∧ hasValue(?attv1, ?attriv1) ∧ hasValue(?attv2, ?attriv2) ∧ swrlb:equal(?attriv1, ?attriv2) → hasCompoundwith(?bn1, ?bn2)

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the knowledge of the intention definition and Corollaries 7 and 8 to construct SWRL rules (the rules are represented in the ontology language). Data Source is extracted from Android applications using reverse engineering technology [15]. The extraction methods of facts are illustrated in Section 4.3. The Jess inference engine completes the reasoning process using the Facts Base and the SWRL rules and finally gives the inference output (the description of malware intention).

4.5. Motivation Example. We take Zitmo [16] as an example to show the facts extracted from a sample, as listed in Table 4. Among these extracted facts, access and transmit are B2 behaviors. A B2 behavior's double input indicates that the execution of the behavior requires both key inputs to be available.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intentions of each malicious sample. The results for Zitmo's behavior relations are shown in Figure 4. Other results of intention reasoning are analyzed separately and show reasonable efficiency.

We extract the behavior relations and the goal representation from Figure 4; they are shown in Table 5.

Zitmo's intention includes the behavior sequence (consisting of the behavior set and the behavior relations) and the representation of the goal (output object description). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents the input or output object of a behavior, and an arrow indicates the relationship between an object and a behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together they represent the goal of the intention.
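To make the reasoning over Zitmo's facts concrete, the following minimal sketch re-implements the two relation rules (compound and combination) in plain Python rather than in SWRL and Jess. The encoding of each object as a (name, attribute, value) triple, the specific input and output states (read off Figure 5), and the treatment of access and transmit as double-input behaviors are assumptions made for illustration; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Obj = Tuple[str, str, str]  # (object name, attribute, attribute value)
Fact = Tuple[Set[Obj], Set[Obj]]  # (inputs, outputs)

# Simplified Zitmo behavior facts; this encoding is an assumption.
FACTS: Dict[str, Fact] = {
    "monitor": ({("Broadcast", "is_monitored", "No")},
                {("Broadcast", "is_monitored", "Yes")}),
    "intercept": ({("Broadcast", "is_monitored", "Yes")},
                  {("Broadcast", "is_intercepted", "Yes")}),
    "access": ({("Broadcast", "is_monitored", "Yes"),
                ("SmsMessage", "position", "inDevice")},
               {("SmsMessage", "position", "inMemory")}),
    "encrypt": ({("SmsMessage", "position", "inMemory")},
                {("SmsMessage", "is_plain", "No")}),
    "connect": ({("URL", "is_used", "No")},
                {("URL", "is_used", "Yes")}),
    "transmit": ({("URL", "is_used", "Yes"),
                  ("SmsMessage", "is_plain", "No")},
                 {("SmsMessage", "position", "remoteServer")}),
}


def compound_pairs(facts: Dict[str, Fact]) -> Set[Tuple[str, str]]:
    """(b1, b2) whenever an output object of b1 equals an input object of b2."""
    return {
        (b1, b2)
        for b1, (_, out1) in facts.items()
        for b2, (in2, _) in facts.items()
        if b1 != b2 and out1 & in2
    }


def combination_pairs(facts: Dict[str, Fact]) -> Set[frozenset]:
    """Pairs with disjoint inputs whose joint outputs feed a third behavior."""
    result = set()
    for b1, b2 in combinations(facts, 2):
        in1, out1 = facts[b1]
        in2, out2 = facts[b2]
        if in1 & in2:
            continue  # inputs must not overlap
        joint = out1 | out2
        if any(joint == facts[b3][0] for b3 in facts if b3 not in (b1, b2)):
            result.add(frozenset((b1, b2)))
    return result


for pair in sorted(compound_pairs(FACTS)):
    print("hasCompoundwith", pair)
for pair in combination_pairs(FACTS):
    print("hasCombinationwith", tuple(sorted(pair)))
```

Under this encoding the sketch derives five compound relations and one combination relation (connect with encrypt), the same relations listed in Table 5. A production system would express the same conditions as SWRL rules and let the inference engine perform the matching.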

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
|---|---|---|
| right_gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |

Figure 3: The framework of intention inference system. (Labels recoverable from the diagram: Malware, Data source, Facts base, Intention definition, Corollaries 7 and 8, Ontology, SWRL rules, Inference engine, Inference output (intention description).)

Figure 4: Inference results of Zitmo's behavior relations.

Figure 5: Visualization of Zitmo's inference results. (Recoverable flow: Broadcast (is_monitored: No) → monitor → Broadcast (is_monitored: Yes) → intercept → Broadcast (is_intercepted: Yes); SmsMessage (position: inDevice) → access → SmsMessage (position: inMemory) → encrypt → SmsMessage (is_plain: No); URL (is_used: No) → connect → URL (is_used: Yes) → transmit → SmsMessage (position: remoteServer). Legend: Beh_object, Input or Output, Behavior.)

Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
|---|---|---|
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast′ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast′, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL′, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
|---|---|
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 as a prototype for identifying and describing malware intention from behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built covers the behavior facts that exist in the malware samples and give a coverage rate analysis. Second, to verify the effectiveness of artificial (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, the effectiveness and reasoning performance of the ontology inference system are shown. Finally, the correctness, readability, and effectiveness of the intention reasoning results are assessed.

We have selected 75 typical malware samples (1 real-world ransomware sample, 9 real-world samples from Genome's 1260 samples, and 65 DroidBench samples) for evaluation, which are shown in Table 6. Among the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in the programs are similar. For efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truth because they are open-source programs with clear semantics [20]; thus, we can make an adequate analysis of them. In fact, static analysis in general often lacks the capability to extract runtime behaviors and can be evaded accordingly, but it has advantages that dynamic techniques cannot match. Nevertheless, this paper focuses on restoring software intention; that is, the emphasis is on the extraction and semantic mapping of malware behaviors, the extraction of relationships between behaviors, and the representation of the malware's final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome's Android apps and a summary of the related literature [15, 19–21]. We also refer to Android's official API documentation, and, based on the information collected, a database of sensitive API behaviors is established. These APIs are controlled by 30 Android sensitive privileges and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our extraction with CopperDroid’s [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in an agnostic approach to identifying interesting OS-specific and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods with a twofold goal. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we want to know whether the behavior descriptions generated by our methods are readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid’s public reports [24]. We collected 150 of CopperDroid’s public malware sample analysis reports and analyzed them statistically. As shown in Table 6, the third column, Ours (Copper), gives the number of behaviors extracted by our methods and, in parentheses, the number extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).

Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right_gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right_gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right_gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |

Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar (%) |
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90 |
| FS access | Open, Write | 2 | 2 | 100 |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100 |
| Send SMS | SMS | 1 | 1 | 100 |
| Execute external file | Generic, Priv Esc, Shell, Inst APK | 6 | 8 | 75 |
| Encrypt info | File info | 0 | 2 | 0 |
| Intercept SMS | SMS | 0 | 2 | 0 |
| Store info | Privacy info | 0 | 1 | 0 |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc | Correct | Total |
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |
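The totals quoted for Table 6 can be re-derived from the per-family counts. The following is a minimal sketch, assuming the row values as read from Table 6; the names `TABLE_6` and `coverage_summary` are illustrative and not part of the original system.

```python
# Rows as read from Table 6:
# (sample family, number of samples, behaviors extracted by ours, by CopperDroid)
TABLE_6 = [
    ("Zitmo", 1, 6, 2), ("GoldDream", 1, 9, 4), ("DroidDream", 1, 5, 3),
    ("DroidDeluxe", 1, 3, 2), ("HippoSMS", 1, 4, 2), ("Geinimi", 1, 6, 3),
    ("RogueSPPush", 1, 6, 2), ("GGTracker", 1, 9, 4),
    ("DroidKungFu-Update", 1, 4, 2), ("Lovebuckleword", 1, 4, 1),
    ("Aliasing", 1, 2, 2), ("AndroidSpecific", 9, 3, 2),
    ("ArraysAndLists", 7, 2, 2), ("Callbacks", 4, 2, 2),
    ("EmulatorDetection", 3, 3, 2), ("FieldAndObjectSensitivity", 3, 3, 2),
    ("GeneralJava", 14, 3, 2), ("ImplicitFlows", 4, 2, 2),
    ("InterAppCommunication", 3, 2, 2), ("Lifecycle", 11, 4, 2),
    ("Reflection", 4, 2, 2), ("Threading", 2, 3, 2),
]


def coverage_summary(rows):
    """Return (ours total, CopperDroid total, relative gain in percent)."""
    ours = sum(row[2] for row in rows)
    copper = sum(row[3] for row in rows)
    gain = round(100.0 * (ours - copper) / copper, 1)
    return ours, copper, gain


def sample_count(rows):
    """Total number of samples across all families."""
    return sum(row[1] for row in rows)


print(coverage_summary(TABLE_6))  # per-family totals and relative gain
print(sample_count(TABLE_6))      # size of the sample set
```

Note that the totals are summed once per family rather than weighted by the number of samples in each family, which is the reading under which the counts add up to 87 and 49.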

We further study whether the behavior descriptions generated by our methods are readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for the 75 samples. The results in Table 7 show that the descriptions derived from our method are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100% for the behavior classes that both tools report. CopperDroid failed to extract some of the behaviors we extract, such as encrypt, intercept, and store personal info. Thus, we argue that the behaviors we extract reflect common program semantics and programming conventions effectively.

Note that, although generating behavior fact descriptions may take manual effort, behavior extraction is a technology that has already been realized. We draw on prior work on extracting malware behaviors, although automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be used in our framework to derive the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples: preparing the behavior facts (adding ontology instances) dominates the runtime, while generating the intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis of a majority (85%) of the apps completes within 5 seconds. Validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphical display of the intention results is highly readable. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate correctness, we produce intention descriptions for the 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automated tools and extract the behavior facts from these samples. The extraction mechanism first constructs the environment dependence graph and then uses the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated with the tools provided by Protégé 3.4.7, and instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts, with the SWRL rules imported into the engine at the same time. The inference results give the behavior sequence and the final goal representation (see the definition of malware intention); the framework is shown in Figure 3.

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3% (67 of 75 samples). We must reiterate that our reasoning rests on the premise that the sensitive API database we built can fully map the sensitive behaviors in malware. Although the database is not perfect, it fully covers the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis; however, this is not precise enough to handle data flow propagation that existing static analysis technology cannot resolve (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical problem only, not a deficiency of the theory.

Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report (Geno) | Appscan report (Droid) | Appscan report (All) | Intention description (Geno) | Intention description (Droid) | Intention description (All) |
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |
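As a quick arithmetic check on the correctness figure, using only the counts reported in Table 8 (67 correct results, 5 false-status results, and 3 missing descriptions out of 75 samples):

```latex
\[
\frac{67}{75} \approx 89.3\%, \qquad
\frac{5}{75} \approx 6.7\%, \qquad
\frac{3}{75} = 4.0\%
\]
```

The three shares sum to 100%, so every sample falls into exactly one of the three outcome columns.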

In fact, static analysis in general lacks the capability to extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that provides a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we want to know whether the generated intentions are readable to an average audience. Second, we want to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on security scans of Android apps. After scanning, the system provides a report that describes the possible malicious intent of the app to users. We uploaded all the samples to the Appscan platform and obtained 75 security reports. We also collated the ontology inference results and presented them in formalized and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities reported by the Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps, which we use as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that the participants have experience with Android malware analysis and understand basic smartphone events such as “intercept SMS” or “access GPS location.” Participants rate each sample report (Appscan and intention) with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. The Appscan platform provides a coarser-grained description, and users can judge the true intent of only 4 samples from the Appscan reports, whereas they can judge the malice of 67 samples using the intention description reports. This indicates that our intention report is readable, even compared with the report created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are important, while Appscan-generated descriptions sometimes confuse users: 66 samples could not be judged correctly from the Appscan report. On further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.

Threat Analysis. Using the intention description, analysts can understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relations, and the changes in object characteristics). This is of great practical significance for users to understand the implementation mechanism of malware, evaluate the possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the process of intention reasoning. To standardize the new conceptual system and automate intent derivation, we create an ontology of malware intention. As the ontology is computable, we import the SWRL rules, represented in the ontology language, and the Facts Base into the Jess engine, and Jess’s outputs are the descriptions of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system achieves the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts take two forms: manual extraction and automatic extraction. We mainly use manual fact extraction. There are many opportunities for future work, such as implementing automatic fact extraction and automatically visualizing the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), the Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and the National Natural Science Foundation of China (nos. 61370065 and 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in Communication, P. R. Cohen, Ed., pp. 15–32, The MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php

[25] http://appscan.360.cn


Figure 1: Semantic ontology of behavior intention. (Labels recoverable from the diagram: Malware domain; Android malware; behavior; beh_object; obj_name, attribute, attri_value; right_gain, access, store, encrypt, monitor, intercept, connect, transmit, send, install_malware, popup_ad, tamper, dial, delete, hide_file, decode; privacy, right, event, malware, parameter, config_file; relations is a, has, hasBehavior, hasInput, hasOutput, hasName, hasAttribute, hasAttrivalue.)

Figure 2: SWRL inference rules of the relationship between behaviors and final goal.

output object corresponding to B12’s object input. Rule-1 is the inference rule corresponding to this condition; see Box 1.

Premise 2. Let B11 and B12 be any two B1 behaviors and let B2 be a B2 behavior. B11(in) denotes the input of B11 and B11(out) its output; B2(1in), B2(2in), B2(1out), and B2(2out) denote the two inputs and two outputs of B2, respectively.

If the union of the outputs of two B1 behaviors is the input of a certain B2 behavior and the intersection of the inputs of these two B1 behaviors is empty, then B11 and B12 each have a compound relation with the B2 behavior, and there is a combination relation between B11 and B12. The corresponding rule is B1-B2-Rule-1, shown in Figure 2; the remaining rules are similar.
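The condition in Premise 2 can be sketched as a set check. This is a minimal illustration, not the SWRL rule B1-B2-Rule-1 itself: the `Behavior` class, the triple encoding of object states, and the function name are assumptions made for the example, and the sample values are taken from the Zitmo case discussed in Section 4.5.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumed encoding: an object state is (object name, attribute, attribute value).
ObjState = Tuple[str, str, str]


@dataclass(frozen=True)
class Behavior:
    name: str
    inputs: FrozenSet[ObjState]
    outputs: FrozenSet[ObjState]


def premise2_relations(b11: Behavior, b12: Behavior, b2: Behavior) -> List[Tuple[str, str, str]]:
    """Apply the Premise 2 condition to two B1 behaviors and one B2 behavior."""
    outputs_feed_b2 = (b11.outputs | b12.outputs) == b2.inputs
    inputs_disjoint = not (b11.inputs & b12.inputs)
    if outputs_feed_b2 and inputs_disjoint:
        return [
            ("hasCompoundwith", b11.name, b2.name),
            ("hasCompoundwith", b12.name, b2.name),
            ("hasCombinationwith", b11.name, b12.name),
        ]
    return []


# Illustrative facts modeled on the Zitmo example.
connect = Behavior("connect",
                   frozenset({("URL", "is_used", "No")}),
                   frozenset({("URL", "is_used", "Yes")}))
encrypt = Behavior("encrypt",
                   frozenset({("SmsMessage", "position", "inMemory")}),
                   frozenset({("SmsMessage", "is_plain", "No")}))
transmit = Behavior("transmit",
                    frozenset({("URL", "is_used", "Yes"), ("SmsMessage", "is_plain", "No")}),
                    frozenset({("SmsMessage", "position", "remoteServer")}))

relations = premise2_relations(connect, encrypt, transmit)
for relation in relations:
    print(relation)
```

Under these assumed facts, the check yields the two compound relations with transmit and the combination relation between connect and encrypt, which is the same set of relations listed for these behaviors in Table 5.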

(2) Inference Rules of the Final Goal. On the basis of the behavior inference rules and Definition 4, the inference rules of the final goal are summarized in Figure 2 (for example, Goal-Rule-1). The outputs are descriptions of the objects’ final states. Owing to limited space, the other rules are not presented.

4.3. Extraction of Behavior Facts. The basic facts include B1 and B2 behaviors. The elements of a behavior that should be extracted from the Data Source are as follows:

(1) Behavior (name): we extract the behavior description in a program according to the mapping between the behaviors defined in our sensitive API database and the sensitive APIs (or code segments) in the program.

(2) Input (behavior object): we determine the object description based on the parameters of the sensitive APIs and the official document definition.

(3) Output (behavior object): after a behavior operates on an object, the object’s name and attribute do not change, while its attribute value is affected. Therefore, we can determine the changes in attribute values based on the behavior’s definition.
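Item (3) can be illustrated with a small sketch in which a behavior’s definition fixes which attribute value it changes. The effect table below is hypothetical: its entries are modeled on the object states shown for Zitmo in Figure 5, and the dictionary layout and function name are assumptions for illustration only.

```python
from typing import Dict

# Hypothetical effect table: behavior -> (attribute it affects, resulting value).
BEHAVIOR_EFFECTS = {
    "monitor": ("is_monitored", "Yes"),
    "intercept": ("is_intercepted", "Yes"),
    "access": ("position", "inMemory"),
    "encrypt": ("is_plain", "No"),
    "connect": ("is_used", "Yes"),
}


def derive_output(behavior: str, obj: Dict) -> Dict:
    """Return the output object: same name and attributes, one value updated."""
    attribute, value = BEHAVIOR_EFFECTS[behavior]
    new_attributes = dict(obj["attributes"])  # copy so the input object is untouched
    new_attributes[attribute] = value
    return {"name": obj["name"], "attributes": new_attributes}


sms_in = {"name": "SmsMessage",
          "attributes": {"position": "inDevice", "is_plain": "Yes"}}
sms_out = derive_output("access", sms_in)
print(sms_out)
```

The object name and the set of attributes are preserved; only the value of the attribute named in the behavior’s definition changes, which is the property item (3) relies on.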

The mapping from Data Source to behavior facts is divided into two stages: the first stage is behavior recognition; the second comprises object identification and relation analysis between objects. The detailed process is as follows.

First Stage. Use a reverse engineering tool to decompile the malware samples and generate their call graph (CG) and control flow graph (CFG). The leaf nodes of the call graph are traversed to find sensitive APIs, and the corresponding behavior is identified according to the mapping relation between behaviors and sensitive API calls. Partial mapping relations between behaviors, sensitive API calls, and object descriptions are shown in Table 3.
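The first stage can be sketched as a lookup over the leaf nodes of a call graph. This is a minimal sketch under stated assumptions: the call graph is given as a plain adjacency dictionary (a real one would come from a decompiler), the mapping reproduces only the partial entries of Table 3, and the caller names in the example are invented for illustration.

```python
from typing import Dict, List, Tuple

# Partial mapping taken from Table 3: sensitive API -> (behavior, behavior object).
SENSITIVE_APIS = {
    "Runtime.exec": ("right_gain", "Root permission"),
    "getDeviceId": ("access", "Device ID"),
    "getNetworkOperatorName": ("access", "Carrier name"),
    "getCellLocation": ("access", "Phone position"),
    "createFromPdu": ("access", "Short Message"),
    "sendTextMessage": ("send", "Number and contents"),
    "abortBroadcast": ("intercept", "Broadcast information"),
    "URLConnection.connect": ("connect", "URL-parameter"),
    "execute": ("transmit", "Parameter"),
    "setEntity": ("encrypt", "Parameter"),
    "writeRec": ("store", "Parameter"),
}


def leaf_nodes(call_graph: Dict[str, List[str]]) -> List[str]:
    """Nodes that call nothing else, in sorted order for determinism."""
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    return sorted(n for n in nodes if not call_graph.get(n))


def recognize_behaviors(call_graph: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Map sensitive leaf APIs to (behavior, behavior object, API) facts."""
    facts = []
    for api in leaf_nodes(call_graph):
        if api in SENSITIVE_APIS:
            behavior, obj = SENSITIVE_APIS[api]
            facts.append((behavior, obj, api))
    return facts


# Hypothetical call graph; the caller names are made up for the example.
example_graph = {
    "SmsReceiver.onReceive": ["createFromPdu", "abortBroadcast", "Uploader.post"],
    "Uploader.post": ["setEntity", "execute", "Log.d"],
}
facts = recognize_behaviors(example_graph)
for fact in facts:
    print(fact)
```

Non-sensitive leaves such as the logging call are ignored, so only the four mapped behaviors are reported for this graph.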

Second Stage. For each identified behavior, the behavior’s object is identified according to the usage of parameters in the sensitive API. There are two categories: the first obtains the behavior object from the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object from the parameter usage and the definition of the API, such as Runtime.exec(RootExploitfile). Data dependence between behaviors is analyzed using FlowDroid [4].

Table 2: Specification of object attributes and data attributes.

| Attribute name | Specification |
| hasCombinationwith | Behaviors have combination relation |
| hasCompoundwith | Behaviors have compound relation |
| hasInput(x, y) | The behavior’s single input is “y” |
| hasOutput(x, z) | The behavior’s single output is “z” |
| hasFirstInput(x, y1) | The first input of double input is “y1” |
| hasSecondInput(x, y2) | The second input of double input is “y2” |
| hasFirstOutput(x, z1) | The first output of double output is “z1” |
| hasSecondOutput(x, z2) | The second output of double output is “z2” |
| hasBehavior(x, y) | The behavior belonging to malware is “y” |
| hasObjectName(x, y) | The name of the object is “y” |
| hasAttribute(x, y) | The attribute of an object is “y” |
| hasAttributeValue(x, y) | The object’s attribute value is “y” |

Rule-1: behavior B1(B11) ∧ behavior B1(B12) ∧ hasBehaviorName(B11, bn1) ∧ hasBehaviorName(B12, bn2) ∧ hasOutput(B11, op) ∧ hasInput(B12, ip) ∧ hasObjectname(op, on1) ∧ hasObjectname(ip, on2) ∧ swrlb:equal(on1, on2) ∧ hasAttribute(op, att1) ∧ hasAttribute(ip, att2) ∧ hasValue(att1, attri1) ∧ hasValue(att2, attri2) ∧ swrlb:equal(attri1, attri2) ∧ hasAttributeValue(op, attv1) ∧ hasAttributeValue(ip, attv2) ∧ hasValue(attv1, attriv1) ∧ hasValue(attv2, attriv2) ∧ swrlb:equal(attriv1, attriv2) → hasCompoundwith(bn1, bn2)

Box 1
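The matching condition expressed by Rule-1 in Box 1 can be mirrored in ordinary code: two B1 behaviors are in a compound relation when the output object of the first equals the input object of the second in name, attribute, and attribute value. The sketch below is an illustration of that condition only, not the SWRL rule or the Jess execution; the class names and the sample facts (modeled on Zitmo’s monitor and intercept behaviors) are assumptions.

```python
from typing import NamedTuple, Optional, Tuple


class ObjState(NamedTuple):
    name: str        # object name, e.g. "Broadcast"
    attribute: str   # attribute, e.g. "is_monitored"
    value: str       # attribute value, e.g. "Yes"


class B1Behavior(NamedTuple):
    name: str
    input: ObjState
    output: ObjState


def rule_1(b11: B1Behavior, b12: B1Behavior) -> Optional[Tuple[str, str, str]]:
    """Return a hasCompoundwith fact if b11's output matches b12's input."""
    op, ip = b11.output, b12.input
    if (op.name == ip.name
            and op.attribute == ip.attribute
            and op.value == ip.value):
        return ("hasCompoundwith", b11.name, b12.name)
    return None


# Illustrative facts modeled on the Zitmo example.
monitor = B1Behavior("monitor",
                     ObjState("Broadcast", "is_monitored", "No"),
                     ObjState("Broadcast", "is_monitored", "Yes"))
intercept = B1Behavior("intercept",
                       ObjState("Broadcast", "is_monitored", "Yes"),
                       ObjState("Broadcast", "is_intercepted", "Yes"))

print(rule_1(monitor, intercept))
print(rule_1(intercept, monitor))
```

The relation is directional: the check succeeds for (monitor, intercept) but not for the reversed pair, because intercept’s output does not match monitor’s input.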

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the definition of intention and Corollaries 7 and 8 to construct the SWRL rules (the rules are represented in the ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The fact extraction methods are described in Section 4.3. The Jess inference engine completes the reasoning process using the Facts Base and the SWRL rules and finally gives the inference output (the description of malware intention).

4.5. Motivation Example. We take Zitmo [16] as an example to show the facts extracted from a sample, as listed in Table 4. Among the extracted facts, access and transmit are B2 behaviors; the double input of a B2 behavior indicates that executing the behavior requires both key inputs to be satisfied.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intention of each malicious sample. The results for Zitmo’s behavior relations are shown in Figure 4. The intention reasoning results for the other samples were analyzed separately and show reasonable efficiency.

From Figure 4, we extract the behavior relations and the goal representation, which are shown in Table 5.

Zitmo’s intention includes a behavior sequence (consisting of the behavior set and the behavior relations) and the representation of the goal (the output object descriptions). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents an input or output object of a behavior, and an arrow indicates the relationship between an object and a behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. Their final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together, they represent the goal of the intention.

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
| right_gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short Message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |

Figure 3: The framework of intention inference system. (Components shown: Malware, Data source, Facts base, Intention definition, Corollaries 7 and 8, Ontology, SWRL rules, Inference engine, Inference output (intention description).)

Figure 4: Inference results of Zitmo’s behavior relations.

Figure 5: Visualization of Zitmo’s inference results. (Node sequences recoverable from the figure: Broadcast [is_monitored: No], monitor, Broadcast [is_monitored: Yes], intercept, Broadcast [is_intercepted: Yes]; SmsMessage [position: inDevice], access, SmsMessage [position: inMemory], encrypt, SmsMessage [is_plain: No]; URL [is_used: No], connect, URL [is_used: Yes], transmit, SmsMessage [position: remoteServer]. Legend: Beh_object, Input or Output, Behavior.)

Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast’ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast’, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL’, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 as a prototype for identifying and describing malware intention from behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built covers the behavior facts that exist in the malware samples and analyze the coverage rate. Second, to verify the effectiveness of manual (semiautomated) behavior facts extraction, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, we report the effectiveness and reasoning performance of the ontology inference system. Finally, we address the correctness, readability, and effectiveness of the intention reasoning results.

We selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from Genome’s 1260 samples, and 65 DroidBench samples), which are shown in Table 6. Of the 10 real-world samples, 9 were collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. The other samples in Genome belong to the same families, so the vulnerabilities and threats in their programs are similar; for efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truth because they are open-source programs with clear semantics [20], so we can analyze them adequately. In fact, static analysis in general often lacks the capability to extract runtime behaviors and can be evaded accordingly, although it has advantages that dynamic techniques cannot match. Nevertheless, this paper focuses on restoring software intention; that is, the emphasis is on the extraction and semantic mapping of malware behaviors, the extraction of relationships between behaviors, and the representation of the malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we discovered 102 significant behaviors involving 115 sensitive APIs through sensitive API mining in Genome’s Android apps and a summary of the related literature [15, 19–21]. We also referred to Android’s official API documentation and, based on the information collected, established a database of sensitive API behaviors. These APIs are controlled by 30 sensitive Android permissions and are described in detail in the official documentation.

To evaluate the effectiveness and validity of behaviorfacts extracting we compare the description of our behaviorfacts extraction and CopperDroidrsquos [19] behavior descriptionCopperDroid is an automatic VMI-based dynamic analysissystem to reconstruct the behaviors of Android malware thenovelty of CopperDroid lies in its agnostic approach to iden-tify interesting OS- and high-level Android-specific behav-iors We perform a user study on CopperDroid platform andour extraction methods the goal is twofold First we give acomparative analysis on the behavior coverage rate of thesesamples Second we hope to know whether the behaviordescription generated by our methods is readable to averageaudience To this end we compare the behavior descriptionof our methods and CopperDroidrsquos Public Reports [24] Wehave collected 150 copies of CopperDroidrsquos public malwaresample analysis report and make a statistical analysis As

10 Journal of Electrical and Computer Engineering

Table6Malwares

amples

andtheirb

ehaviorinformation

Samplefam

ilyNum

ber

Ours(Cop

per)

Major

behavior

Intentioncla

ssificatio

nZitm

o1

6(2)

mon

itorinterceptaccesscon

nectencrypttransmit

Privacyste

aling

GoldD

ream

19(4)

mon

itoraccessstorecon

necttransmit

Privacyste

aling

DroidDream

15(3)

right

gainaccessconn

ecttransm

itPrivacyste

aling

DroidDelu

xe1

3(2)

mon

itorrig

htgainaccessconn

ecttransm

itPrivacyste

aling

Hippo

SMS

14(2)

send

mon

itoraccessintercept

Tariff

consum

ption

Geinimi

16(3)

remotecontrolsend

con

necttransmit

Tariff

consum

ption

RogueSPP

ush

16(2)

send

mon

itoraccessinterceptdelete

Malicious

chargeback

GGTracker

19(4)

accesscon

necttransmit

storeencryptm

onito

rsend

intercept

Malicious

chargeback

DroidKu

ngFu

-Upd

ate

14(2)

conn

ecttransm

itinsta

llmalpop

upMalwarep

ropagatio

nLo

vebu

ckleword

14(1)

popu

ptamperconn

ecttransm

itrig

htgain

Extortionuser

Aliasin

g1

2(2)

accesssend

Privacyleak

And

roidSpecific

93(2)

accesslogging

send

Privacyleak

ArraysA

ndLists

72(2)

accesssend

Privacyleak

Callb

acks

42(2)

accesssend

Privacyleak

EmulatorDetectio

n3

3(2)

accesslogging

send

Privacyleak

FieldA

ndObjectSensitivity

33(2)

accesslogging

send

Privacyleak

GeneralJava

143(2)

accesslogging

send

Privacyleak

ImplicitF

lows

42(2)

accesslogging

Privacyleak

InterA

ppCom

mun

ication

32(2)

accesssend

Privacyleak

Lifecycle

114(2)

accesscon

nectsendlogging

Privacyleak

Reflection

42(2)

accesssend

Privacyleak

Threading

23(2)

accesslogging

send

Privacyleak

Journal of Electrical and Computer Engineering 11

Table 7 Behavior semantics between ours and CopperDroid

Behavior class Objects Copper Ours Similar

Access personal infoSMS phoneaccountslocation

4 18 90

FS access Open Write 2 2 100

Network access Generic HTTPDNS 3 3 100

Send SMS SMS 1 1 100

Execute external file Generic Priv EscShell Inst APK 6 8 75

Encrypt info File info 0 2 0Intercept SMS SMS 0 2 0Store info Privacy info 0 1 0

Table 8 Intention generation results for samples

Sample set False status Missing desc Correct 119879Genome 1 1 8 10DroidBench 4 2 59 65Total 5 3 67 75

shown in Table 6 the third column (oursCopper) indicatesthe number of behaviorsrsquo extraction using our methodsand the number of behaviors extracted by CopperDroidfor the same sample In terms of the number of extractedbehaviors our extractionmechanism (87) is 776more thanCopperDroidrsquos behavior number (49)

We further study whether the behavior descriptions generated by our methods are readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for these 75 samples. The results in Table 7 show that the descriptions derived from our methods are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100% for the behavior classes covered by both tools. CopperDroid failed to extract some of the behaviors we extract, such as encrypting, intercepting, and storing personal information. Thus, we argue that the behaviors we extract can effectively reflect common program semantics and programming conventions.

Note that although generating behavior fact descriptions may take manual effort, behavior extraction is a technology that has already been realized. We build on previous work on extracting malware behaviors, although automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be used within our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples: preparation of behavior facts (adding ontology instances) dominates the runtime, while generation of the intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis for a majority (85%) of apps can be completed within 5 seconds. Validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphically displayed intention results show high readability. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate correctness, we produce intention descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automation tools in extracting the behavior facts from these samples. The extraction mechanism is to construct the environment dependence graph first and then use the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated in the tools provided by Protégé 3.4.7, and the instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts, with the SWRL rules imported into the inference engine at the same time. The inference results give the behavior sequence and the final goal representation (see the definition of malware intention); the framework is shown in Figure 3.

Table 9: Precision results for Appscan and intention reports.

Judgment | Appscan report: Geno | Droid | All | Intention description: Geno | Droid | All
(1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3
(2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5
(3) Effective | 4 | 0 | 4 | 8 | 59 | 67
Total | 10 | 65 | 75 | 10 | 65 | 75

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that our reasoning rests on the assumption that the sensitive API database we built can fully map the sensitive behaviors in malware. Although the database is not perfect, it fully covers the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, this is not precise enough to handle data flow propagation that cannot be resolved by existing static analysis technology (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). These are technical problems rather than shortcomings of the theory.
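As a quick check, the 89.3% rate can be recomputed from the Table 8 counts. The sketch below is only illustrative: the dictionary layout, key names, and function name are assumptions made here, not part of the authors' tooling.

```python
# Counts copied from Table 8; the layout and key names are illustrative assumptions.
TABLE_8 = {
    "Genome": {"false_status": 1, "missing_desc": 1, "correct": 8},
    "DroidBench": {"false_status": 4, "missing_desc": 2, "correct": 59},
}


def correct_rate(rows):
    """Return the percentage of samples whose intention description was judged correct."""
    correct = sum(row["correct"] for row in rows.values())
    total = sum(sum(row.values()) for row in rows.values())
    return 100.0 * correct / total


# Overall rate across both sample sets, and the rate for each set on its own.
overall = correct_rate(TABLE_8)
per_set = {name: correct_rate({name: row}) for name, row in TABLE_8.items()}

print(f"overall: {overall:.1f}%")
for name, rate in per_set.items():
    print(f"{name}: {rate:.1f}%")
```

Splitting the rate by sample set shows how much of the overall figure is driven by the 65 DroidBench samples.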

In fact, static analysis in general lacks the capability to extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we hope to know whether the generated intentions are readable to an average audience. Second, we expect to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on the security scanning of Android apps. After scanning, the system provides a scan report and describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes’ Appscan platform and obtained 75 security reports for these samples. We also collate the ontology inference results and present them in formalized and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities reported by 360 microscopes’ Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps, which we use as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants have experience with Android malware analysis and understand basic smartphone events such as “intercept SMS” or “access GPS location.” Each sample report (Appscan and intention) is given a rating with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes’ Appscan platform provides a coarser-grained description, and users can judge the true intent of only 4 samples from the Appscan reports, whereas they can judge the malice of 67 samples using the intention description reports. This indicates that our intention report is readable, even when compared with the Appscan report created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are relatively important, while Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly through the Appscan report. On further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.

Threat Analysis. Analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics) according to the intention description. This is of great practical significance for users to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. In order to standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in ontology language and the Fact Base into the Jess engine, and Jess’s outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which proves its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts are divided into two forms: manual extraction and automatic extraction. We mainly use manual facts extraction. There are many opportunities for future work, such as the implementation of automatic facts extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), the Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and the National Natural Science Foundation of China (nos. 61370065 and 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention,” in Intentions in communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.



Table 2: Specification of object attributes and data attributes.

Attribute name | Specification
hasCombinationwith | Behaviors have combination relation
hasCompoundwith | Behaviors have compound relation
hasInput(x, y) | The behavior’s single input is “y”
hasOutput(x, z) | The behavior’s single output is “z”
hasFirstInput(x, y1) | The first input of double input is “y1”
hasSecondInput(x, y2) | The second input of double input is “y2”
hasFirstOutput(x, z1) | The first output of double output is “z1”
hasSecondOutput(x, z2) | The second output of double output is “z2”
hasBehavior(x, y) | The behavior belonging to malware is “y”
hasObjectName(x, y) | The name of the object is “y”
hasAttribute(x, y) | The attribute of an object is “y”
hasAttributeValue(x, y) | The object’s attribute value is “y”

Rule-1: behavior B1(B11) ∧ behavior B1(B12) ∧ hasBehaviorName(B11, bn1) ∧ hasBehaviorName(B12, bn2)
∧ hasOutput(B11, op) ∧ hasInput(B12, ip)
∧ hasObjectname(op, on1) ∧ hasObjectname(ip, on2) ∧ swrlb:equal(on1, on2)
∧ hasAttribute(op, att1) ∧ hasAttribute(ip, att2) ∧ hasValue(att1, attri1) ∧ hasValue(att2, attri2) ∧ swrlb:equal(attri1, attri2)
∧ hasAttributeValue(op, attv1) ∧ hasAttributeValue(ip, attv2) ∧ hasValue(attv1, attriv1) ∧ hasValue(attv2, attriv2) ∧ swrlb:equal(attriv1, attriv2)
→ hasCompoundwith(bn1, bn2)

Box 1

sensitive API (two categories: the first obtains the behavior object based on the class object and the definition of the API, such as getDeviceId(); the second obtains the behavior object based on the parameter usage of the API and the definition of the API, such as Runtime.exec(RootExploitfile)). Data dependence between behaviors is analyzed using FlowDroid [4].
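The mapping from sensitive APIs to behavior facts can be pictured as a table lookup. The following is a minimal sketch that uses the partial mappings listed in Table 3; the dictionary name, the function name, and the shape of the returned facts are assumptions made for illustration, not the authors' implementation.

```python
# Partial API-to-behavior mappings taken from Table 3.
# Keys are API names; values are (behavior, behavior object) pairs.
SENSITIVE_API_MAP = {
    "Runtime.exec": ("right gain", "Root permission"),
    "getDeviceId": ("access", "Device ID"),
    "getNetworkOperatorName": ("access", "Carrier name"),
    "getCellLocation": ("access", "Phone position"),
    "createFromPdu": ("access", "Short message"),
    "sendTextMessage": ("send", "Number and contents"),
    "abortBroadcast": ("intercept", "Broadcast information"),
    "URLConnection.connect": ("connect", "URL-parameter"),
    "execute": ("transmit", "Parameter"),
    "setEntity": ("encrypt", "Parameter"),
    "writeRec": ("store", "Parameter"),
}


def extract_behavior_facts(api_calls):
    """Map a sequence of observed API names to behavior facts, skipping non-sensitive calls."""
    facts = []
    for call in api_calls:
        entry = SENSITIVE_API_MAP.get(call)
        if entry is not None:
            behavior, behavior_object = entry
            facts.append({"behavior": behavior, "object": behavior_object, "api": call})
    return facts


# Hypothetical call trace: "Log.d" is not in the table and is therefore ignored.
facts = extract_behavior_facts(["getDeviceId", "Log.d", "sendTextMessage"])
for fact in facts:
    print(fact)
```

A real extractor would also need the data dependence between calls (analyzed with FlowDroid in the paper) to link one behavior's output to another's input; this lookup covers only the naming step.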

4.4. Framework of Inference System. The framework of the intention inference system is shown in Figure 3.

We use the knowledge of the intention definition and Corollaries 7 and 8 to construct SWRL rules (the rules are represented in ontology language). The Data Source is extracted from Android applications using reverse engineering technology [15]. The extraction methods for facts are illustrated in Section 4.3. Finally, the Jess inference engine completes the reasoning process using the Facts Base and the SWRL rules and gives the inference output (the description of malware intention).
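To make the rule-based step more concrete, the condition expressed by Rule-1 (Box 1) can be paraphrased in ordinary code: two behaviors are in a compound relation when the output object of the first and the input object of the second agree on object name, attribute, and attribute value. The sketch below is a plain-Python paraphrase under that reading, not the SWRL/Jess implementation; the class names, field names, and the two sample facts (modeled on the Broadcast states of the Zitmo example) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorObject:
    name: str       # object name, e.g. "Broadcast"
    attribute: str  # attribute name, e.g. "is_monitored"
    value: str      # attribute value, e.g. "Yes"


@dataclass(frozen=True)
class BehaviorFact:
    name: str
    input_obj: BehaviorObject
    output_obj: BehaviorObject


def has_compound_with(first, second):
    """Rule-1 condition: output of `first` matches input of `second` on all three fields."""
    op, ip = first.output_obj, second.input_obj
    return op.name == ip.name and op.attribute == ip.attribute and op.value == ip.value


def derive_compound_relations(facts):
    """Return (first, second) behavior-name pairs for every ordered pair satisfying Rule-1."""
    return [
        (a.name, b.name)
        for a in facts
        for b in facts
        if a is not b and has_compound_with(a, b)
    ]


# Illustrative facts only; the attribute values are assumptions for this sketch.
monitor = BehaviorFact(
    "monitor",
    BehaviorObject("Broadcast", "is_monitored", "No"),
    BehaviorObject("Broadcast", "is_monitored", "Yes"),
)
intercept = BehaviorFact(
    "intercept",
    BehaviorObject("Broadcast", "is_monitored", "Yes"),
    BehaviorObject("Broadcast", "is_intercepted", "Yes"),
)

relations = derive_compound_relations([monitor, intercept])
print(relations)
```

In the paper this matching is delegated to the inference engine, so new rules can be added without touching code; the loop above only shows what a single rule checks.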

4.5. Motivation Example. Take Zitmo [16] as an example to show the facts extracted from these samples, as shown in Table 4. Among these extracted facts, access and transmit are B2 behaviors. A B2 behavior’s double input indicates that executing the behavior requires both key inputs to be available.

After the rules and facts are imported into the Jess engine, the inference results are exported; they are the formalized descriptions of the intentions of each malicious sample. The results for Zitmo’s behavior relations are shown in Figure 4. Other results of intention reasoning are analyzed separately and show reasonable efficiency.

We extract the behavior relations and goal representation from Figure 4, which are shown in Table 5.

Zitmo’s intention includes the behavior sequence (consisting of the behavior set and the behavior relations) and the representation of the goal (the output object description). To demonstrate the reasoning results visually, we display them graphically in Figure 5. In Figure 5, a rectangle represents a behavior, an ellipse represents the input or output object of a behavior, and an arrow indicates the relationship between an object and a behavior. The ellipses with a black font represent the raw inputs of the objects involved in the intention, namely, broadcast information, text messages, and URL addresses. The final states are as follows: the broadcast message is monitored and intercepted, the contents of the messages are transmitted to a remote server, and the URL is used as the destination address of the transmission. Together, they represent the goal of the intention.
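One way to hold such an intention in code is as a record with a behavior set, a list of behavior relations, and a list of goal triples. The sketch below fills that record with the Zitmo values reported in Table 5; the class, field, and function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intention:
    behaviors: list  # behavior set
    relations: list  # (relation, first behavior, second behavior)
    goal: list       # (object, attribute, value) triples describing final object states


# Values transcribed from Table 5 (the inference results of Zitmo).
ZITMO = Intention(
    behaviors=["monitor", "intercept", "access", "connect", "encrypt", "transmit"],
    relations=[
        ("hasCompoundwith", "monitor", "intercept"),
        ("hasCompoundwith", "monitor", "access"),
        ("hasCompoundwith", "access", "encrypt"),
        ("hasCombinationwith", "connect", "encrypt"),
        ("hasCompoundwith", "encrypt", "transmit"),
        ("hasCompoundwith", "connect", "transmit"),
    ],
    goal=[
        ("Broadcast", "is_monitored", "Yes"),
        ("Broadcast", "is_intercepted", "Yes"),
        ("SmsMessage", "position", "inMemory"),
        ("SmsMessage", "is_plain", "No"),
        ("URL", "is_used", "Yes"),
        ("SmsMessage", "position", "remoteServer"),
    ],
)


def relations_are_closed(intention):
    """Check that every relation refers only to behaviors in the behavior set."""
    known = set(intention.behaviors)
    return all(a in known and b in known for _, a, b in intention.relations)


def describe(intention):
    """Render the relations and goal triples as plain text, one item per line."""
    lines = [f"{rel}({a}, {b})" for rel, a, b in intention.relations]
    lines += [f"({obj} {attr} {val})" for obj, attr, val in intention.goal]
    return "\n".join(lines)


print(describe(ZITMO))
```

Keeping the goal as explicit triples mirrors the (object, attribute, value) form used in Table 5.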

Table 3: Partial mappings between behavior and sensitive APIs.

Behavior | Behavior object | Sensitive API
right gain | Root permission | Runtime.exec()
access | Device ID | getDeviceId()
access | Carrier name | getNetworkOperatorName()
access | Phone position | getCellLocation()
access | Short message | createFromPdu()
send | Number and contents | sendTextMessage()
intercept | Broadcast information | abortBroadcast()
connect | URL-parameter | URLConnection.connect()
transmit | Parameter | execute()
encrypt | Parameter | setEntity()
store | Parameter | writeRec()

Figure 3: The framework of intention inference system. (Components: malware, data source, facts base, intention definition, Corollaries 7 and 8, ontology, SWRL rules, inference engine, and inference output (intention description).)

Figure 4: Inference results of Zitmo’s behavior relations.

Figure 5: Visualization of Zitmo’s inference results. (Legend: rectangles are behaviors; ellipses are behavior objects, that is, inputs or outputs. Object states, in reading order: Broadcast (is_monitored: No), monitor, Broadcast (is_monitored: Yes), intercept, Broadcast (is_intercepted: Yes); SmsMessage (position: inDevice), access, SmsMessage (position: inMemory), encrypt, SmsMessage (is_plain: No); URL (is_used: No), connect, URL (is_used: Yes), transmit, SmsMessage (position: remoteServer).)

Table 4: Behavior facts extracted from Zitmo.

Behavior | Code segment or function | Behavior object
monitor | onReceive() | Broadcast
intercept | abortBroadcast() | Broadcast’
access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast’, SmsMessage
connect | new HttpPost(URL) | URL
encrypt | setEntity() | SmsMessage
transmit | execute() | URL’, SmsMessage

Table 5: The inference results of Zitmo.

Behavior relationship | Final goal representation
hasCompoundwith(monitor, intercept) | (Broadcast is_monitored Yes), (Broadcast is_intercepted Yes)
hasCompoundwith(monitor, access) | (SmsMessage position inMemory)
hasCompoundwith(access, encrypt) | (SmsMessage is_plain No)
hasCombinationwith(connect, encrypt) | No
hasCompoundwith(encrypt, transmit) | (URL is_used Yes)
hasCompoundwith(connect, transmit) | (SmsMessage position remoteServer)

5. Evaluation

We have implemented the ontology inference system described in Section 4.4 in a prototype inference system for identifying and describing malware intention in behavior facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built can cover the behavior facts that exist in malware samples and give a coverage rate analysis. Second, to verify the effectiveness of manual (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, the effectiveness and reasoning performance of the ontology inference system are shown. Finally, the correctness, readability, and effectiveness of the intention reasoning results are examined.

We selected 75 typical malware samples (1 real-world ransomware sample, 9 real-world samples from Genome’s 1260 samples, and 65 DroidBench samples) for evaluation, which are shown in Table 6. Among the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in the programs are similar; for efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truth because they are open-source programs with clear semantics [20], so we can analyze them adequately. In fact, static analysis in general often lacks the capability to extract runtime behaviors and can be evaded accordingly, but it also has advantages that dynamic techniques cannot match. Nevertheless, this paper focuses on the research of software intention reduction; that is, the emphasis is on the extraction and semantic mapping of malware behaviors, the extraction of relationships between behaviors, and the representation of the malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome’s Android apps and a summary of the related literature [15, 19–21]. We also refer to Android’s official API documentation and, based on the information collected, establish a database of sensitive API behaviors. These APIs are controlled by 30 Android sensitive privileges and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our behavior facts extraction with CopperDroid’s [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS-specific and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods. The goal is twofold. First, we give a comparative analysis of the behavior coverage rate on these samples. Second, we hope to know whether the behavior descriptions generated by our methods are readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid’s public reports [24]. We collected 150 copies of CopperDroid’s public malware sample analysis reports and performed a statistical analysis. As

Table 6: Malware samples and their behavior information.

Sample family | Number | Ours(Copper) | Major behavior | Intention classification
Zitmo | 1 | 6(2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing
GoldDream | 1 | 9(4) | monitor, access, store, connect, transmit | Privacy stealing
DroidDream | 1 | 5(3) | right gain, access, connect, transmit | Privacy stealing
DroidDeluxe | 1 | 3(2) | monitor, right gain, access, connect, transmit | Privacy stealing
HippoSMS | 1 | 4(2) | send, monitor, access, intercept | Tariff consumption
Geinimi | 1 | 6(3) | remote control, send, connect, transmit | Tariff consumption
RogueSPPush | 1 | 6(2) | send, monitor, access, intercept, delete | Malicious chargeback
GGTracker | 1 | 9(4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback
DroidKungFu-Update | 1 | 4(2) | connect, transmit, install mal, popup | Malware propagation
Lovebuckleword | 1 | 4(1) | popup, tamper, connect, transmit, right gain | Extortion user
Aliasing | 1 | 2(2) | access, send | Privacy leak
AndroidSpecific | 9 | 3(2) | access, logging, send | Privacy leak
ArraysAndLists | 7 | 2(2) | access, send | Privacy leak
Callbacks | 4 | 2(2) | access, send | Privacy leak


Threat Analysis Analysts are able to understand the relation-ships between malware behaviors (including trigger processof behaviors data transfer relation and the change in objectcharacteristics) according to the intention description Thisis of great practical significance for users to understand theimplementation mechanism of malware evaluate its possiblelosses and provide preventive solutions

6 Conclusion

In this paper a novel description and derivation model ofAndroid malware intention based on the theory of intentionand malware reverse engineering is proposed to restore theintention of malware and automate the process of intentionreasoning process In order to standardize the new concep-tual system and automate the intent derivation we createontology of malware intention As ontology is computablewe import SWRL rules represented in ontology languageand Fact Base to Jess engine and Jessrsquos outputs are thedescription of malware intention We evaluate our systemusing 75 typical malware samples (5 kinds) Experimentsshow that our inference system is capable of achieving thedesired results of intention description which proved itsfeasibility and effectiveness In addition by using existingreverse technology and data flow analysis tools we can extractbehavioral facts effectively

Journal of Electrical and Computer Engineering 13

Methods of mapping from Data Source to facts aredivided into two forms artificial extraction and automaticextraction We mainly use the methods of artificial factsextraction There are many opportunities for future workssuch as the implementation of automatic facts extractionmethods and the automatic visualization of the reasoningresults

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

Acknowledgments

This work was partly supported by Beijing Key Laboratory ofInternet Culture and Digital Dissemination Research Project(no ICDDXN001) National Key Technology Research andDevelopment Program of the Ministry of Science and Tech-nology of China (no 2015BAK12B03-03) Special Projectof Central Government to Guide the Development ofLocal Science and Technology (no Z171100004717002)and National Natural Science Foundation of China (nos61370065 61502040)

References

[1] P Guojun L Jingwen and S Runkang ldquoResearch and Progressof Android malware detectionrdquo Journal of Wuhan University(Science Edition) vol 61 no 1 pp 21ndash33 2015

[2] A Kharraz S Arshad C Mulliner et al ldquoA Large-ScaleAutomated Approach to Detectingrdquo in Proceedings of the 25thUSENIX Security Symposium (USENIX Security 16) pp 757ndash772 USENIX Association 2016

[3] M-Y Su and K-T Fung ldquoDetection of android malwareby static analysis on permissions and sensitive functionsrdquo inProceedings of the 8th International Conference on Ubiquitousand Future Networks ICUFN 2016 pp 873ndash875 Austria July2016

[4] S Arzt S Rasthofer C Fritz et al ldquoFlowDroid precise contextflow field object-sensitive and lifecycle-aware taint analysis forAndroid appsrdquoACM Sigplan Notices vol 49 no 6 pp 259ndash2692014

[5] E Bratman Michael ldquoWhat is intentionrdquo in Intentions incommunication P R Cohen Ed pp 15ndash32 theMITPress 1990

[6] J Shirley and D Evans ldquoThe user is not the enemy Fightingmalware by tracking user intentionsrdquo in Proceedings of the NewSecurity ParadigmsWorkshop 2008 NSPW rsquo08 pp 33ndash45 USASeptember 2008

[7] C-T Jung C-H Sun and M Yuan ldquoAn ontology-enabledframework for a geospatial problem-solving environmentrdquoComputers Environment and Urban Systems vol 38 no 1 pp45ndash57 2013

[8] I Horrocks F P Patel-Schneider H Boley et al SWRL ASemantic Web Rule Language Combining OWL and RuleMLW3C Member Submission 2004

[9] J Bak C Jedrzejek andM Falkowski ldquoUsage of the jess enginerules and ontology to query a relational databaserdquo LectureNotes in Computer Science (including subseries Lecture Notes

in Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 5858 pp 216ndash230 2009

[10] X-Y Du M Li and S Wang ldquoSurvey on ontology learningresearchrdquo Ruan Jian Xue BaoJournal of Software vol 17 no 9pp 1837ndash1847 2006

[11] S Cheng S Luo Z Li et al ldquoStatic Detection of DangerousBehaviors in Android Appsrdquo in Proceedings of the the 5thinternational symposium on cyberspace safety and securitySpringer International Publishing 2013

[12] IDA pro httpwwwhex-rayscomproductsida[13] J Jian and X Qing ldquoOn case auto generation using reverse

analysis for Android malwarerdquo Journal of HeFei University ofTechnology (Natural Science Edition) vol 39 no 4 pp 466ndash4702016

[14] W Yang X Xiao B Andow S Li T Xie and W EnckldquoAppContext differentiating malicious and benign mobile appbehaviors using contextrdquo in Proceedings of the 37th IEEEACMInternational Conference on Software Engineering (ICSE rsquo15) vol1 pp 303ndash313 IEEE May 2015

[15] Y Li T Shen X Sun et al ldquoDetection Classification andCharacterization of Android Malware Using API Data De-pendencyrdquo in Proceedings of the International Conference onSecurity and Privacy in Communication Systems vol 164 pp23ndash40 Springer International Publishing 2015

[16] M Zhang Y Duan H Yin and Z Zhao ldquoSemantics-awareAndroid malware classification using weighted contextual APIdependency graphsrdquo in Proceedings of the 21st ACM Conferenceon Computer and Communications Security (CCS rsquo14) pp 1105ndash1116 ACM Scottsdale Ariz USA November 2014

[17] P Wu H Changzhen and Y Shuping ldquoA dynamic intrusiveintention recognition method based on timed automatardquo Jour-nal of Computer Research and Development vol 48 no 7 pp1288ndash1297 2011

[18] J-H Han S-J Lee and J-H Kim ldquoBehavior Hierarchy-BasedAffordance Map for Recognition of Human Intention and ItsApplication to Human-Robot Interactionrdquo IEEE Transactionson Human-Machine Systems vol 46 no 5 pp 708ndash722 2016

[19] K Tam S J Khan A Fattori et al ldquoCopperDroid automaticreconstruction of android malware behaviorsrdquo in Proceedingsof the Network and Distributed System Security Symposium SanDiego Calif USA 2015

[20] M Zhang and H Yin ldquoAutomatic Generation of Security-Centric Descriptions for Android Appsrdquo inAndroid ApplicationSecurity Springer International Publishing 2016

[21] Z Wang C Li Z Yuan Y Guan and Y Xue ldquoDroidChain Anovel Android malware detection method based on behaviorchainsrdquo Pervasive andMobile Computing vol 32 pp 3ndash14 2016

[22] X Guozhi System science and Engineering Research Shanghaiscience and Technology Education Press 2001

[23] Y Zhou and X Jiang ldquoDissecting android malware char-acterization and evolutionrdquo in Proceedings of the 33rd IEEESymposium on Security and Privacy pp 95ndash109 IEEE 2012

[24] httpcopperdroidisgrhulacukcopperdroidindexphp[25] httpappscan360cn



8 Journal of Electrical and Computer Engineering

Table 3: Partial mappings between behavior and sensitive APIs.

| Behavior | Behavior object | Sensitive API |
| right gain | Root permission | Runtime.exec() |
| access | Device ID | getDeviceId() |
| access | Carrier name | getNetworkOperatorName() |
| access | Phone position | getCellLocation() |
| access | Short Message | createFromPdu() |
| send | Number and contents | sendTextMessage() |
| intercept | Broadcast information | abortBroadcast() |
| connect | URL-parameter | URLConnection.connect() |
| transmit | Parameter | execute() |
| encrypt | Parameter | setEntity() |
| store | Parameter | writeRec() |
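The lookup in Table 3 amounts to a dictionary from a sensitive API name to a (behavior, behavior object) pair. The following Python sketch is illustrative only: the names `SENSITIVE_APIS` and `extract_behavior_facts` are hypothetical, the dictionary holds just the partial mappings listed in Table 3, and the environment dependence graph used by the actual extraction is not modeled.

```python
# Hypothetical lookup table transcribed from Table 3:
# sensitive API name -> (behavior, behavior object).
SENSITIVE_APIS = {
    "Runtime.exec": ("right gain", "Root permission"),
    "getDeviceId": ("access", "Device ID"),
    "getNetworkOperatorName": ("access", "Carrier name"),
    "getCellLocation": ("access", "Phone position"),
    "createFromPdu": ("access", "Short Message"),
    "sendTextMessage": ("send", "Number and contents"),
    "abortBroadcast": ("intercept", "Broadcast information"),
    "URLConnection.connect": ("connect", "URL-parameter"),
    "execute": ("transmit", "Parameter"),
    "setEntity": ("encrypt", "Parameter"),
    "writeRec": ("store", "Parameter"),
}


def extract_behavior_facts(api_calls):
    """Map an ordered list of API call names to behavior facts.

    Calls that are not in the sensitive API table are skipped.
    """
    facts = []
    for name in api_calls:
        fact = SENSITIVE_APIS.get(name)
        if fact is not None:
            facts.append(fact)
    return facts


# Assumed call trace for illustration; "onCreate" is not a sensitive API.
calls = ["onCreate", "createFromPdu", "abortBroadcast", "setEntity", "execute"]
for behavior, obj in extract_behavior_facts(calls):
    print(behavior, "->", obj)
```

Keeping the mapping as data rather than as branching code means that new sensitive APIs can be added without changing the extraction routine.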

[Figure 3 image not reproduced. Components labeled in the diagram: malware, data source, facts base, intention definition, Corollaries 7 and 8, SWRL rules, ontology, inference engine, and inference output (intention description).]

Figure 3: The framework of intention inference system.

Figure 4: Inference results of Zitmo’s behavior relations.

[Figure 5 image not reproduced. The diagram shows behavior objects changing state as behaviors are applied: Broadcast (is_monitored: No) is monitored (is_monitored: Yes) and then intercepted (is_intercepted: Yes); SmsMessage (position: inDevice) is accessed (position: inMemory) and then encrypted (is_plain: No); URL (is_used: No) is connected (is_used: Yes) and then used to transmit, leaving SmsMessage at position: remoteServer.]

Figure 5: Visualization of Zitmo’s inference results.


Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast′ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast′, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL′, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
| hasCompoundwith(monitor, intercept) | (Broadcast, is_monitored, Yes), (Broadcast, is_intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is_plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is_used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |
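To make the step from behavior facts to a final goal representation concrete, the sketch below replays Zitmo’s behaviors over a small state table. It is a plain Python stand-in, not the SWRL rules or the Jess engine used in this paper. The `INITIAL_STATE` and `EFFECTS` tables are assumptions read off the object states named in Figure 5 and Table 5, and `derive_final_state` is a hypothetical helper.

```python
# Object states before any behavior runs (as labeled in Figure 5).
INITIAL_STATE = {
    ("Broadcast", "is_monitored"): "No",
    ("SmsMessage", "position"): "inDevice",
    ("URL", "is_used"): "No",
}

# Assumed effect of each behavior: (object, property, new value).
EFFECTS = {
    "monitor": ("Broadcast", "is_monitored", "Yes"),
    "intercept": ("Broadcast", "is_intercepted", "Yes"),
    "access": ("SmsMessage", "position", "inMemory"),
    "encrypt": ("SmsMessage", "is_plain", "No"),
    "connect": ("URL", "is_used", "Yes"),
    "transmit": ("SmsMessage", "position", "remoteServer"),
}


def derive_final_state(behaviors):
    """Apply each behavior's effect in order and return the object states."""
    state = dict(INITIAL_STATE)
    for behavior in behaviors:
        if behavior not in EFFECTS:
            continue  # behaviors without a modeled effect leave the state as is
        obj, prop, value = EFFECTS[behavior]
        state[(obj, prop)] = value
    return state


# Behavior sequence of Zitmo as listed in Table 4.
ZITMO = ["monitor", "intercept", "access", "connect", "encrypt", "transmit"]
for (obj, prop), value in sorted(derive_final_state(ZITMO).items()):
    print(f"({obj}, {prop}, {value})")
```

The final state places the encrypted SmsMessage on the remote server, which is the privacy-stealing goal that Table 5 expresses as a chain of compound relations.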

facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we build can cover the behavior facts that exist in malware samples, and we give a coverage rate analysis. Second, to verify the effectiveness of manual (semiautomated) behavior facts extraction, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, we show the effectiveness and reasoning performance of the ontology inference system. Finally, we examine the correctness, readability, and effectiveness of the intention reasoning results.

We have selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from Genome’s 1260 samples, and 65 DroidBench samples), which are shown in Table 6. Of the 10 real-world samples, 9 are collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. Other samples in Genome belong to the same families, so the vulnerabilities and threats in the programs are similar; for efficiency, we do not add the remaining samples to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truths because they have open-sourced programs with clear semantics [20]. Thus, we can make an adequate analysis of these samples. In fact, static analysis in general often lacks the capability of extracting runtime behaviors and can be evaded accordingly, but it has advantages that dynamic techniques cannot match. Nevertheless, this paper focuses on the research of software intention reduction; that is, the emphasis is to study the extraction and semantic mapping methods of malware’s behavior, the extraction of relationships between behaviors, and the representation of malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome’s Android apps and a summary of related literature [15, 19–21]. We also refer to Android’s official API documentation, and, based on the information collected, a database of sensitive API behavior is established. These APIs are controlled by 30 Android sensitive privileges and have detailed descriptions in the official documents.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the description of our behavior facts extraction with CopperDroid’s [19] behavior description. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; the novelty of CopperDroid lies in its agnostic approach to identifying interesting OS- and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods; the goal is twofold. First, we give a comparative analysis of the behavior coverage rate of these samples. Second, we hope to know whether the behavior description generated by our methods is readable to an average audience. To this end, we compare the behavior description of our methods and CopperDroid’s Public Reports [24]. We have collected 150 copies of CopperDroid’s public malware sample analysis reports and made a statistical analysis. As shown in Table 6, the third column (ours/Copper) indicates the number of behaviors extracted using our methods and the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).

Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |

Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar (%) |
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90 |
| FS access | Open, Write | 2 | 2 | 100 |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100 |
| Send SMS | SMS | 1 | 1 | 100 |
| Execute external file | Generic, Priv Esc, Shell, Inst APK | 6 | 8 | 75 |
| Encrypt info | File info | 0 | 2 | 0 |
| Intercept SMS | SMS | 0 | 2 | 0 |
| Store info | Privacy info | 0 | 1 | 0 |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc. | Correct | Total (T) |
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |
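The totals quoted for Table 6 (87 behaviors for our method against 49 for CopperDroid, a 77.6% increase) can be re-derived from the per-family rows. The short Python check below is only a worked example; the tuple layout and variable names are assumptions, and the sums run over the per-family values of the Ours (Copper) column without weighting them by the number of samples in each family.

```python
# Rows transcribed from Table 6:
# (family, samples, behaviors by our method, behaviors by CopperDroid).
TABLE6 = [
    ("Zitmo", 1, 6, 2), ("GoldDream", 1, 9, 4), ("DroidDream", 1, 5, 3),
    ("DroidDeluxe", 1, 3, 2), ("HippoSMS", 1, 4, 2), ("Geinimi", 1, 6, 3),
    ("RogueSPPush", 1, 6, 2), ("GGTracker", 1, 9, 4),
    ("DroidKungFu-Update", 1, 4, 2), ("Lovebuckleword", 1, 4, 1),
    ("Aliasing", 1, 2, 2), ("AndroidSpecific", 9, 3, 2),
    ("ArraysAndLists", 7, 2, 2), ("Callbacks", 4, 2, 2),
    ("EmulatorDetection", 3, 3, 2), ("FieldAndObjectSensitivity", 3, 3, 2),
    ("GeneralJava", 14, 3, 2), ("ImplicitFlows", 4, 2, 2),
    ("InterAppCommunication", 3, 2, 2), ("Lifecycle", 11, 4, 2),
    ("Reflection", 4, 2, 2), ("Threading", 2, 3, 2),
]

samples = sum(row[1] for row in TABLE6)   # total number of samples
ours = sum(row[2] for row in TABLE6)      # behaviors extracted by our method
copper = sum(row[3] for row in TABLE6)    # behaviors extracted by CopperDroid
increase_pct = 100.0 * (ours - copper) / copper

print(samples, ours, copper, round(increase_pct, 1))
```

The script prints `75 87 49 77.6`, which matches the sample count and the behavior totals reported in this section.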

We further study whether the behavior description generated by our methods is readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for these 75 samples. The result in Table 7 shows that the descriptions derived from our method are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100% for the behavior classes that both approaches detect. CopperDroid failed to extract some of the behaviors we extract, such as encrypt, intercept, and store personal info. Thus, we argue that the behaviors we extract can reflect the common program semantics and programming conventions effectively.

Notice that, though it may take manual effort to generate behavior facts descriptions, behavior extraction is a technology that has already been realized. We build on predecessors’ work on extracting malware behaviors, though automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be utilized in our framework to achieve the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for 75 samples. Preparation of behavior facts (adding ontology instances) dominates the runtime, while the generation of intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis for a majority (85%) of apps can be completed within 5 seconds. The validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphical display of the intention results shows high readability. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate the correctness, we produce intention descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods that simulate automation tools to extract the behavior facts in these samples. The extraction mechanism is to construct the environment dependence graph first and then use the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated in the tools provided by Protégé 3.4.7, and the instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts, with the SWRL rules imported into the inference engine at the same time. The inference results give the behavior sequence and the final goal representation (see the definition of malware intention); the framework is shown in Figure 3.

Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report (Genome) | Appscan report (DroidBench) | Appscan report (All) | Intention description (Genome) | Intention description (DroidBench) | Intention description (All) |
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3%. We must reiterate that our reasoning rests on the assumption that the sensitive API database we built can fully map sensitive behaviors in malware. Although the database is not perfect, it fully covers the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, it is not precise enough to handle data flow propagation that cannot be solved by existing static analysis technology (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). This is a technical limitation rather than a gap in the theory.
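The 89.3% figure follows directly from the counts in Table 8. As a worked example, the Python sketch below recomputes the share of correct intention descriptions overall and per sample set; the dictionary layout and the helper name `correct_rate` are assumptions made for illustration.

```python
# Counts transcribed from Table 8.
TABLE8 = {
    "Genome": {"false_status": 1, "missing_desc": 1, "correct": 8},
    "DroidBench": {"false_status": 4, "missing_desc": 2, "correct": 59},
}


def correct_rate(rows):
    """Return the percentage of samples with a correct intention description."""
    rows = list(rows)
    correct = sum(row["correct"] for row in rows)
    total = sum(sum(row.values()) for row in rows)
    return 100.0 * correct / total


overall = correct_rate(TABLE8.values())
genome = correct_rate([TABLE8["Genome"]])
droidbench = correct_rate([TABLE8["DroidBench"]])

print(round(overall, 1), round(genome, 1), round(droidbench, 1))
```

The per-set rates (8 of 10 for Genome, 59 of 65 for DroidBench) are consistent with the observation that real-world malware is harder to describe correctly than the DroidBench samples.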

In fact, static analysis in general lacks the capability of extracting runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (Intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we hope to know whether the generated intentions are readable to an average audience. Second, we expect to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on the security scanning of Android apps. The system provides a scan report after the scan and describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes’ Appscan platform and got 75 security reports for these samples. We also collate the ontology inference results and present them in formal and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and the number of vulnerabilities provided by 360 microscopes’ Appscan platform is compared with the intention description. Furthermore, for the sake of objective evaluation, we perform the user study based on the official intent descriptions of the 75 apps. We use these official descriptions as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants have experience with Android malware analysis and understand basic smartphone events such as “intercept SMS” or “access GPS location.” Each sample report (Appscan and intention) is rated with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes’ Appscan platform provides a coarser-grained description, and users can judge the true intents of only 4 samples according to the Appscan report, whereas they can judge the malice of 67 samples using the intention description report. This indicates that our intention report is readable even compared to the Appscan report created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are important, while Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly from the Appscan report alone. In a further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.
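One way to summarize the ratings in Table 9 is to compute, for each report type, the mean rating on the 1 to 3 scale and the share of samples rated effective. The mean rating is not reported in this paper; it is derived here purely as an illustration, and the names `RATINGS` and `summarize` are hypothetical.

```python
# Rating counts from the "All" columns of Table 9:
# rating (1 = ineffective, 2 = uncertain, 3 = effective) -> number of samples.
RATINGS = {
    "Appscan report": {1: 5, 2: 66, 3: 4},
    "Intention description": {1: 3, 2: 5, 3: 67},
}


def summarize(counts):
    """Return (total samples, mean rating, percentage rated effective)."""
    total = sum(counts.values())
    mean = sum(score * n for score, n in counts.items()) / total
    effective_share = 100.0 * counts[3] / total
    return total, mean, effective_share


for report, counts in RATINGS.items():
    total, mean, share = summarize(counts)
    print(f"{report}: n={total}, mean={mean:.2f}, effective={share:.1f}%")
```

Under these counts the Appscan reports average just below 2 (uncertain), while the intention descriptions average close to 3 (effective), which restates the gap between 4 and 67 effective reports on a single scale.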

Threat Analysis. According to the intention description, analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics). This is of great practical significance for users who want to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. In order to standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in ontology language and the Fact Base into the Jess engine, and Jess’s outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from Data Source to facts are divided into two forms: manual extraction and automatic extraction. We mainly use manual facts extraction. There are many opportunities for future work, such as the implementation of automatic facts extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th international symposium on cyberspace safety and security, Springer International Publishing, 2013.

[12] IDA pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.


Table 4: Behavior facts extracted from Zitmo.

| Behavior | Code segment or function | Behavior object |
|---|---|---|
| monitor | onReceive() | Broadcast |
| intercept | abortBroadcast() | Broadcast’ |
| access | createFromPdu(), getMessageBody(), getOriginatingAddress() | Broadcast’, SmsMessage |
| connect | new HttpPost(URL) | URL |
| encrypt | setEntity() | SmsMessage |
| transmit | execute() | URL’, SmsMessage |

Table 5: The inference results of Zitmo.

| Behavior relationship | Final goal representation |
|---|---|
| hasCompoundwith(monitor, intercept) | (Broadcast, is monitored, Yes), (Broadcast, is intercepted, Yes) |
| hasCompoundwith(monitor, access) | (SmsMessage, position, inMemory) |
| hasCompoundwith(access, encrypt) | (SmsMessage, is plain, No) |
| hasCombinationwith(connect, encrypt) | No |
| hasCompoundwith(encrypt, transmit) | (URL, is used, Yes) |
| hasCompoundwith(connect, transmit) | (SmsMessage, position, remoteServer) |
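
As a hedged illustration, the facts in Table 4 and the relationships in Table 5 could be held in plain data structures, and an ordering of behaviors consistent with the relationships could be read off with a topological sort. The names `ZITMO_FACTS`, `ZITMO_RELATIONS`, and `behavior_order` are hypothetical, and treating every relationship as "the first behavior precedes the second" is an assumption made only for this sketch; it is not the Jess/SWRL implementation used in this work.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Behavior facts transcribed from Table 4:
# behavior -> (code segments or functions, behavior objects)
ZITMO_FACTS = {
    "monitor": (["onReceive()"], ["Broadcast"]),
    "intercept": (["abortBroadcast()"], ["Broadcast'"]),
    "access": (
        ["createFromPdu()", "getMessageBody()", "getOriginatingAddress()"],
        ["Broadcast'", "SmsMessage"],
    ),
    "connect": (["new HttpPost(URL)"], ["URL"]),
    "encrypt": (["setEntity()"], ["SmsMessage"]),
    "transmit": (["execute()"], ["URL'", "SmsMessage"]),
}

# Behavior relationships transcribed from Table 5: (relation, first, second)
ZITMO_RELATIONS = [
    ("hasCompoundwith", "monitor", "intercept"),
    ("hasCompoundwith", "monitor", "access"),
    ("hasCompoundwith", "access", "encrypt"),
    ("hasCombinationwith", "connect", "encrypt"),
    ("hasCompoundwith", "encrypt", "transmit"),
    ("hasCompoundwith", "connect", "transmit"),
]


def behavior_order(relations):
    """Return one ordering in which each relation's first behavior
    comes before its second behavior (assumed reading of the relations)."""
    sorter = TopologicalSorter()
    for _, first, second in relations:
        sorter.add(second, first)  # `first` is a predecessor of `second`
    return list(sorter.static_order())


ORDER = behavior_order(ZITMO_RELATIONS)
print(" -> ".join(ORDER))
```

Several orderings satisfy the same relationships, so the printed chain is one valid reading rather than a unique behavior sequence.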

facts. In this section, we evaluate the following aspects. First, we verify whether the sensitive API library we built covers the behavior facts that exist in malware samples, and we analyze the coverage rate. Second, to verify the effectiveness of manual (semiautomated) extraction of behavior facts, we compare our manual analysis results with the results generated by the automated tool CopperDroid [19]. Third, we show the effectiveness and reasoning performance of the ontology inference system. Finally, we assess the correctness, readability, and effectiveness of the intention reasoning results.

We selected 75 typical malware samples for evaluation (1 real-world ransomware sample, 9 real-world samples from Genome’s 1260 samples, and 65 DroidBench samples), which are shown in Table 6. Of the 10 real-world samples, 9 were collected from the Android malware database established by Zhou and Jiang [23] at North Carolina State University. We use these real-world samples because they are typical and representative of their families. The other samples in Genome belong to the same families, so the vulnerabilities and threats in those programs are similar; for efficiency, we do not add them to the experimental sample set. DroidBench [4] samples are designed to assess the correctness of static analyses on Android apps. We use these samples as the ground truth because they are open-source programs with clear semantics [20], so we can analyze them adequately. In fact, static analysis in general often lacks the capability of extracting runtime behaviors and can be evaded accordingly, but it has incomparable advantages over dynamic technology. Nevertheless, this paper focuses on restoring software intention; that is, the emphasis is on the extraction and semantic mapping of malware’s behaviors, the extraction of relationships between behaviors, and the representation of malware’s final goal.

5.1. Effectiveness of Behavior Facts Extraction. In general, we have discovered 102 significant behaviors involving 115 sensitive APIs via sensitive API mining in Genome’s Android apps and a summary of the related literature [15, 19–21]. We also refer to Android’s official API documentation and, based on the information collected, establish a database of sensitive API behaviors. These APIs are controlled by 30 sensitive Android permissions and are described in detail in the official documentation.

To evaluate the effectiveness and validity of behavior facts extraction, we compare the descriptions produced by our behavior facts extraction with CopperDroid’s [19] behavior descriptions. CopperDroid is an automatic VMI-based dynamic analysis system that reconstructs the behaviors of Android malware; its novelty lies in its agnostic approach to identifying interesting OS-specific and high-level Android-specific behaviors. We perform a user study on the CopperDroid platform and our extraction methods with a twofold goal. First, we give a comparative analysis of the behavior coverage rate of these samples. Second, we want to know whether the behavior descriptions generated by our methods are readable to an average audience. To this end, we compare the behavior descriptions of our methods with CopperDroid’s public reports [24]. We collected 150 of CopperDroid’s public malware sample analysis reports and analyzed them statistically. As


Table 6: Malware samples and their behavior information.

| Sample family | Number | Ours (Copper) | Major behavior | Intention classification |
|---|---|---|---|---|
| Zitmo | 1 | 6 (2) | monitor, intercept, access, connect, encrypt, transmit | Privacy stealing |
| GoldDream | 1 | 9 (4) | monitor, access, store, connect, transmit | Privacy stealing |
| DroidDream | 1 | 5 (3) | right gain, access, connect, transmit | Privacy stealing |
| DroidDeluxe | 1 | 3 (2) | monitor, right gain, access, connect, transmit | Privacy stealing |
| HippoSMS | 1 | 4 (2) | send, monitor, access, intercept | Tariff consumption |
| Geinimi | 1 | 6 (3) | remote control, send, connect, transmit | Tariff consumption |
| RogueSPPush | 1 | 6 (2) | send, monitor, access, intercept, delete | Malicious chargeback |
| GGTracker | 1 | 9 (4) | access, connect, transmit, store, encrypt, monitor, send, intercept | Malicious chargeback |
| DroidKungFu-Update | 1 | 4 (2) | connect, transmit, install mal, popup | Malware propagation |
| Lovebuckleword | 1 | 4 (1) | popup, tamper, connect, transmit, right gain | Extortion user |
| Aliasing | 1 | 2 (2) | access, send | Privacy leak |
| AndroidSpecific | 9 | 3 (2) | access, logging, send | Privacy leak |
| ArraysAndLists | 7 | 2 (2) | access, send | Privacy leak |
| Callbacks | 4 | 2 (2) | access, send | Privacy leak |
| EmulatorDetection | 3 | 3 (2) | access, logging, send | Privacy leak |
| FieldAndObjectSensitivity | 3 | 3 (2) | access, logging, send | Privacy leak |
| GeneralJava | 14 | 3 (2) | access, logging, send | Privacy leak |
| ImplicitFlows | 4 | 2 (2) | access, logging | Privacy leak |
| InterAppCommunication | 3 | 2 (2) | access, send | Privacy leak |
| Lifecycle | 11 | 4 (2) | access, connect, send, logging | Privacy leak |
| Reflection | 4 | 2 (2) | access, send | Privacy leak |
| Threading | 2 | 3 (2) | access, logging, send | Privacy leak |


Table 7: Behavior semantics between ours and CopperDroid.

| Behavior class | Objects | Copper | Ours | Similar |
|---|---|---|---|---|
| Access personal info | SMS, phone, accounts, location | 4 | 18 | 90% |
| FS access | Open, Write | 2 | 2 | 100% |
| Network access | Generic, HTTP, DNS | 3 | 3 | 100% |
| Send SMS | SMS | 1 | 1 | 100% |
| Execute external file | Generic, Priv. Esc., Shell, Inst. APK | 6 | 8 | 75% |
| Encrypt info | File info | 0 | 2 | 0% |
| Intercept SMS | SMS | 0 | 2 | 0% |
| Store info | Privacy info | 0 | 1 | 0% |

Table 8: Intention generation results for samples.

| Sample set | False status | Missing desc. | Correct | T |
|---|---|---|---|---|
| Genome | 1 | 1 | 8 | 10 |
| DroidBench | 4 | 2 | 59 | 65 |
| Total | 5 | 3 | 67 | 75 |

shown in Table 6, the third column (Ours (Copper)) gives the number of behaviors extracted by our methods and, in parentheses, the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).
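
The totals quoted above can be re-derived from Table 6. The sketch below transcribes the per-family counts and computes the relative increase; the tuple layout and the name `TABLE6` are illustrative choices rather than part of the original study.

```python
# Per-family rows transcribed from Table 6:
# (sample family, number of samples, behaviors by ours, behaviors by CopperDroid)
TABLE6 = [
    ("Zitmo", 1, 6, 2),
    ("GoldDream", 1, 9, 4),
    ("DroidDream", 1, 5, 3),
    ("DroidDeluxe", 1, 3, 2),
    ("HippoSMS", 1, 4, 2),
    ("Geinimi", 1, 6, 3),
    ("RogueSPPush", 1, 6, 2),
    ("GGTracker", 1, 9, 4),
    ("DroidKungFu-Update", 1, 4, 2),
    ("Lovebuckleword", 1, 4, 1),
    ("Aliasing", 1, 2, 2),
    ("AndroidSpecific", 9, 3, 2),
    ("ArraysAndLists", 7, 2, 2),
    ("Callbacks", 4, 2, 2),
    ("EmulatorDetection", 3, 3, 2),
    ("FieldAndObjectSensitivity", 3, 3, 2),
    ("GeneralJava", 14, 3, 2),
    ("ImplicitFlows", 4, 2, 2),
    ("InterAppCommunication", 3, 2, 2),
    ("Lifecycle", 11, 4, 2),
    ("Reflection", 4, 2, 2),
    ("Threading", 2, 3, 2),
]

samples = sum(row[1] for row in TABLE6)   # total number of samples
ours = sum(row[2] for row in TABLE6)      # behaviors extracted by our method
copper = sum(row[3] for row in TABLE6)    # behaviors extracted by CopperDroid
increase_pct = 100 * (ours - copper) / copper

print(f"samples={samples}, ours={ours}, CopperDroid={copper}, "
      f"increase={increase_pct:.1f}%")
# prints: samples=75, ours=87, CopperDroid=49, increase=77.6%
```

Note that the behavior counts are summed once per family row, which is how the figures 87 and 49 arise.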

We further study whether the behavior descriptions generated by our methods are readable to an average audience. Because behavior facts extraction is the basis of this study, we produce 75 behavior description reports for the 75 samples. The results in Table 7 show that the descriptions derived from our method are richer than CopperDroid’s, while the behavior similarity ratio ranges from 75% to 100% for the behavior classes that both approaches cover. CopperDroid failed to extract some of the behaviors we extract, such as encrypting, intercepting, and storing personal information. Thus, we argue that the behaviors we extract reflect common program semantics and programming conventions effectively.

Note that, although generating behavior fact descriptions may take manual effort, behavior extraction is a technology that has already been realized. We draw on previous work on extracting malware behaviors, although automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tool, static or dynamic, can be used in our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples: preparation of behavior facts (adding ontology instances) dominates the runtime, while generation of the intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis of a majority (85%) of the apps completes within 5 seconds. Validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphical display of the intention results shows high readability. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness. To evaluate correctness, we produce intention descriptions for the 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behavior facts extraction algorithm and ontology inference engine. First, we use manual methods to simulate automated tools and extract the behavior facts in these samples. The extraction mechanism constructs the environment dependence graph first and then uses the methods in Section 4.3 to map the sensitive APIs to behavior facts. Second, the extracted behavior facts are instantiated with the tools provided by Protégé 3.4.7, and instances that are obviously wrong are filtered out. Finally, we use the Jess inference engine to reason over the behavior facts; the SWRL rules are imported into the inference engine at the same time. The inference results give the behavior sequence and the final goal representation (see the definition of malware intention). The framework is shown in Figure 3.
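
The last step, rule-based reasoning over behavior facts, can be pictured with a small stand-in. The sketch below is plain Python rather than Jess, and its single rule is hypothetical: it is modelled on the first row of Table 5 only to show how a rule can turn behavior facts into a behavior relationship and goal triples. The actual SWRL rules are not reproduced here, and the names `BehaviorFact`, `base`, `compound_monitor_intercept`, and `run` are invented for illustration.

```python
from typing import NamedTuple


class BehaviorFact(NamedTuple):
    behavior: str  # e.g. "monitor"
    obj: str       # behavior object; a trailing "'" marks a changed state


def base(obj: str) -> str:
    """Drop the state-change mark so Broadcast and Broadcast' compare equal."""
    return obj.rstrip("'")


def compound_monitor_intercept(facts):
    """Toy rule (an assumption, not one of the paper's SWRL rules):
    monitoring and then intercepting the same object yields a compound
    relationship plus two goal triples (object, attribute, value)."""
    derived = []
    for a in facts:
        for b in facts:
            same_object = base(a.obj) == base(b.obj)
            if (a.behavior, b.behavior) == ("monitor", "intercept") and same_object:
                derived.append(("hasCompoundwith", a.behavior, b.behavior))
                derived.append((base(a.obj), "is monitored", "Yes"))
                derived.append((base(a.obj), "is intercepted", "Yes"))
    return derived


def run(facts, rules):
    """Single pass of rule application, collecting results without duplicates."""
    results = []
    for rule in rules:
        for item in rule(facts):
            if item not in results:
                results.append(item)
    return results


ZITMO = [
    BehaviorFact("monitor", "Broadcast"),
    BehaviorFact("intercept", "Broadcast'"),
]
RESULTS = run(ZITMO, [compound_monitor_intercept])
for item in RESULTS:
    print(item)
```

A production rule engine would repeat rule application until no new facts appear; one pass is enough here because the toy rule does not consume derived facts.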

Results. Table 8 presents the experimental results, which show that our inference system achieves a true positive rate of 89.3% (67 of 75 samples). We must reiterate that our reasoning rests on the assumption that the sensitive API database we built can fully map the sensitive behaviors in malware. Although it is not perfect, it can be


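Table 9: Precision results for Appscan and intention reports.

| Judgment | Appscan report: geno | Appscan report: Droid | Appscan report: All | Intention description: geno | Intention description: Droid | Intention description: All |
|---|---|---|---|---|---|---|
| (1) Ineffective | 5 | 0 | 5 | 1 | 2 | 3 |
| (2) Uncertain | 1 | 65 | 66 | 1 | 4 | 5 |
| (3) Effective | 4 | 0 | 4 | 8 | 59 | 67 |
| Total | 10 | 65 | 75 | 10 | 65 | 75 |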

fully covered in the 75 selected samples. The inference system misses intention descriptions for two major reasons. (1) Manual behavior facts extraction lacks accuracy. We rely on our environment dependence graph and extraction algorithm to perform simulation analysis. However, this is not precise enough to handle data flow propagation that existing static analysis technology cannot resolve (such as exception handler code and reflective calls). (2) The program logic in real-world malware is more complex and semantically ambiguous, and there are too many independent sensitive behaviors (in the form of user-defined functions). These are technical problems rather than shortcomings of the theory.

In fact, static analysis in general lacks the capability of extracting runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we want to know whether the generated intentions are readable to an average audience. Second, we want to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on the security scanning of Android apps. After a scan, the system provides a report that describes the possible malicious intent of the app to users. We uploaded all the samples to 360 microscopes’ Appscan platform and obtained 75 security reports. We also collated the ontology inference results and presented them in formal and graphical form, producing intention reports for the 75 samples. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and we compare the number of vulnerabilities reported by 360 microscopes’ Appscan platform with the intention descriptions. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps, which we use as a reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants have experience with Android malware analysis and understand basic smartphone events such as “intercept SMS” or “access GPS location.” Each sample report (Appscan and intention) is rated with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. 360 microscopes’ Appscan platform provides a coarser-grained description, and users could judge the true intent of only 4 samples from the Appscan reports, whereas they could judge the malice of 67 samples using the intention description reports. This indicates that our intention reports are readable, even compared with the reports created by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description are important, while Appscan-generated descriptions sometimes confuse users: 66 samples could not be judged correctly from the Appscan report. On further investigation, we noticed that the Appscan report provides a coarse description of the sample’s intent, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.
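
To make the comparison concrete, the "All" columns of Table 9 can be turned into percentages. The short sketch below does this; the dictionary layout and the helper name `share` are hypothetical and serve only to restate the table.

```python
# "All" columns transcribed from Table 9: judgment counts over the 75 samples
TABLE9 = {
    "Appscan report": {"ineffective": 5, "uncertain": 66, "effective": 4},
    "Intention description": {"ineffective": 3, "uncertain": 5, "effective": 67},
}


def share(report: str, judgment: str) -> float:
    """Percentage of samples of one report type that received a judgment."""
    counts = TABLE9[report]
    return 100 * counts[judgment] / sum(counts.values())


for report in TABLE9:
    print(f"{report}: {share(report, 'effective'):.1f}% effective, "
          f"{share(report, 'uncertain'):.1f}% uncertain")
```

Under this reading, roughly 5% of the Appscan reports and roughly 89% of the intention reports were rated effective.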

Threat Analysis. From the intention description, analysts can understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relations, and the changes in object characteristics). This is of great practical significance for helping users understand the implementation mechanism of malware, evaluate its possible losses, and devise preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. To standardize the new conceptual system and automate intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in the ontology language, together with the fact base, into the Jess engine, and the outputs of Jess are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.


Methods of mapping from data source to facts take two forms: manual extraction and automatic extraction. We mainly use manual facts extraction. There are many opportunities for future work, such as implementing automatic facts extraction methods and automatically visualizing the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), the Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and the National Natural Science Foundation of China (nos. 61370065 and 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in Communication, P. R. Cohen, Ed., pp. 15–32, the MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, P. F. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the Jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System Science and Engineering Research, Shanghai Science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php

[25] http://appscan.360.cn


Page 10: Behavior Intention Derivation of Android Malware Using ...downloads.hindawi.com/journals/jece/2018/9250297.pdf · ResearchArticle Behavior Intention Derivation of Android Malware

10 Journal of Electrical and Computer Engineering

Table6Malwares

amples

andtheirb

ehaviorinformation

Samplefam

ilyNum

ber

Ours(Cop

per)

Major

behavior

Intentioncla

ssificatio

nZitm

o1

6(2)

mon

itorinterceptaccesscon

nectencrypttransmit

Privacyste

aling

GoldD

ream

19(4)

mon

itoraccessstorecon

necttransmit

Privacyste

aling

DroidDream

15(3)

right

gainaccessconn

ecttransm

itPrivacyste

aling

DroidDelu

xe1

3(2)

mon

itorrig

htgainaccessconn

ecttransm

itPrivacyste

aling

Hippo

SMS

14(2)

send

mon

itoraccessintercept

Tariff

consum

ption

Geinimi

16(3)

remotecontrolsend

con

necttransmit

Tariff

consum

ption

RogueSPP

ush

16(2)

send

mon

itoraccessinterceptdelete

Malicious

chargeback

GGTracker

19(4)

accesscon

necttransmit

storeencryptm

onito

rsend

intercept

Malicious

chargeback

DroidKu

ngFu

-Upd

ate

14(2)

conn

ecttransm

itinsta

llmalpop

upMalwarep

ropagatio

nLo

vebu

ckleword

14(1)

popu

ptamperconn

ecttransm

itrig

htgain

Extortionuser

Aliasin

g1

2(2)

accesssend

Privacyleak

And

roidSpecific

93(2)

accesslogging

send

Privacyleak

ArraysA

ndLists

72(2)

accesssend

Privacyleak

Callb

acks

42(2)

accesssend

Privacyleak

EmulatorDetectio

n3

3(2)

accesslogging

send

Privacyleak

FieldA

ndObjectSensitivity

33(2)

accesslogging

send

Privacyleak

GeneralJava

143(2)

accesslogging

send

Privacyleak

ImplicitF

lows

42(2)

accesslogging

Privacyleak

InterA

ppCom

mun

ication

32(2)

accesssend

Privacyleak

Lifecycle

114(2)

accesscon

nectsendlogging

Privacyleak

Reflection

42(2)

accesssend

Privacyleak

Threading

23(2)

accesslogging

send

Privacyleak

Journal of Electrical and Computer Engineering 11

Table 7 Behavior semantics between ours and CopperDroid

Behavior class Objects Copper Ours Similar

Access personal infoSMS phoneaccountslocation

4 18 90

FS access Open Write 2 2 100

Network access Generic HTTPDNS 3 3 100

Send SMS SMS 1 1 100

Execute external file Generic Priv EscShell Inst APK 6 8 75

Encrypt info File info 0 2 0Intercept SMS SMS 0 2 0Store info Privacy info 0 1 0

Table 8 Intention generation results for samples

Sample set False status Missing desc Correct 119879Genome 1 1 8 10DroidBench 4 2 59 65Total 5 3 67 75

shown in Table 6 the third column (oursCopper) indicatesthe number of behaviorsrsquo extraction using our methodsand the number of behaviors extracted by CopperDroidfor the same sample In terms of the number of extractedbehaviors our extractionmechanism (87) is 776more thanCopperDroidrsquos behavior number (49)

We further study whether the behavior description gen-erated by our methods is readable to average audience Asbehavior facts extraction is the basis of this studywe thus pro-duce 75 behavior descriptions reports for these 75 samplesThe result illustrated in Table 7 depicts that the descriptionsderived from ours are richer than CopperDroid while theratio of behavior similarity ranges from75 to 100Copper-Droid failed to extract some of the behaviors we extract suchas encrypt intercept and store personal info Thus we arguethat the behaviors we extract can reflect the common pro-gram semantics and programming conventions effectively

Notice that though it may take artificial effort to generatebehavior facts description behavior extraction is a technol-ogy that has already been realized We refer to the work ofpredecessors extractingmalware behaviors though automaticextraction technology is not somatureThe focus of this workis to propose a framework for the description and deductionof behavioral intentions We argue that any analysis toolsboth static and dynamic can be utilized in our frameworkto achieve the described malware intention

52 Validity of Ontology Inference System Next we evaluatethe validity of ontology inference system Ontology inferenceexperiment reveals that the workload of manual writing codereduced significantly and Pellet engine reasoning verifiedthat the ontology is complete and consistent We evaluatethe runtime performance and ontology validity for 75 sam-ples preparation for behavior facts (add ontology instance)

dominates the runtime while the intention results generationis usually fairly fast (under 200 milliseconds) The averageinference runtime is 2 seconds while the analysis for amajority (85) of apps can be completed within 5 secondsThe validation of the reasoning results indicates that 893 ofthe samplersquos reasoning results are correct and the intentionresults of graphical display show high readability Table 9shows the readability ratings of 75 apps

53 Correctness Readability and Effectiveness

Correctness To evaluate the correctness we produce inten-tion descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behav-ior facts extraction algorithm and ontology inference engineFirst we use artificial methods to simulate automation toolsto extract the behavior facts in these samples The extractionmechanism is to construct the environment dependencegraph first and then use themethods in Section 43 tomap thesensitive APIs to behavior facts Second the extracted behav-ior facts are instantiated in the tools provided by protege347 and then the instances that are obviously wrong arefiltered Finally we use Jess inference engine for reasoning thebehavior facts and the SWRL rules are imported to inferenceengine at the same timeThe inference results give the outputof behavior sequence and final goal representation (see thedefinition of malware intention) finally the framework isshown in Figure 3

Results Table 8 presents the experimental results which showthat our inference system achieves a true positive rate of893 We must reiterate that the basis of our reasoning isthe sensitive API database we built can fully map sensitivebehaviors in malware Although it is not perfect it can be

12 Journal of Electrical and Computer Engineering

Table 9 Precision results for Appscan and intention reports

Judgment Appscan report Intention descriptiongeno Driod All geno Droid All

(1) Ineffective 5 0 5 1 2 3(2) Uncertain 1 65 66 1 4 5(3) Effective 4 0 4 8 59 67Total 10 65 75 10 65 75

fully covered in the 75 selected samples Inference systemmisses intention descriptions due to two major reasons(1) Artificial behavior facts extraction lacks accuracy Werely on our environment dependence graph and extractionalgorithm to perform simulation analysis However it is notprecise enough to handle the data flow propagation thatcannot be solved by existing static analysis technology (suchas exception handler code reflective calls) (2)The programlogic in real-world malware is more complex and withsemantic ambiguity and there are too many independentsensitive behaviors (in the form of user defined functions)This is not the result of the lack of theory and only technicalproblem

In fact, static analysis in general cannot extract runtime behaviors and can be evaded accordingly. Nevertheless, our focus is to construct a framework that produces a description and derivation model of malware intention.

Readability and Effectiveness. To evaluate the readability and effectiveness of the inference output (intention), we perform a user study on our malware intention inference results. The goal is twofold. First, we want to know whether the generated intentions are readable to an average audience. Second, we want to see whether our intention descriptions can actually help users avoid risky apps.

Methodology. 360 microscopes’ Appscan platform [25] is an online security scanning system that focuses on scanning Android apps. After a scan, the system provides a report that describes the possible malicious intent of the app to users. We uploaded all the samples to the Appscan platform and obtained 75 security reports. We also collated the ontology inference results and presented them in formal and graphical form, producing 75 intention reports. We use both the 75 Appscan reports and the 75 intention reports to measure user reaction, and we compare the number of vulnerabilities reported by the Appscan platform with the intention descriptions. Furthermore, for the sake of objective evaluation, we base the user study on the official intent descriptions of the 75 apps, which serve as the reference standard.

We recruited participants directly from our laboratory group and required that they be smartphone users. We also made sure that participants had experience with Android malware analysis and understood basic smartphone events such as “intercept SMS” or “access GPS location.” Each sample report (Appscan and intention) is rated with respect to its effectiveness. The rating ranges from 1 to 3, where 1 means ineffective, 2 means uncertain, and 3 means effective, as shown in Table 9.

Results and Implications. The results are shown in Table 9. The Appscan platform provides a coarser-grained description, and users can judge the true intent of only 4 samples from the Appscan reports, whereas they can judge the malice of 67 samples using the intention description reports. This indicates that our intention reports are readable, even compared with the reports generated by 360 microscopes’ Appscan platform. The results also reveal that the readability and granularity of the intention description matter, while the Appscan-generated descriptions sometimes confuse users: 66 samples cannot be judged correctly from the Appscan report. On further investigation, we notice that the Appscan report provides a coarse intent description of the sample, while our security-centered intention report provides a fairly fine-grained display of intention. We believe that this can be further utilized during threat analysis.
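To make the comparison concrete, the "All" columns of Table 9 can be expressed as shares of the 75 samples per rating. The following sketch only restates the table; the dictionary layout and function name are illustrative choices.

```python
# Rating counts over all 75 samples, copied from the "All" columns of Table 9.
RATINGS = {
    "Appscan report": {"ineffective": 5, "uncertain": 66, "effective": 4},
    "Intention description": {"ineffective": 3, "uncertain": 5, "effective": 67},
}


def rating_shares(counts):
    """Convert raw rating counts into fractions of all rated samples."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    for report, counts in RATINGS.items():
        shares = rating_shares(counts)
        print(report, {label: f"{s:.1%}" for label, s in shares.items()})
```

Under these counts, about 5.3% of Appscan reports are rated effective against about 89.3% of intention descriptions, while 88% of Appscan reports fall into the uncertain category.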

Threat Analysis. Analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics) from the intention description. This is of great practical significance for helping users understand the implementation mechanism of malware, evaluate the possible losses, and devise preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. To standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in the ontology language, together with the Fact Base, into the Jess engine, and Jess’s outputs are the descriptions of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.

Journal of Electrical and Computer Engineering 13

Methods of mapping from Data Source to facts take two forms: manual extraction and automatic extraction. We mainly use manual facts extraction. There are many opportunities for future work, such as the implementation of automatic facts extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale, Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM Sigplan Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] E. Bratman Michael, “What is intention?” in Intentions in Communication, P. R. Cohen, Ed., pp. 15–32, The MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, F. P. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.



Table 7: Behavior semantics between ours and CopperDroid.

Behavior class         Objects                                Copper  Ours  Similar (%)
Access personal info   SMS, phone, accounts, location         4       18    90
FS access              Open, Write                            2       2     100
Network access         Generic, HTTP, DNS                     3       3     100
Send SMS               SMS                                    1       1     100
Execute external file  Generic, Priv. Esc., Shell, Inst. APK  6       8     75
Encrypt info           File info                              0       2     0
Intercept SMS          SMS                                    0       2     0
Store info             Privacy info                           0       1     0

Table 8: Intention generation results for samples.

Sample set   False status  Missing desc.  Correct  T
Genome       1             1              8        10
DroidBench   4             2              59       65
Total        5             3              67       75

shown in Table 6, the third column (ours/Copper) indicates the number of behaviors extracted using our methods and the number of behaviors extracted by CopperDroid for the same sample. In terms of the number of extracted behaviors, our extraction mechanism (87) yields 77.6% more than CopperDroid (49).
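The 77.6% figure is a relative increase over CopperDroid's count, which a short calculation confirms. The function name below is illustrative; only the two totals (87 and 49) come from the text.

```python
def relative_increase(ours, baseline):
    """Relative gain of `ours` over `baseline`, as a fraction of the baseline."""
    return (ours - baseline) / baseline


if __name__ == "__main__":
    # 87 behaviors extracted by our mechanism versus 49 by CopperDroid.
    print(f"{relative_increase(87, 49):.1%}")  # 77.6%
```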

We further study whether the behavior description generated by our methods is readable to an average audience. As behavior facts extraction is the basis of this study, we produce 75 behavior description reports for these 75 samples. The results in Table 7 show that the descriptions derived by our method are richer than CopperDroid’s, while the ratio of behavior similarity ranges from 75% to 100%. CopperDroid failed to extract some of the behaviors we extract, such as encrypting, intercepting, and storing personal info. Thus, we argue that the behaviors we extract effectively reflect common program semantics and programming conventions.

Notice that, although generating behavior facts descriptions may take manual effort, behavior extraction is a technology that has already been realized. We follow previous work on extracting malware behaviors, even though automatic extraction technology is not yet mature. The focus of this work is to propose a framework for the description and deduction of behavioral intentions. We argue that any analysis tools, both static and dynamic, can be utilized in our framework to obtain the described malware intention.

5.2. Validity of Ontology Inference System. Next, we evaluate the validity of the ontology inference system. The ontology inference experiment reveals that the workload of manually writing code is reduced significantly, and reasoning with the Pellet engine verified that the ontology is complete and consistent. We evaluate the runtime performance and ontology validity for the 75 samples. Preparation of behavior facts (adding ontology instances) dominates the runtime, while the generation of intention results is usually fairly fast (under 200 milliseconds). The average inference runtime is 2 seconds, and the analysis of a majority (85%) of apps completes within 5 seconds. The validation of the reasoning results indicates that 89.3% of the samples’ reasoning results are correct, and the graphical display of the intention results is highly readable. Table 9 shows the readability ratings of the 75 apps.

5.3. Correctness, Readability, and Effectiveness

Correctness To evaluate the correctness we produce inten-tion descriptions for 75 malware sample apps (10 typical real-world samples and 65 DroidBench samples) using our behav-ior facts extraction algorithm and ontology inference engineFirst we use artificial methods to simulate automation toolsto extract the behavior facts in these samples The extractionmechanism is to construct the environment dependencegraph first and then use themethods in Section 43 tomap thesensitive APIs to behavior facts Second the extracted behav-ior facts are instantiated in the tools provided by protege347 and then the instances that are obviously wrong arefiltered Finally we use Jess inference engine for reasoning thebehavior facts and the SWRL rules are imported to inferenceengine at the same timeThe inference results give the outputof behavior sequence and final goal representation (see thedefinition of malware intention) finally the framework isshown in Figure 3

Results Table 8 presents the experimental results which showthat our inference system achieves a true positive rate of893 We must reiterate that the basis of our reasoning isthe sensitive API database we built can fully map sensitivebehaviors in malware Although it is not perfect it can be

12 Journal of Electrical and Computer Engineering

Table 9 Precision results for Appscan and intention reports

Judgment Appscan report Intention descriptiongeno Driod All geno Droid All

(1) Ineffective 5 0 5 1 2 3(2) Uncertain 1 65 66 1 4 5(3) Effective 4 0 4 8 59 67Total 10 65 75 10 65 75

fully covered in the 75 selected samples Inference systemmisses intention descriptions due to two major reasons(1) Artificial behavior facts extraction lacks accuracy Werely on our environment dependence graph and extractionalgorithm to perform simulation analysis However it is notprecise enough to handle the data flow propagation thatcannot be solved by existing static analysis technology (suchas exception handler code reflective calls) (2)The programlogic in real-world malware is more complex and withsemantic ambiguity and there are too many independentsensitive behaviors (in the form of user defined functions)This is not the result of the lack of theory and only technicalproblem

In fact static analysis in general lacks the capability ofextracting runtime behaviors and can be evaded accordinglyNevertheless we argue that our focus is to construct aframeworkwhich produces description and derivationmodelof malware intention

Readability and Effectiveness To evaluate the readability andeffectiveness of inference output (Intention) we perform auser study on our malware intention inference results Thegoal is twofold First we hope to knowwhether the generatedintentions are readable to average audience Second weexpect to see whether our intention descriptions can actuallyhelp users avoid risky apps

Methodology 360 microscopesrsquo Appscan platform [25] is anonline security scanning systemwhich focuses on the securityscan of Android apps The system will provide a scan reportafter the scanning and describe possible malicious intentof the app to users We uploaded all the samples to 360microscopesrsquo Appscan platform and got 75 security reportsof these samples We also collate ontology inference resultsand show them in the form of formalization and graphics75 samplesrsquo intention reports are produced We use both75 Appscan reports and 75 intention reports to measureuser reaction and the number of vulnerabilities providedby 360 microscopesrsquo Appscan platform is compared withthe intention description Furthermore due to the objectiveevaluation consideration we perform the user study basedon the official intents descriptions of 75 apps We use theseofficial descriptions as a reference standard

We have recruited participant directly from our lab-oratory group and we require participant that must besmartphone users We also make sure that participants haveexperiences with Android malware analysis and understandbasic smartphone events such as ldquointercept SMSrdquo or ldquoaccess

GPS locationrdquo We provide a rating for each sample report(Appscan and intention) with respect to its effectiveness Therating ranges from 1 to 3 where 1 means ineffective 2 meansuncertain and 3 means effective as shown in Table 9

Results and Implications Eventually the results are shownin Table 9 360 microscopesrsquo Appscan platform provides acoarser grained description and users can only judge the trueintents of 4 samples according to the Appscan report Whileusers can judge the malice of 67 samples using intentiondescription report This indicates our intention report isreadable even compared to Appscan report created by 360microscopesrsquo Appscan platform The results also reveal thatthe readability and granularity of intention description arerelatively important while Appscan-generated ones some-times confuse users 66 samples cannot be judged correctlythrough Appscan report In a further investigation we noticethat Appscan report provides a coarse intent descriptionof the sample while our security centered intention reportprovides a fairly fine-grained display of intention We believethat this can be further utilized during threat analysis

Threat Analysis Analysts are able to understand the relation-ships between malware behaviors (including trigger processof behaviors data transfer relation and the change in objectcharacteristics) according to the intention description Thisis of great practical significance for users to understand theimplementation mechanism of malware evaluate its possiblelosses and provide preventive solutions

6 Conclusion

In this paper a novel description and derivation model ofAndroid malware intention based on the theory of intentionand malware reverse engineering is proposed to restore theintention of malware and automate the process of intentionreasoning process In order to standardize the new concep-tual system and automate the intent derivation we createontology of malware intention As ontology is computablewe import SWRL rules represented in ontology languageand Fact Base to Jess engine and Jessrsquos outputs are thedescription of malware intention We evaluate our systemusing 75 typical malware samples (5 kinds) Experimentsshow that our inference system is capable of achieving thedesired results of intention description which proved itsfeasibility and effectiveness In addition by using existingreverse technology and data flow analysis tools we can extractbehavioral facts effectively

Journal of Electrical and Computer Engineering 13

Methods of mapping from Data Source to facts aredivided into two forms artificial extraction and automaticextraction We mainly use the methods of artificial factsextraction There are many opportunities for future workssuch as the implementation of automatic facts extractionmethods and the automatic visualization of the reasoningresults

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

Acknowledgments

This work was partly supported by Beijing Key Laboratory ofInternet Culture and Digital Dissemination Research Project(no ICDDXN001) National Key Technology Research andDevelopment Program of the Ministry of Science and Tech-nology of China (no 2015BAK12B03-03) Special Projectof Central Government to Guide the Development ofLocal Science and Technology (no Z171100004717002)and National Natural Science Foundation of China (nos61370065 61502040)

References

[1] P Guojun L Jingwen and S Runkang ldquoResearch and Progressof Android malware detectionrdquo Journal of Wuhan University(Science Edition) vol 61 no 1 pp 21ndash33 2015

[2] A Kharraz S Arshad C Mulliner et al ldquoA Large-ScaleAutomated Approach to Detectingrdquo in Proceedings of the 25thUSENIX Security Symposium (USENIX Security 16) pp 757ndash772 USENIX Association 2016

[3] M-Y Su and K-T Fung ldquoDetection of android malwareby static analysis on permissions and sensitive functionsrdquo inProceedings of the 8th International Conference on Ubiquitousand Future Networks ICUFN 2016 pp 873ndash875 Austria July2016

[4] S Arzt S Rasthofer C Fritz et al ldquoFlowDroid precise contextflow field object-sensitive and lifecycle-aware taint analysis forAndroid appsrdquoACM Sigplan Notices vol 49 no 6 pp 259ndash2692014

[5] E Bratman Michael ldquoWhat is intentionrdquo in Intentions incommunication P R Cohen Ed pp 15ndash32 theMITPress 1990

[6] J Shirley and D Evans ldquoThe user is not the enemy Fightingmalware by tracking user intentionsrdquo in Proceedings of the NewSecurity ParadigmsWorkshop 2008 NSPW rsquo08 pp 33ndash45 USASeptember 2008

[7] C-T Jung C-H Sun and M Yuan ldquoAn ontology-enabledframework for a geospatial problem-solving environmentrdquoComputers Environment and Urban Systems vol 38 no 1 pp45ndash57 2013

[8] I Horrocks F P Patel-Schneider H Boley et al SWRL ASemantic Web Rule Language Combining OWL and RuleMLW3C Member Submission 2004

[9] J Bak C Jedrzejek andM Falkowski ldquoUsage of the jess enginerules and ontology to query a relational databaserdquo LectureNotes in Computer Science (including subseries Lecture Notes

in Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 5858 pp 216ndash230 2009

[10] X-Y Du M Li and S Wang ldquoSurvey on ontology learningresearchrdquo Ruan Jian Xue BaoJournal of Software vol 17 no 9pp 1837ndash1847 2006

[11] S Cheng S Luo Z Li et al ldquoStatic Detection of DangerousBehaviors in Android Appsrdquo in Proceedings of the the 5thinternational symposium on cyberspace safety and securitySpringer International Publishing 2013

[12] IDA pro httpwwwhex-rayscomproductsida[13] J Jian and X Qing ldquoOn case auto generation using reverse

analysis for Android malwarerdquo Journal of HeFei University ofTechnology (Natural Science Edition) vol 39 no 4 pp 466ndash4702016

[14] W Yang X Xiao B Andow S Li T Xie and W EnckldquoAppContext differentiating malicious and benign mobile appbehaviors using contextrdquo in Proceedings of the 37th IEEEACMInternational Conference on Software Engineering (ICSE rsquo15) vol1 pp 303ndash313 IEEE May 2015

[15] Y Li T Shen X Sun et al ldquoDetection Classification andCharacterization of Android Malware Using API Data De-pendencyrdquo in Proceedings of the International Conference onSecurity and Privacy in Communication Systems vol 164 pp23ndash40 Springer International Publishing 2015

[16] M Zhang Y Duan H Yin and Z Zhao ldquoSemantics-awareAndroid malware classification using weighted contextual APIdependency graphsrdquo in Proceedings of the 21st ACM Conferenceon Computer and Communications Security (CCS rsquo14) pp 1105ndash1116 ACM Scottsdale Ariz USA November 2014

[17] P Wu H Changzhen and Y Shuping ldquoA dynamic intrusiveintention recognition method based on timed automatardquo Jour-nal of Computer Research and Development vol 48 no 7 pp1288ndash1297 2011

[18] J-H Han S-J Lee and J-H Kim ldquoBehavior Hierarchy-BasedAffordance Map for Recognition of Human Intention and ItsApplication to Human-Robot Interactionrdquo IEEE Transactionson Human-Machine Systems vol 46 no 5 pp 708ndash722 2016

[19] K Tam S J Khan A Fattori et al ldquoCopperDroid automaticreconstruction of android malware behaviorsrdquo in Proceedingsof the Network and Distributed System Security Symposium SanDiego Calif USA 2015

[20] M Zhang and H Yin ldquoAutomatic Generation of Security-Centric Descriptions for Android Appsrdquo inAndroid ApplicationSecurity Springer International Publishing 2016

[21] Z Wang C Li Z Yuan Y Guan and Y Xue ldquoDroidChain Anovel Android malware detection method based on behaviorchainsrdquo Pervasive andMobile Computing vol 32 pp 3ndash14 2016

[22] X Guozhi System science and Engineering Research Shanghaiscience and Technology Education Press 2001

[23] Y Zhou and X Jiang ldquoDissecting android malware char-acterization and evolutionrdquo in Proceedings of the 33rd IEEESymposium on Security and Privacy pp 95ndash109 IEEE 2012

[24] httpcopperdroidisgrhulacukcopperdroidindexphp[25] httpappscan360cn

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: Behavior Intention Derivation of Android Malware Using ...downloads.hindawi.com/journals/jece/2018/9250297.pdf · ResearchArticle Behavior Intention Derivation of Android Malware

12 Journal of Electrical and Computer Engineering

Table 9 Precision results for Appscan and intention reports

Judgment Appscan report Intention descriptiongeno Driod All geno Droid All

(1) Ineffective 5 0 5 1 2 3(2) Uncertain 1 65 66 1 4 5(3) Effective 4 0 4 8 59 67Total 10 65 75 10 65 75

fully covered in the 75 selected samples Inference systemmisses intention descriptions due to two major reasons(1) Artificial behavior facts extraction lacks accuracy Werely on our environment dependence graph and extractionalgorithm to perform simulation analysis However it is notprecise enough to handle the data flow propagation thatcannot be solved by existing static analysis technology (suchas exception handler code reflective calls) (2)The programlogic in real-world malware is more complex and withsemantic ambiguity and there are too many independentsensitive behaviors (in the form of user defined functions)This is not the result of the lack of theory and only technicalproblem

In fact static analysis in general lacks the capability ofextracting runtime behaviors and can be evaded accordinglyNevertheless we argue that our focus is to construct aframeworkwhich produces description and derivationmodelof malware intention

Readability and Effectiveness To evaluate the readability andeffectiveness of inference output (Intention) we perform auser study on our malware intention inference results Thegoal is twofold First we hope to knowwhether the generatedintentions are readable to average audience Second weexpect to see whether our intention descriptions can actuallyhelp users avoid risky apps

Methodology 360 microscopesrsquo Appscan platform [25] is anonline security scanning systemwhich focuses on the securityscan of Android apps The system will provide a scan reportafter the scanning and describe possible malicious intentof the app to users We uploaded all the samples to 360microscopesrsquo Appscan platform and got 75 security reportsof these samples We also collate ontology inference resultsand show them in the form of formalization and graphics75 samplesrsquo intention reports are produced We use both75 Appscan reports and 75 intention reports to measureuser reaction and the number of vulnerabilities providedby 360 microscopesrsquo Appscan platform is compared withthe intention description Furthermore due to the objectiveevaluation consideration we perform the user study basedon the official intents descriptions of 75 apps We use theseofficial descriptions as a reference standard

We have recruited participant directly from our lab-oratory group and we require participant that must besmartphone users We also make sure that participants haveexperiences with Android malware analysis and understandbasic smartphone events such as ldquointercept SMSrdquo or ldquoaccess

GPS locationrdquo We provide a rating for each sample report(Appscan and intention) with respect to its effectiveness Therating ranges from 1 to 3 where 1 means ineffective 2 meansuncertain and 3 means effective as shown in Table 9

Results and Implications Eventually the results are shownin Table 9 360 microscopesrsquo Appscan platform provides acoarser grained description and users can only judge the trueintents of 4 samples according to the Appscan report Whileusers can judge the malice of 67 samples using intentiondescription report This indicates our intention report isreadable even compared to Appscan report created by 360microscopesrsquo Appscan platform The results also reveal thatthe readability and granularity of intention description arerelatively important while Appscan-generated ones some-times confuse users 66 samples cannot be judged correctlythrough Appscan report In a further investigation we noticethat Appscan report provides a coarse intent descriptionof the sample while our security centered intention reportprovides a fairly fine-grained display of intention We believethat this can be further utilized during threat analysis

Threat Analysis. Analysts are able to understand the relationships between malware behaviors (including the trigger process of behaviors, the data transfer relation, and the change in object characteristics) according to the intention description. This is of great practical significance for users to understand the implementation mechanism of malware, evaluate its possible losses, and provide preventive solutions.

6. Conclusion

In this paper, a novel description and derivation model of Android malware intention, based on the theory of intention and malware reverse engineering, is proposed to restore the intention of malware and automate the intention reasoning process. In order to standardize the new conceptual system and automate the intent derivation, we create an ontology of malware intention. As the ontology is computable, we import SWRL rules represented in ontology language and the Fact Base into the Jess engine, and Jess’s outputs are the description of malware intention. We evaluate our system using 75 typical malware samples (5 kinds). Experiments show that our inference system is capable of achieving the desired results of intention description, which demonstrates its feasibility and effectiveness. In addition, by using existing reverse engineering technology and data flow analysis tools, we can extract behavioral facts effectively.
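The rule-plus-fact-base pipeline summarized above can be illustrated with a toy forward-chaining loop. The sketch below is not the Jess engine and does not use SWRL syntax; the rules, predicate names (`performs`, `exhibits`, `hasIntention`), and sample names are hypothetical and chosen only to show how behavioral facts might be chained into an intention.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, str, str]  # (app, predicate, value)
Rule = Tuple[FrozenSet[Tuple[str, str]], Tuple[str, str]]  # (premises, conclusion)

# Hypothetical rules, invented for illustration; they are not the paper's rule set.
RULES: List[Rule] = [
    (frozenset({("performs", "intercept_SMS"), ("performs", "upload_to_server")}),
     ("exhibits", "SMS_leak")),
    (frozenset({("exhibits", "SMS_leak")}),
     ("hasIntention", "privacy_theft")),
]


def infer(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        apps = {app for app, _, _ in derived}
        for premises, (pred, value) in rules:
            for app in apps:
                new_fact = (app, pred, value)
                if new_fact in derived:
                    continue
                # A rule fires for an app only when every premise holds for it.
                if all((app, p, v) in derived for p, v in premises):
                    derived.add(new_fact)
                    changed = True
    return derived


# Hypothetical fact base standing in for extracted behavioral facts.
fact_base: Set[Fact] = {
    ("sample_A", "performs", "intercept_SMS"),
    ("sample_A", "performs", "upload_to_server"),
    ("sample_B", "performs", "access_GPS_location"),
}

result = infer(fact_base, RULES)
intentions = sorted(f for f in result if f[1] == "hasIntention")
print(intentions)
# prints [('sample_A', 'hasIntention', 'privacy_theft')]
```

The second rule fires only after the first has added its conclusion, which is why the loop runs to a fixed point rather than making a single pass over the rules.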

Journal of Electrical and Computer Engineering 13

Methods of mapping from Data Source to facts are divided into two forms: manual extraction and automatic extraction. We mainly use manual fact extraction. There are many opportunities for future work, such as the implementation of automatic fact extraction methods and the automatic visualization of the reasoning results.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was partly supported by Beijing Key Laboratory of Internet Culture and Digital Dissemination Research Project (no. ICDDXN001), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (no. 2015BAK12B03-03), Special Project of Central Government to Guide the Development of Local Science and Technology (no. Z171100004717002), and National Natural Science Foundation of China (nos. 61370065, 61502040).

References

[1] P. Guojun, L. Jingwen, and S. Runkang, “Research and Progress of Android malware detection,” Journal of Wuhan University (Science Edition), vol. 61, no. 1, pp. 21–33, 2015.

[2] A. Kharraz, S. Arshad, C. Mulliner et al., “A Large-Scale Automated Approach to Detecting,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[3] M.-Y. Su and K.-T. Fung, “Detection of android malware by static analysis on permissions and sensitive functions,” in Proceedings of the 8th International Conference on Ubiquitous and Future Networks, ICUFN 2016, pp. 873–875, Austria, July 2016.

[4] S. Arzt, S. Rasthofer, C. Fritz et al., “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” ACM SIGPLAN Notices, vol. 49, no. 6, pp. 259–269, 2014.

[5] M. E. Bratman, “What is intention?” in Intentions in Communication, P. R. Cohen, Ed., pp. 15–32, The MIT Press, 1990.

[6] J. Shirley and D. Evans, “The user is not the enemy: Fighting malware by tracking user intentions,” in Proceedings of the New Security Paradigms Workshop 2008, NSPW ’08, pp. 33–45, USA, September 2008.

[7] C.-T. Jung, C.-H. Sun, and M. Yuan, “An ontology-enabled framework for a geospatial problem-solving environment,” Computers, Environment and Urban Systems, vol. 38, no. 1, pp. 45–57, 2013.

[8] I. Horrocks, P. F. Patel-Schneider, H. Boley et al., SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, 2004.

[9] J. Bak, C. Jedrzejek, and M. Falkowski, “Usage of the jess engine, rules and ontology to query a relational database,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 5858, pp. 216–230, 2009.

[10] X.-Y. Du, M. Li, and S. Wang, “Survey on ontology learning research,” Ruan Jian Xue Bao/Journal of Software, vol. 17, no. 9, pp. 1837–1847, 2006.

[11] S. Cheng, S. Luo, Z. Li et al., “Static Detection of Dangerous Behaviors in Android Apps,” in Proceedings of the 5th International Symposium on Cyberspace Safety and Security, Springer International Publishing, 2013.

[12] IDA Pro, http://www.hex-rays.com/products/ida.

[13] J. Jian and X. Qing, “On case auto generation using reverse analysis for Android malware,” Journal of HeFei University of Technology (Natural Science Edition), vol. 39, no. 4, pp. 466–470, 2016.

[14] W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck, “AppContext: differentiating malicious and benign mobile app behaviors using context,” in Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15), vol. 1, pp. 303–313, IEEE, May 2015.

[15] Y. Li, T. Shen, X. Sun et al., “Detection, Classification and Characterization of Android Malware Using API Data Dependency,” in Proceedings of the International Conference on Security and Privacy in Communication Systems, vol. 164, pp. 23–40, Springer International Publishing, 2015.

[16] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), pp. 1105–1116, ACM, Scottsdale, Ariz, USA, November 2014.

[17] P. Wu, H. Changzhen, and Y. Shuping, “A dynamic intrusive intention recognition method based on timed automata,” Journal of Computer Research and Development, vol. 48, no. 7, pp. 1288–1297, 2011.

[18] J.-H. Han, S.-J. Lee, and J.-H. Kim, “Behavior Hierarchy-Based Affordance Map for Recognition of Human Intention and Its Application to Human-Robot Interaction,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 708–722, 2016.

[19] K. Tam, S. J. Khan, A. Fattori et al., “CopperDroid: automatic reconstruction of android malware behaviors,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, Calif, USA, 2015.

[20] M. Zhang and H. Yin, “Automatic Generation of Security-Centric Descriptions for Android Apps,” in Android Application Security, Springer International Publishing, 2016.

[21] Z. Wang, C. Li, Z. Yuan, Y. Guan, and Y. Xue, “DroidChain: A novel Android malware detection method based on behavior chains,” Pervasive and Mobile Computing, vol. 32, pp. 3–14, 2016.

[22] X. Guozhi, System science and Engineering Research, Shanghai science and Technology Education Press, 2001.

[23] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proceedings of the 33rd IEEE Symposium on Security and Privacy, pp. 95–109, IEEE, 2012.

[24] http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php.

[25] http://appscan.360.cn.


