

Categorization of malicious behaviors using ontology-based cognitive agents

Umar Manzoor a,⁎, Samia Nefti a, Yacine Rezgui b

a The University of Salford, Department of Computer Science, Salford, UK
b Cardiff University, Cardiff, UK

Article info

Article history:
Received 20 December 2010
Received in revised form 6 July 2011
Accepted 25 June 2012
Available online 9 July 2012

Abstract

Every organization uses computer networks (often networks of networks) for resource sharing (e.g. printers, files) and communication. Computer networks today are increasingly complex, and managing them requires specialized expertise. Monitoring systems help network administrators monitor and protect their networks by preventing users from running illegal applications or changing the configuration of network nodes. In this paper we have developed An Agent Based System for Activity Monitoring on Network (ABSAMN) and proposed Categorization of Malicious Behaviors using Cognitive Agents (CMBCA). CMBCA uses an ontology to predict unknown illegal applications based on the behaviors of known illegal applications. It is an intelligent multi-agent system that detects known and unknown malicious activities carried out by users over the network. We ran ABSAMN and CMBCA concurrently on a university campus with seven labs, each equipped with between 20 and 300 PCs. Both systems were tested on the same configuration; the results indicate that CMBCA outperforms ABSAMN in every aspect.

© 2012 Elsevier B.V. All rights reserved.

Keywords: Network monitoring; Malicious activity; Ontology; Cognitive mobile agent; Distributed proxy server; Collaborative multi-agent system

1. Introduction

Computer systems have introduced new ways of storing and managing data and, as such, play an important role in the modern world. Today every organization uses computers to manage its information and hardware resources. This is only possible when two or more computers are connected by some sort of network [3]. Computer networks today are increasingly complex. They may involve sub-networks and therefore require specialized expertise, i.e. a network administrator, for their management and maintenance [9]. The job of a network administrator includes network monitoring (i.e. monitoring of malicious applications on network nodes), network management (i.e. installing and uninstalling applications on network nodes) and resource management (adding and removing resources on the network).

With the increased complexity of contemporary computer networks, many companies are trying to develop automated tools for network administration. Monitoring systems [14] are one category of such products. They assist network administrators in monitoring and protecting the network by preventing users from running illegal applications or changing the configuration of network nodes. Monitoring systems play an important role in universities, colleges, schools, offices, etc., helping network administrators protect and secure their networks by monitoring user and resource activities.

Monitoring systems are also used to help parents monitor and restrict their children's activities over the Internet [15,16,31]. Parental control tools [32–36] secretly record a user's computer activities (PC usage) and Internet activities (chats, sent and received emails, websites visited, keystrokes typed, screenshots, etc.), block unwanted websites, and thus protect users from malicious content. Parental control tools available in the market a) require client-side installation, which in turn needs manual maintenance, b) can easily be detected using tools available on the Internet [39,40], c) use static knowledge bases for detection (manual update required), d) are developed to monitor single nodes (PCs), e) are unable to detect unknown malicious websites, f) usually rely on password authentication to prevent unauthorized access (if the password is compromised, an unauthorized user can disable the tool), and g) are thus not particularly smart.

Data & Knowledge Engineering 85 (2013) 40–56

⁎ Corresponding author.
E-mail addresses: [email protected] (U. Manzoor), [email protected] (S. Nefti), [email protected] (Y. Rezgui).

0169-023X/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2012.06.006

Contents lists available at SciVerse ScienceDirect

Data & Knowledge Engineering, journal homepage: www.elsevier.com/locate/datak

The network administrator usually deploys a proxy server on the network to monitor and restrict activities on the network nodes. A proxy server is a software program deployed on a server that acts as a mediator between a network node user and the Internet. All Internet requests from network users are forwarded to the proxy server, which evaluates each request according to its filtering rules. If the request is valid, it is forwarded to the appropriate server and the server's response is forwarded to the user. If the request violates the rules, it is blocked and an error message is shown to the user. Proxy servers exhibit the following limitations: a) they can easily be bypassed using tools available on the Internet [17,42], b) filtering rules (keywords, URLs, etc.) are defined by the network administrator and require manual updating (i.e. static rules), c) all traffic is diverted to and from the proxy server, which becomes a bottleneck (i.e. centralized approach), d) if the proxy server crashes, Internet service is unavailable across the whole network, e) proxy servers are unable to detect correlated websites (e.g. if YouTube is not permitted, a proxy server blocks YouTube itself but not other video websites).

There is a need to develop a tool which a) is distributed and uses a decentralized approach, b) does not require softwareinstallation on network node(s), c) monitors activities on network nodes (i.e. client machine), d) is so light and transparent that anetwork user does not know that his/her activities are being monitored, e) uses application behavior to detect malicious ornon-malicious activities, f) has the capability of taking action (i.e. killing the malicious activity, disabling a user account, etc.) atruntime, g) is intelligent (dynamically updates rules without intervention/interaction of the network administrator).

Umar et al. [1] proposed “An Agent Based System for Activity Monitoring on Network – ABSAMN” for monitoring resources over a network. It is suitable for a network of networks, commonly known as a Campus Area Network. ABSAMN is a multi-agent system for monitoring illegal activities or applications running on the network nodes. An agent is a software program which acts on behalf of the user or another agent to perform a specific task in order to achieve its goal(s) [4,11]. Agents are of many types; one of them is the mobile agent, which has the capability of moving autonomously from one node of the network to another [2,6,10].

When two or more agents collaborate and communicate their actions to each other in order to achieve a common goal, the resulting system is called a Multi Agent System (MAS) [12,18]. The characteristics of MAS (distributed nature, flexibility) make them well suited to implementing network-based applications [13,19]. Because of these capabilities (i.e. flexibility, self-recovery, fault tolerance, decentralization, etc.), Multi Agent Systems have been used in many areas such as semantic web services, network management, spam filtering, e-commerce, manufacturing, file transfer, decision making, and business applications [21,24,23,20,26,25,22,27].

ABSAMN, once initialized with the associated knowledge (i.e. rules), a) monitors the resources autonomously, b) does not require any user interaction or intervention, c) does not require any software installation on the network node to monitor activities, d) uses mobile agents to monitor network nodes and take action against malicious (illegal) activity. These properties distinguish ABSAMN from other monitoring tools available in the market. However, ABSAMN has the following limitations:

➢ The rule set for malicious (illegal) applications is defined as “process name” and “action” pairs. The network administrator is responsible for defining and updating the rule set. Sample rule set:

    <Rule>
        <PROCESS_NAME>Msn</PROCESS_NAME>
        <ACTION>KILLPROC</ACTION>
    </Rule>

➢ Only the process name is used to identify the malicious activity. If a user changes the name of the illegal application, ABSAMN will never be able to detect it.

➢ The rule set is static and maintained manually.

➢ The proposed framework cannot predict or capture unknown malicious applications.
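The second limitation can be illustrated with a minimal Python sketch. It is not the ABSAMN implementation: the `RULES` dictionary, the `match_rule` helper and the case-insensitive, extension-stripping comparison are all assumptions made for illustration. It only shows why a rule keyed on the process name stops matching once the executable is renamed.

```python
from typing import Dict, Optional

# Hypothetical rule set in the "process name" -> "action" form described above.
RULES: Dict[str, str] = {"msn": "KILLPROC"}


def match_rule(process_name: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the configured action if the process name matches a rule, else None."""
    key = process_name.lower()
    # Assumption: names are compared case-insensitively without the ".exe" suffix.
    if key.endswith(".exe"):
        key = key[: -len(".exe")]
    return rules.get(key)


# The original executable name matches the rule.
print(match_rule("Msn.exe", RULES))
# The same program under a different file name matches nothing and goes unnoticed.
print(match_rule("report_viewer.exe", RULES))
```

Matching on behavior rather than on the name, as CMBCA proposes, is meant to close exactly this gap.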

In this paper we have modified the existing ABSAMN architecture and proposed CMBCA (Categorization of Malicious Behaviors using Cognitive Agents). The latter uses ontology-based knowledge to predict unknown malicious applications based on known application behaviors. CMBCA is fully autonomous and, once initialized, monitors whole networks with the help of cognitive mobile agents. CMBCA can dynamically update its Knowledge Base (KB) by adding new concepts and their relationships uncovered by detecting unknown illegal applications.

The remainder of this paper is organized as follows. Section 2 provides a brief overview of the ontology and CMBCA knowledgebase. Section 3 introduces the CMBCA architecture, including the pre-processing, disambiguation-extrapolation-detection, andclassification using ontology. In Section 4, we critically evaluate and compare the performance of CMBCA with ABSAMN. Finally,concluding remarks are drawn in Section 5.

2. Ontology, agents and knowledge base

Ontology is an explicit specification of an abstract model consisting of facts, used to construct the domain model [5,8]. In computer and information sciences, an ontology is a formal representation of a specific domain, containing concepts and their relationships, and is often represented as a semantic network. The semantic network is internally represented as a graph in which nodes are concepts and links among nodes are relationships. There are two main types of ontology: a) upper (top-level) ontologies, which describe very general concepts and are usually used to define common objects that apply across a wide range of domain ontologies, and b) domain ontologies, which describe very specific concepts of a selected domain. Domain ontologies are usually incompatible with each other, as a concept can have a different meaning in each domain ontology [7].

Ontology building requires two steps: a) extracting knowledge from the domain using knowledge extraction techniques and b) representing the knowledge in a meaningful way. An ontology can be used to define the domain or to reason about the domain. Ontology plays an important role in the development of intelligent distributed systems because it gives meaning and context to data and can be dynamically updated (i.e. new concepts or relationships can be added) [8]. Ontology is used in many application areas such as e-commerce (eBay, Amazon), search engines (Yahoo), and the World Wide Web (Semantic Web). Ontology also plays an important role in refining or constructing the underpinning knowledge base of information systems [5].

CMBCA uses an ontology-based knowledge base to classify unknown applications as malicious (illegal) or legal. The CMBCA knowledge base is developed in a hierarchical way and contains the following classes.

➢ Application
➢ Application Type
➢ Resource

Malicious activity types (i.e. chatting application, illegal Internet application, etc.) are defined in the ‘Application Type’ class. The ‘Resource’ class contains sub-classes which define all the resources that can be used by an application. Any unknown application is classed as a malicious application if it uses any resource defined in the ‘Resource’ class. An unknown application can be categorized under any ‘Application Type’ if it fulfills a few of the properties defined for that application type. Camel notation is a common means of naming properties, and the characteristics of a particular class are usually defined using ‘has a’ or ‘is a’ properties. The CMBCA knowledge base is implemented in Protégé 3.3.1 [30]; the partial ontology and properties are shown in Fig. 1a) and b) respectively. The Cognitive Ontology Agent is responsible for inferring a defined ‘Application Type’ for each unknown application and for updating the ontology whenever a new unknown illegal application is captured.

3. System architecture

CMBCA is an Intelligent Multi Agent System used to detect known and unknown malicious activities carried out by users over a network. It supports profile-based monitoring, i.e. one application can be malicious for one part of the network and normal for other parts of the network. The network administrator can create one profile for the whole network or different profiles for parts of the network. CMBCA is fully autonomous and, once initialized, monitors whole networks with the help of cognitive mobile agents. The architecture of the system consists of the following agents, as shown in Figs. 2 and 4.

• Knowledge Elicitation Agent (KEA)
• Sub-Network Agent (SNA)
• Cognitive Ontology Agent (COA)
• Action Agent (AA)

3.1. Knowledge Elicitation Agent (KEA)

Knowledge Elicitation Agent (KEA) is the core agent, as it initializes and manages the system autonomously without user interaction. During initialization, KEA performs the following steps:

1) KEA loads the network configuration from a pre-configured XML file. The network XML contains information about the sub-networks of the network and which activities are allowed on these sub-networks. A sample network file is shown in Fig. 3.
   • The IP tag contains the range of IP addresses within the sub-network.
   • The Activities tag contains information about the activities (applications) which are not allowed (malicious) on the sub-network. The network administrator is responsible for configuring the activities for each network. A partial configuration is given below:
      ○ Whether the user is allowed to change the system configuration of the network node.
      ○ Whether the user is allowed to play games on the network node.
      ○ Whether the user is allowed to use chatting applications on the network nodes.
      ○ Whether the user is allowed to visit multimedia (video/audio) or adult websites.
2) KEA loads the ontology from the ontology-based knowledge base.
3) KEA makes different Rule Profiles for the sub-networks.
4) KEA creates and initializes N Sub-Network Agents (SNA), where N depends on the number of sub-networks. KEA uses a Sub-Network Agent to sub-network ratio of 1:1. After initialization, each Sub-Network Agent moves to a destination server to monitor the activities of the assigned network.
5) KEA passes the ontology, rule profile and configuration of the sub-network as arguments to the SNA.
6) Once the SNA reaches its destination server, KEA transfers the compiled code of the Cognitive Ontology Agent to the Sub-Network Agent.

Fig. 1. (a) Partial hierarchical tree. (b) Partial properties.

After initialization, KEA waits for updates from the Sub-Network Agents and, whenever a violation is reported, updates its database with the violation details. KEA keeps track of the Sub-Network Agents by constantly monitoring their activities.

Knowledge Elicitation Agent

    KEAAgent()
    {
        SN <- LoadNetworkConfiguration();       // sub-network details from the network XML
        O  <- LoadOntologyfromKB();
        RN <- MakeProfilesforSubNetwork(SN);    // one rule profile per sub-network
        foreach Ni in SN
        {
            AgentID <- CreateSNA(RNi, SNi, O);
            AgentID.Destination <- Ni.Destination
            TransferCompiledCode(AgentID);      // compiled Cognitive Ontology Agent code
        }
        while (1)
        {
            WaitforUpdates();
            MonitorSNAStatus();
        }
    }
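As a rough illustration of KEA's first initialization step, the following Python sketch reads a network file shaped like the sample in Fig. 3 into per-sub-network rule profiles. The `load_network_configuration` function, the dictionary layout and the trimmed `SAMPLE` document are assumptions made for this example; the paper does not describe how KEA parses the file.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample in the shape of Fig. 3 (elided parts left out).
SAMPLE = """
<Networks>
  <Network1>
    <IP><From>172.168.4.10</From><To>172.168.4.90</To></IP>
    <ACTIVITIES>
      <GAMING><Status>Malicious</Status><Action>KILL</Action></GAMING>
      <CHAT_APPLICATION><Status>Malicious</Status><Action>LOCKACCOUNT</Action></CHAT_APPLICATION>
    </ACTIVITIES>
  </Network1>
</Networks>
"""


def load_network_configuration(xml_text):
    """Return {sub-network: {"ip_range": (from, to), "rules": {activity: (status, action)}}}."""
    root = ET.fromstring(xml_text.strip())
    config = {}
    for net in root:  # one child element per sub-network
        ip = net.find("IP")
        ip_range = (ip.findtext("From"), ip.findtext("To")) if ip is not None else (None, None)
        rules = {}
        activities = net.find("ACTIVITIES")
        if activities is not None:
            for activity in activities:
                status = activity.findtext("Status")
                if status is not None:  # skip grouping elements without a direct Status
                    rules[activity.tag] = (status, activity.findtext("Action"))
        config[net.tag] = {"ip_range": ip_range, "rules": rules}
    return config


config = load_network_configuration(SAMPLE)
print(config["Network1"]["ip_range"])
print(config["Network1"]["rules"]["GAMING"])
```

Each entry of `config` corresponds to what the pseudocode calls a sub-network with its rule profile, which KEA would then hand to one Sub-Network Agent.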

3.2. Sub-Network Agent (SNA)

After creation and initialization, Sub-Network Agents (SNA) move to the assigned destination and perform the following steps:

1) SNA loads the ontology, network and rule profile configuration passed by the KEA.
2) SNA creates and initializes N Cognitive Ontology Agents (COA), where N depends on the number of network nodes present in the sub-network. SNA uses a Cognitive Ontology Agent to network node ratio of 1:5 (i.e. if there are 40 PCs in the sub-network, 8 Cognitive Ontology Agents will be created and each one will be assigned 5 PCs for monitoring). After initialization, each Cognitive Ontology Agent moves to the nodes in its itinerary, one by one, and reports back to the SNA.
3) SNA passes the ontology, the rule profile of the sub-network, and the list of network nodes to monitor as arguments to the COA.
4) If a COA reports any malicious activity over the network:
   a. SNA creates an Action Agent (AA) and initializes it with the network node name, process ID, IP address, and the action to be performed on that node.
   b. SNA waits for the AA response.
   c. SNA updates its database and sends the record to KEA.

Fig. 2. Zero level architecture of categorization of malicious behaviors using cognitive agents.

After creation and initialization of the COAs, SNA waits for updates from the Cognitive Ontology Agents and, whenever a violation is reported, updates its database with the violation details and also sends them to KEA. SNA keeps track of the Cognitive Ontology and Action Agents by constantly monitoring the activities of these agents.
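The 1:5 agent-to-node ratio amounts to splitting the node list into itineraries of at most five nodes. The following Python sketch shows one way to do this; `make_itinerary` and the node names are hypothetical, and the handling of a final, shorter itinerary when the node count is not a multiple of five is an assumption, since the paper only gives the 40-PC example.

```python
def make_itinerary(nodes, agent_to_node=5):
    """Split the node list into itineraries of at most `agent_to_node` nodes, one per COA."""
    return [nodes[i:i + agent_to_node] for i in range(0, len(nodes), agent_to_node)]


# 40 hypothetical node names, matching the example in the text.
nodes = ["pc-%02d" % n for n in range(1, 41)]
itineraries = make_itinerary(nodes)

print(len(itineraries))   # number of Cognitive Ontology Agents to create
print(itineraries[0])     # nodes visited by the first agent
```

With 40 nodes this yields 8 itineraries of 5 nodes each, in line with the example above.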

Sub-Network Agent

    SNAAgent()
    {
        Agent2Node <- 5                                   // nodes assigned to each COA
        SN <- LoadLocalNetworkDetails();
        RN <- LoadRules();
        O  <- LoadOntology();
        TotalNodes <- SN.TotalNodes
        IN <- MakeItinerary(TotalNodes, Agent2Node);
        i = 0
        while (i < ceil(TotalNodes / Agent2Node))         // one COA per itinerary
        {
            AgentID <- CreateCOA(RN, SN, INi, O);
            i++
        }
        while (1)
        {
            WaitforUpdates();
            if (Malicious Activity)
            {
                AgentID <- CreateAA(NodeName, ProcessID, IPAddress, Action);
                WaitforResponse();
                UpdateDatabase();
            }
            MonitorCOA-AA-Status();
        }
    }

Sample Network XML File

    <Networks>
      <Network1>
        <IP>
          <From>172.168.4.10</From>
          <To>172.168.4.90</To>
        </IP>
        <ACTIVITIES>
          <CONFIGURATION_CHANGES>
            <Status>Malicious</Status>
            <Action>ROLLBACK</Action>
          </CONFIGURATION_CHANGES>
          <GAMING>
            <Status>Malicious</Status>
            <Action>KILL</Action>
          </GAMING>
          <CHAT_APPLICATION>
            <Status>Malicious</Status>
            <Action>LOCKACCOUNT</Action>
          </CHAT_APPLICATION>
          <INTERNET>
            <MultiMedia_WebSites>
              <Status>Malicious</Status>
              <Action>ACCESS DENIED</Action>
            </MultiMedia_WebSites>
            <ADULT_Websites>
              <Status>Malicious</Status>
              <Action>EXTREMEVIOLATION</Action>
            </ADULT_Websites>
            ......
          </INTERNET>
          ...
          ...
        </ACTIVITIES>
      </Network1>
      <Network2>
        <IP>
          <From>172.168.3.20</From>
          <To>172.168.3.70</To>
        </IP>
        <CONFIGURATION_CHANGES>
          <Status>Allowed</Status>
          <Action>LOG CHANGES</Action>
        </CONFIGURATION_CHANGES>
        ......
        </INTERNET>
      </Network2>
      ......
    </Networks>

Fig. 3. Sample Network XML file.

Fig. 4. System architecture of categorization of malicious behaviors using cognitive agents.


3.3. Cognitive Ontology Agent (COA)

Cognitive Ontology Agent (COA) plays a very important role in CMBCA, as it classifies applications running on the network node as malicious or non-malicious, as shown in Fig. 7. After initialization, each Cognitive Ontology Agent moves to the nodes in its itinerary one by one and extracts all processes running on each network node. The COA is responsible for extracting the core information of every process and storing it in a process vector set model V.

$$V = \{V_1, V_2, V_3, V_4, \dots, V_m\} \tag{1}$$

The information includes name, size, path, company, built-in strings, ports (TCP, UDP, remote address), keystroke frequency, mouse-click frequency, DLLs, process screen resolution, network connection, packet rate, packet contents, files opened, file extensions, etc.

A process can have a few or all of the attributes mentioned above, and the information is used by the COA to classify the application. The core information of a process contains much useless information, which needs to be removed to reduce the size of the process vector set V and, in turn, the computational time. The Cognitive Ontology Agent classification algorithm is similar to the approach proposed by Nefti in [34]; the main difference is the use of WordNet [41] as a knowledge base for text analysis. WordNet is a lexical database for the English language developed at Princeton University, in which each word is linked to a set of senses and each sense identifies one particular meaning of the word [45]. WordNet has been widely used in text analysis (i.e. word sense disambiguation) [44], and research results demonstrate that WordNet significantly improves the accuracy of analysis compared to other approaches [37,38,43]. The classification algorithm comprises three steps: (a) Pre-processing, (b) Concept Disambiguation, Extrapolation and Detection, (c) Classification.

3.3.1. Pre-processing

To generate the reduced process vector set P of each process, pre-processing is performed on different attributes of the process vector set V.

■ The built-in strings attribute Vi of process vector set V contains many useless strings, which need to be removed. Fig. 5 shows part of the built-in strings of googletalk [29]; the snapshot was taken using Windows SysInternals Process Explorer [28].

$$V_i = \{W_1, W_2, W_3, W_4, \dots, W_m\} \tag{2}$$

In Step 1, all standard stop-list/stemmer words (e.g. “is”, “the”, “on”, “and”, “in”, “with”, “for”, “by”, …) and words containing any non-alphabetic character (e.g. “%”, “_”, “#”, “@”, “-”, …) are eliminated from the built-in string attribute. In Step 2, homogeneous words such as (“chat”, “chatting”, “chatted”) and (“connected”, “connecting”, “connection”) are substituted by the single words “chat” and “connect” respectively. In Step 3, the number of occurrences of each keyword is computed and duplicate entries are eliminated from the built-in string attribute. This approach significantly reduces the size of the built-in string attribute Vi while losing little useful information.

■ The keystroke frequency attribute Vj of process vector set V contains the keystroke frequencies of all keys; however, we are interested only in abnormal keystroke frequencies. Keys with α (normal) key frequencies are removed from Vj, where α is adjusted whenever an unknown illegal application is captured. Fig. 6 shows the keystroke frequency of different processes; the peaks in the graph show abnormal keystroke frequencies. The mouse-click frequency attribute Vk is reduced using the same technique.

$$V_j = \{K_1, K_2, K_3, K_4, \dots, K_n\} \tag{3}$$

■ The Dynamic Link Library (DLL) attribute Vl of process vector set V contains all DLLs used by the process; however, we are interested only in operating system DLLs, so all other DLLs are removed, which significantly reduces the size of Vl.

$$V_l = \{D_1, D_2, D_3, D_4, \dots, D_n\} \tag{4}$$
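The string and keystroke reductions described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `STEMS` lookup table stands in for a real stemmer, `STOP_WORDS` contains only the words quoted in the text, and treating α as an upper bound on "normal" frequency is one possible reading of the keystroke rule.

```python
from collections import Counter

STOP_WORDS = {"is", "the", "on", "and", "in", "with", "for", "by"}

# Hypothetical lookup table standing in for a real stemmer (Step 2).
STEMS = {
    "chatting": "chat", "chatted": "chat",
    "connected": "connect", "connecting": "connect", "connection": "connect",
}


def preprocess_strings(words):
    """Reduce a built-in string attribute to {keyword: occurrences}."""
    counts = Counter()
    for word in words:
        word = word.lower()
        # Step 1: drop stop words and words with any non-alphabetic character.
        if not word.isalpha() or word in STOP_WORDS:
            continue
        # Step 2: map homogeneous words to one keyword; Step 3: count occurrences.
        counts[STEMS.get(word, word)] += 1
    return dict(counts)


def abnormal_keys(frequencies, alpha):
    """Keep only keys whose frequency exceeds the normal level alpha (an assumption)."""
    return {key: f for key, f in frequencies.items() if f > alpha}


sample = ["Chat", "chatting", "the", "user_id", "connected", "connection", "%s"]
print(preprocess_strings(sample))
print(abnormal_keys({"a": 3, "enter": 40}, alpha=10))
```

Returning counts instead of a flat word list keeps the occurrence information that the later scoring step relies on.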

3.3.2. Concept disambiguation, extrapolation and detection

Given the filtered process vector set P extracted in the previous step, the built-in string and DLL attributes need to be disambiguated in order to assign the proper category to each of these attributes.

$$P = \{V_1, V_2, V_3, V_4, \dots, V_m\} \tag{5}$$

The built-in string attribute Vi = {K1, K2, K3, K4, …, Km} contains m keywords, where each keyword Ki (i = 1, …, m) has n possible meanings and KiSq (q = 1, …, n) represents the different meanings that Ki can express. To disambiguate and assign the proper sense to each keyword, we compare keyword senses by taking two keywords at a time. For each pair, we select the senses that are semantically most related. The most related senses of two keywords are found by counting the nodes on the paths that connect each pair of their senses in a given taxonomy (WordNet) and selecting the two senses that yield the shortest path (smallest number of nodes) in the hierarchy.

$$W_s(x, y) = \min\{N[W_{s_i}(x, y)] : 0 \le i \le p\} \tag{6}$$
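The shortest-path selection can be sketched on a toy taxonomy. The `PARENT` and `SENSES` tables below are invented for illustration and are not WordNet data; a real implementation would query WordNet for the senses and hypernym links. The sketch counts the nodes on the path between every pair of senses and keeps the pair with the fewest nodes.

```python
from itertools import product

# Hypothetical toy taxonomy: child -> parent (hypernym). Not real WordNet data.
PARENT = {
    "chat.talk": "communication",
    "chat.bird": "bird",
    "message.text": "communication",
    "message.moral": "idea",
    "communication": "entity",
    "bird": "animal",
    "animal": "entity",
    "idea": "entity",
}

# Hypothetical senses of two keywords.
SENSES = {
    "chat": ["chat.talk", "chat.bird"],
    "message": ["message.text", "message.moral"],
}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_nodes(a, b):
    """Number of nodes on the path from a to b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    index_in_b = {node: i for i, node in enumerate(up_b)}
    for i, node in enumerate(up_a):
        if node in index_in_b:
            return i + index_in_b[node] + 1
    raise ValueError("senses share no ancestor")


def most_related_senses(x, y):
    """Pick the sense pair of keywords x and y with the shortest connecting path."""
    return min(product(SENSES[x], SENSES[y]), key=lambda pair: path_nodes(*pair))


print(most_related_senses("chat", "message"))
```

In this toy data the "talk" sense of chat and the "text" sense of message meet at `communication`, giving a three-node path, so that pair is selected over the bird and moral senses.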


After identifying the most related senses for each pair of keywords, a Concept Set (CS) is generated which contains either the Lowest Super Ordinate (LSO) of each pair of keywords or the individual concepts of keywords x and y. The selection depends on the parameter h, which controls the level of generalization [34]: if the LSO of a keyword pair (x, y) lies within or at level h, the LSO is added to the concept set; otherwise the individual concepts of keywords x and y are added to the set.

$$CS_i = \mathrm{Opt}\begin{cases} LSO(x, y) \mid lso_i(x, y) : 1 \le i \le h \\ C_k(x) \wedge C_k(y) \end{cases} \tag{7}$$

Fig. 6. Partial graph showing keystroke frequency of different processes captured using CMBCA.

Fig. 5. Partial snapshot of googletalk.exe built-in strings.


Malicious keyword concepts are loaded and stored in the malicious set (Mk). Each concept in the Concept Set is looked up in WordNet, and the lists of synonym and hypernym synsets for each concept are stored in the Synonym (Sc) and Hypernym (Hc) sets respectively. Each concept and its corresponding Sc and Hc are compared with the malicious concepts one by one, and matches (if any) are stored in a separate resultant set (RS) with labels C (original concept in CS), S (concept's synonym) or H (concept's hypernym). Each concept in RS is assigned a score equal to the weight of its label multiplied by its number of occurrences in the text; C is assigned the maximum weight of 1, S a weight of 0.75 and H a weight of 0.50. The collective weight of the malicious content (MC) found in the string attribute is defined as the sum of the individual scores of all the concepts in RS.

$$MC = \sum_{i} Weight(RS_i) \times Occurrence(RS_i) \tag{8}$$

where Weight(RSi) represents the individual weight of the concept and Occurrence(RSi) represents its frequency of appearance in the text. If the condition MC ≥ β (where β is configurable and updated when new application text is classified as malicious) is satisfied, then a) the string attribute is assigned a malicious tag and b) new concepts (i.e. synonyms and hypernyms) are added to the knowledge base.
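Eq. (8) can be worked through with a small Python sketch. The label weights are the ones given in the text; the resultant set `rs`, the `malicious_content_score` helper and the threshold value `beta = 5.0` are hypothetical values chosen for illustration.

```python
# Label weights as given in the text: original concept, synonym, hypernym.
LABEL_WEIGHT = {"C": 1.0, "S": 0.75, "H": 0.50}


def malicious_content_score(resultant_set):
    """Eq. (8): sum of label weight times occurrences over all concepts in RS."""
    return sum(LABEL_WEIGHT[label] * occurrences
               for _concept, label, occurrences in resultant_set)


# Hypothetical resultant set: (concept, label, occurrences in the text).
rs = [("chat", "C", 4), ("conversation", "S", 2), ("communication", "H", 2)]

mc = malicious_content_score(rs)   # 4*1.0 + 2*0.75 + 2*0.50
beta = 5.0                         # hypothetical threshold; configurable in CMBCA
is_malicious = mc >= beta

print(mc, is_malicious)
```

Here MC evaluates to 6.5, which exceeds the assumed β, so the string attribute would receive the malicious tag.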

In order to disambiguate and assign proper category to DLL attribute, n Category Sets (Cn) are created where n represents thenumber of application categories in the ontology. Each DLL is compared with all category DLLs. If it coincides with one or morecategories, the DLL is added to each of these category sets. Once all the DLLs are assigned, we calculate the score of each CategorySet. The collective score of a Category Set (SCi) is defined as the sum of the individual scores of all the DLLs in Ci

$$SC_i = \sum_{i} \mathrm{Score}(D_{C_i}) \qquad (9)$$

where Score(DCi) represents the individual score of each DLL, which is currently uniform (i.e. 1) for all DLLs. The DLL attribute of the process vector set is assigned the category with the maximum score. Similarly, the keystroke and mouse-click attributes of the process vector set are assigned the abnormal or normal category based on the key frequencies.
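The DLL disambiguation step can be sketched as follows. This is only an illustration under stated assumptions: the dictionary `category_dlls` stands in for the category DLL lists held in the ontology, the category names and their DLL memberships are invented, and ties between categories are broken arbitrarily (first maximum wins), which the paper does not specify.

```python
DLL_SCORE = 1  # per-DLL score; the text states it is currently uniform


def categorize_dlls(process_dlls, category_dlls):
    """Return (best_category, scores) for a process's DLL attribute.

    process_dlls: iterable of DLL names loaded by the process.
    category_dlls: dict mapping category name -> set of lowercase DLL names
    (a hypothetical stand-in for the ontology's category DLLs).
    """
    # One Category Set C_i per application category.
    category_sets = {name: [] for name in category_dlls}
    for dll in process_dlls:
        for name, known in category_dlls.items():
            if dll.lower() in known:
                # A DLL may be added to several category sets.
                category_sets[name].append(dll)

    # Eq. (9): SC_i is the sum of the scores of the DLLs in C_i.
    scores = {name: DLL_SCORE * len(dlls) for name, dlls in category_sets.items()}
    best = max(scores, key=scores.get) if any(scores.values()) else None
    return best, scores


# Invented category contents, for illustration only.
categories = {
    "messenger": {"wininet.dll", "ws2_32.dll", "crypt32.dll"},
    "media_player": {"winmm.dll", "dsound.dll"},
}
best, scores = categorize_dlls(
    ["WS2_32.dll", "wininet.dll", "winmm.dll", "kernel32.dll"], categories
)
```

Here the process matches two messenger DLLs and one media player DLL, so the DLL attribute is assigned the messenger category.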

3.3.3. Classification

The Cognitive Ontology Agent (COA) uses the filtered and disambiguated process vector set W, extracted in the previous step, to classify the process as malicious or non-malicious using the ontology.

$$W = \{V_1, V_2, V_3, V_4, \ldots, V_m\} \qquad (9)$$

Fig. 7. Cognitive ontology agent working.


In order to assign a malicious class to W, OA substitutes all attributes with ontology concepts; these concepts can belong to one or many ontology classes. In order to assign the proper class to W, we assign each concept a weight. Once all the attributes are substituted by concepts, we calculate the weight of each ontology class Oi by summing up the individual weights of all concepts in Oi:

$$WO_i = \sum_{i} \mathrm{Weight}(O_i) \qquad (10)$$

where Weight(Oi) represents the individual weight of an ontology class concept, which is currently either 0 or 1. The ontology class with the maximum weight is assigned to W only if that maximum weight is ≥ 2; otherwise ‘Unknown class’ is assigned.
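A minimal sketch of this class assignment is given below. The 0/1 concept weights and the threshold of 2 are taken from the text; the function name, the dictionary representation of ontology classes and the sample classes and concepts are hypothetical and serve only to make the rule concrete.

```python
def classify_process(concepts, ontology_classes, min_weight=2):
    """Assign an ontology class to a process vector set W.

    concepts: set of ontology concepts substituted for W's attributes.
    ontology_classes: dict mapping class name -> set of member concepts
    (a hypothetical flat view of the ontology).
    """
    weights = {}
    for cls, members in ontology_classes.items():
        # Eq. (10): a concept contributes 1 if it belongs to the class, else 0.
        weights[cls] = sum(1 for c in concepts if c in members)

    best = max(weights, key=weights.get, default=None)
    if best is None or weights[best] < min_weight:
        return "Unknown class", weights
    return best, weights


# Invented ontology fragment, for illustration only.
ontology = {
    "Proxy": {"tunneling", "anonymizer", "port_forwarding"},
    "Messenger": {"chat", "presence", "tunneling"},
}
label, weights = classify_process(
    {"tunneling", "anonymizer", "high_keystroke_rate"}, ontology
)
weak_label, _ = classify_process({"chat"}, ontology)
```

The first process reaches weight 2 in the Proxy class and is labelled accordingly, while the second matches only one concept and therefore falls back to ‘Unknown class’.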

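Cognitive Ontology Agent

    OAAgent()
    {
        foreach (I in Itinerary)                  // visit every node assigned to this agent
        {
            MoveTo(I.Destination);
            PI ← GetProcessList();                // processes running on the current node
            foreach (P in PI)
            {
                V ← GetResourcesList(P);          // resources used by the process
                P ← Preprocessing(V);             // filtering
                W ← Disambiguate(P);              // process vector set W
                R ← ClassifyUsingOntology(W);
                if (R.Status Equals Malicious)
                    Report Violation to SNA
            }
        }
    }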

3.4. Action Agent (AA)

After initialization each Action Agent moves to the network node assigned and performs the following tasks:

1) Get the handler of the Malicious Process Pi (passed in argument by SNA).

2) Perform the Action on Pi and report the result of action to SNA.

3) Wait for the response from SNA.

4) If SNA assigns a new task move to new destination and repeat step 1 else Kill itself.

Action Agent

    AAAgent()
    {
        Di ← Destination;
        Pi ← MaliciousProcess;
        Ai ← Action2Perform;
        MoveTo(Di);
        H ← GetProcessHandle(Pi);
        PerformAction(Pi, Di, Ai);
        SendAck(SNA);                     // report the result of the action to SNA
        Result ← WaitforSNA();
        if (Result.Kill)
            KILL();
        else
        {
            MoveTo(Result.Destination);
            Repeat Same Steps
        }
    }
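The Action Agent life cycle can also be written as a loop rather than a "repeat same steps" jump. The following Python sketch is one possible rendering under stated assumptions: the `Task` type and the `perform` and `report_and_wait` callbacks are hypothetical stand-ins for agent migration, process handling and SNA messaging, none of which are specified at code level in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    destination: str  # network node to visit (Di)
    process: str      # malicious process identifier (Pi)
    action: str       # action to perform (Ai), e.g. "terminate"


def run_action_agent(first_task: Task,
                     perform: Callable[[Task], str],
                     report_and_wait: Callable[[Task, str], Optional[Task]]) -> List[str]:
    """Run the AA steps until SNA assigns no further task.

    Returns the destinations visited, in order.
    """
    visited = []
    task: Optional[Task] = first_task
    while task is not None:
        visited.append(task.destination)      # stands in for MoveTo(Di)
        result = perform(task)                # steps 1-2: get handle, act on Pi
        task = report_and_wait(task, result)  # steps 2-4: report, wait, next task
    return visited                            # leaving the loop = agent kills itself


# Stub SNA with one queued follow-up task (all values are made up).
pending = [Task("lab2-pc07", "proxy.exe", "terminate")]
log = []


def perform(task):
    return f"{task.action}:{task.process}@{task.destination}"


def report_and_wait(task, result):
    log.append(result)
    return pending.pop(0) if pending else None


visited = run_action_agent(Task("lab1-pc03", "chat.exe", "terminate"),
                           perform, report_and_wait)
```

Passing the side effects in as callbacks keeps the control flow testable without a real agent platform.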


4. Performance analysis

We have compared ABSAMN (An Agent Based System for Activity Monitoring on Network) and CMBCA (Categorization of Malicious Behaviors using Cognitive Agents) concurrently at a university campus with seven labs equipped with 20 to 300 PCs. ABSAMN and CMBCA were compared on the same configuration; results show that CMBCA outperforms ABSAMN in every aspect, and part of the comparison is presented in this section. Table 1 compares ABSAMN and CMBCA with respect to different operations. CMBCA operations take slightly more time than their ABSAMN counterparts because CMBCA has to perform extra tasks (i.e. load/retrieve the ontology); however, the overhead is negligible, as the difference in the scan time of one machine is only 3.38 seconds and the accuracy of CMBCA is far better than that of ABSAMN.

Table 1. Comparison of ABSAMN and CMBCA with respect to average time taken by different operations.

| ABSAMN operation | Time (s) | CMBCA operation | Time (s) |
| --- | --- | --- | --- |
| MCA Initialization | 2.43 | KEA Initialization | 3.96 |
| CA Creation & Initialization | 2.96 | SNA Creation & Initialization | 4.07 |
| Monitor Agent Creation & Initialization | 2.17 | COA Creation & Initialization | 3.21 |
| Monitor Agent Match Rule Activity | 5.41 | COA Match Rule Activity | 8.79 |
| Take an Action against a rule violation on the network using Action Agent | 3.57 | Take an Action against a rule violation on the network using Action Agent | 3.57 |
| Send a Message on the Network using Messaging Agent | 1.03 | Send a Message on the Network using Messaging Agent | 1.03 |
| Get Information of a System on the Network Using Information Agent | 4.52 | Get Information of a System on the Network Using Information Agent | 4.89 |
| Agents Movement on the Network | 0.5 | Agents Movement on the Network | 0.89 |

Fig. 8. Activity chart comparison for ABSAMN and CMBCA with reference to three labs each containing 20 PCs.

Fig. 8 shows the activity chart comparison of ABSAMN and CMBCA with reference to three labs, each containing 20 PCs. When the systems start up, MCA and KEA are created respectively. These agents are responsible for initializing and managing their system autonomously. During initialization, both MCA and KEA load the network configuration from the pre-configuration XML file; in addition, KEA loads the ontology from the knowledge base. Initialization takes 2.43 and 3.96 seconds respectively. After initialization, both MCA and KEA create and initialize n agents for the sub-servers, where n depends on the number of sub-servers. This activity takes 2.96 and 4.07 seconds respectively.

In both systems, the n sub-server agents (i.e. CA and SNA) are created and initialized in parallel. Fig. 8 shows the creation and initialization of three sub-server agents (i.e. CA and SNA) in parallel, as there are 3 sub-servers in the network. Each of these sub-server agents creates and initializes n agents (i.e. MA and COA) in parallel for sub-network monitoring, where n depends on the configuration. This activity takes 2.17 and 3.21 seconds respectively. In ABSAMN, one Monitor Agent (MA) is responsible for monitoring 8 PCs, so a total of nine Monitor Agents (3 in each lab) are created, whereas in CMBCA one Cognitive Ontology Agent (COA) is responsible for monitoring 5 PCs, so a total of twelve COAs (4 in each lab) are created.
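The agent counts quoted above are consistent with a ceiling division of lab size by the number of PCs each monitoring agent covers. The short sketch below reproduces them; it assumes that each lab is covered independently and that a partly loaded agent still counts as one agent, which the text implies but does not state. The function name is hypothetical.

```python
import math


def monitoring_agents(pcs_per_lab, labs, pcs_per_agent):
    """Return (agents per lab, total agents) for equally sized labs."""
    per_lab = math.ceil(pcs_per_lab / pcs_per_agent)
    return per_lab, per_lab * labs


# Three labs of 20 PCs each, as in the Fig. 8 scenario.
absamn = monitoring_agents(20, 3, 8)  # one ABSAMN monitoring agent per 8 PCs
cmbca = monitoring_agents(20, 3, 5)   # one CMBCA monitoring agent per 5 PCs
```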

After creation, these agents (MA and COA) move to the first node in their itinerary and start monitoring. A Monitor Agent takes 5.41 seconds to complete the scan of one machine; hence for 8 PCs it takes 43.28 seconds to complete one scan of monitoring. A Cognitive Ontology Agent takes 8.79 seconds to complete the scan of one machine, so for 5 PCs it takes 43.95 seconds to complete one scan of monitoring. Fig. 9 shows the comparison of malicious activities captured using ABSAMN and CMBCA on the same configuration. CMBCA captures more violations because it detects unknown malicious applications based on application behavior and does not rely on the process name. ABSAMN has no intelligence: its rules are not updated dynamically, and although the network administrator can add new rules or update existing ones, this requires manual maintenance.
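The per-scan times for one agent follow directly from the per-machine scan times in Table 1 and the number of PCs assigned to each agent. The symbols T_MA and T_COA are introduced here only for this check and do not appear elsewhere in the paper:

$$T_{MA} = 8 \times 5.41\,\text{s} = 43.28\,\text{s}, \qquad T_{COA} = 5 \times 8.79\,\text{s} = 43.95\,\text{s}$$

$$T_{COA} - T_{MA} = 0.67\,\text{s}$$

Although a COA needs 3.38 seconds longer per machine, it covers fewer machines, so one scan by a single agent differs by well under a second between the two systems.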

CMBCA learns from the environment and updates its ontology-based knowledge base by adding, updating or deleting concepts dynamically, which enhances the efficiency of the system. The administrator can view the number of violations on any specific day, week, month or date, as shown in Fig. 9. Most importantly, the administrator has the option to view the number of violations per lab; using this graph, the administrator can easily track which lab most violations originate from.

Fig. 10(a) and (b) show the number of violations per user and per IP address respectively, captured using ABSAMN and CMBCA. Results show that CMBCA captures more malicious activities than ABSAMN. It is worth noting that the number of violations per user or IP address captured by ABSAMN is inaccurate: some users committed more violations than others, but ABSAMN was unable to capture these violations, which leads to wrong statistics.

Fig. 9. Comparison of number of violations per day captured using ABSAMN and CMBCA.

Fig. 10. (a) Number of violations per user captured using ABSAMN and CMBCA. (b) Number of violations per IP address captured using ABSAMN and CMBCA.

Fig. 11. Comparison of ABSAMN and CMBCA agent population over the network.

CMBCA's statistics help the network administrator pinpoint individual machines or users with a high rate of violations by using these graphs and take action accordingly (e.g. by disabling the user account). If a user tries to change the system configuration, CMBCA will detect the change and report it to the network administrator with the user id, machine name, IP address, lab name and the system configuration information. Fig. 11 shows the number of agents required by ABSAMN and CMBCA for monitoring seven labs having 20 to 300 PCs. In ABSAMN, one agent is responsible for monitoring 8 network nodes, whereas in CMBCA one agent is responsible for monitoring 5 network nodes. Initially there is one agent (i.e. MCA and KEA) in each system until time 2.43 and 3.96 seconds respectively; after initialization, these agents create and initiate the sub-network agents (i.e. CA and SNA), and the agent count increases to 8 at 5.39 and 8.03 seconds respectively. Each of these sub-network agents (i.e. CA and SNA) creates agents for monitoring (i.e. MA and COA), and after a delay of 7.56 and 11.24 seconds from startup the agent count increases to 52 and 78 respectively. As there were already 8 agents in both systems, 44 and 70 agents are created in the various labs, so a total of 52 (8+44) and 78 (8+70) agents remain in the system for the next 49.56 and 50.84 seconds respectively.
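The startup figures in this paragraph can be cross-checked against the operation times in Table 1. The sketch below rebuilds the agent population timeline for both systems; the function name is hypothetical, and the count of seven sub-network agents is inferred from the statement that the population rises from 1 to 8 (one per lab), not stated explicitly in the text.

```python
def startup_timeline(t_init, t_subnet, t_monitor, n_subnet, n_monitor):
    """Return [(time in seconds, agent count), ...] for the startup phase.

    t_init: initialization time of the top-level agent (MCA or KEA).
    t_subnet: creation time of the sub-network agents (CA or SNA).
    t_monitor: creation time of the monitoring agents (MA or COA).
    """
    t_subnet_ready = round(t_init + t_subnet, 2)
    t_monitor_ready = round(t_subnet_ready + t_monitor, 2)
    return [
        (0.0, 1),                                      # only MCA / KEA exists
        (t_subnet_ready, 1 + n_subnet),                # sub-network agents added
        (t_monitor_ready, 1 + n_subnet + n_monitor),   # monitoring agents added
    ]


absamn_timeline = startup_timeline(2.43, 2.96, 2.17, n_subnet=7, n_monitor=44)
cmbca_timeline = startup_timeline(3.96, 4.07, 3.21, n_subnet=7, n_monitor=70)
```

Both timelines reproduce the quoted checkpoints: 8 agents at 5.39 and 8.03 seconds, then 52 and 78 agents at 7.56 and 11.24 seconds.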

Whenever an illegal activity is captured, the agent count in both systems may increase due to the creation of Action Agents. In Scan 1, one Action Agent was created in ABSAMN, as it captured one illegal activity, whereas five Action Agents were created in CMBCA because it captured five illegal activities. Similarly, ABSAMN was unable to find any illegal activity in Scan 3, whereas CMBCA captured six illegal activities, as shown in Fig. 11. The agent population for CMBCA is slightly higher than for ABSAMN, but the difference is negligible because the efficiency and accuracy of CMBCA are far better than those of ABSAMN. In both systems, when one round of monitoring is completed, the agent count reduces to 8 again, and after a delay of 120 seconds (configurable) a second round of monitoring is started and the agent count again increases to 52 and 78 respectively.

Fig. 12 shows the comparison of ABSAMN and CMBCA monitoring time for varying lab sizes. If the lab has 20 PCs, ABSAMN and CMBCA complete one round of monitoring in 50.81 and 51.99 seconds respectively. If the size of the lab is increased from 20 to 50 PCs, one round of monitoring takes 51.24 and 52.92 seconds respectively. The difference between the ABSAMN and CMBCA scan times across lab sizes is very small and negligible: the maximum difference in monitoring time is 2.59 seconds, for the lab of 300 PCs.

ABSAMN and CMBCA both let the administrator view the most frequently violated processes in a dynamically generated pie chart or bar graph, as shown in Fig. 13. The ABSAMN process share is not accurate because it relies entirely on the process name, which is why it is unable to capture all instances of an illegal process. CMBCA, on the other hand, captures all instances of the illegal process because it monitors application behavior. These statistics help the network administrator track and uninstall illegal processes from the network.

Fig. 12. Comparison of ABSAMN and CMBCA scanning time for varying lab sizes.

Fig. 13. Violated process share comparison of ABSAMN and CMBCA.

5. Conclusion

Today every organization, big or small, uses computers to manage and share its data and resources by connecting them through some sort of computer network. Computer networks have become very complex, and network administrators usually use some form of monitoring tool available on the market to monitor the network. In this paper we have modified the existing ABSAMN architecture and proposed a new system, namely CMBCA, which is underpinned by an ontology-based knowledge base to predict unknown illegal applications based on known application behaviors. CMBCA is an intelligent multi-agent system used to detect known and unknown malicious activities carried out by users over a network. CMBCA is fully autonomous and, once initialized, monitors whole networks with the help of cognitive mobile agents. Future work involves optimizing some of the algorithms described in the paper and applying the results to other domains, including the area of assisted living.

References

[1] U. Manzoor, S. Nefti, An agent based system for activity monitoring on network – ABSAMN, Expert Systems with Applications 36 (8) (October 2009) 10987–10994, http://dx.doi.org/10.1016/j.eswa.2009.02.060.

[2] Sergio Ilarri, Eduardo Mena, Arantza Illarramendi, Using cooperative mobile agents to monitor distributed and dynamic environments, Information Sciences 178 (2008) 2105–2127, http://dx.doi.org/10.1016/j.ins.2007.12.015.

[3] L. Matthieu, W. Walter, Complex computer and communication networks, Computer Networks 52 (15) (October 2008) 2817–2818, http://dx.doi.org/10.1016/j.comnet.2008.06.001.

[4] N. Shah, R. Iqbal, A. James, K. Iqbal, Exception representation and management in open multi-agent systems, Information Sciences 179 (15) (July 2009) 2555–2561, http://dx.doi.org/10.1016/j.ins.2009.01.034.

[5] C. Lee, C. Jiang, T. Hsieh, A genetic fuzzy agent using ontology model for meeting scheduling system, Information Sciences 176 (9) (May 2006) 1131–1155, http://dx.doi.org/10.1016/j.ins.2009.01.034.

[6] Umar Manzoor, Samia Nefti, Autonomous agents: smart network installer and tester (SNIT), Expert Systems with Applications 38 (1) (January 2011) 884–893.

[7] V.C. Storey, A. Burton-Jones, V. Sugumaran, S. Purao, A methodology for context-aware query processing on the World Wide Web, Information Systems Research 19 (1) (March 2008) 3–25, http://dx.doi.org/10.1287/isre.1070.0140.

[8] M.N. Huhns, L.M. Stephens, Personal ontologies, IEEE Internet Computing 3 (5) (1999) 85–87.

[9] H. Derbel, N. Agoulmine, M. Salaün, ANEMA: autonomic network management architecture to support self-configuration and self-optimization in IP networks, Computer Networks 53 (3) (February 2009) 418–430, http://dx.doi.org/10.1016/j.comnet.2008.10.022.

[10] Ijaz K. Summiya, U. Manzoor, A.A. Shahid, A Fault Tolerance Infrastructure for Mobile Agents, In: Proceeding of IEEE Intelligent Agents, Web Technologies and Internet Commerce (IAWTIC 06), Sydney, Australia, 29 Nov – 01 Dec, 2006.

[11] Mark Greaves, Victoria Stavridou-Colemen, Robert Laddaga, Dependable Agent Systems, IEEE Intelligent Systems 19 (5) (Sep-Oct 2004).

[12] Gerhard Weiss, Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, The MIT Press, Cambridge, Massachusetts; London, England, 1999. Chapters: 1–4.

[13] Umar Manzoor, Samia Nefti, Cognitive agent for automated software installation – CAASI, Lecture Notes in Computer Science 5736 (16–18 September, 2009) 543–552 (Chania, Greece).

[14] Paessler – PRTG Network Monitor, http://www.paessler.com/prtg/ 2009.

[15] Network Monitoring Tools, http://www.topology.org/comms/netmon.html 2009.

[16] Nagios, http://www.nagios.org/ 2009.

[17] YouHide, http://www.youhide.com/ 2009.

[18] Java Agent Development Framework – JADE, http://jade.tilab.com/ 2009.

[19] D. Milojicic, F. Douglis, R. Wheeler, Mobility: Processes, Computers, and Agents, ACM Press, New York, NY, 1999.

[20] W. Ying, S. Dayong, Multi-agent framework for third party logistics in E-commerce, Expert Systems with Applications 29 (2) (August 2005) 431–436, http://dx.doi.org/10.1016/j.eswa.2005.04.039.

[21] G. Nicholas, H. Stephen, S. Nigel, Agent-based Semantic Web Services, Web Semantics: Science, Services and Agents on the World Wide Web 1 (2) (February 2004) 141–154.

[22] K. Rajiv, Z. Hong, R. Ramesh, Enterprise integration using the agent paradigm: foundations of multi-agent-based integrative business information systems, Decision Support Systems 42 (1) (October 2006) 48–78.

[23] Dong-Her Shih, Hsiu-Sen Chiang, Binshan Lin, Collaborative spam filtering with heterogeneous agents, Expert Systems with Applications 35 (4) (November 2008) 1555–1566.

[24] Umar Manzoor, Samia Nefti, Silent Unattended Installation / Un-Installation of Software's on Network Using NDMAS – Network Deployment Using Multi-Agent System, In: the proceeding of The Fourth European Conference on Intelligent Management System In Operations (IMSIO 2009), July 2009, UK, 2009.

[25] K. Rajah, S. Ranka, Y. Xia, Scheduling bulk file transfers with start and end times, Computer Networks 52 (5) (April 2008) 1105–1122, http://dx.doi.org/10.1016/j.comnet.2007.12.005.

[26] Qinglin Guo, Ming Zhang, A novel approach for multi-agent-based Intelligent Manufacturing System, Information Sciences 179 (18) (21 August 2009) 3079–3090.

[27] Shu-Heng Chen, Computationally intelligent agents in economics and finance, Information Sciences 177 (5) (1 March 2007) 1153–1168.

[28] Windows Sysinternals Process Explorer v11.33, http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx 2009.

[29] Google Talk, http://www.google.com/talk/ 2009.

[30] Protégé, http://protege.stanford.edu/ 2009.

[31] Serkan Çankaya, Hatice Ferhan Odabaşı, Parental controls on children's computer and Internet use, World Conference on Educational Sciences, Nicosia, North Cyprus, 4–7 February 2009, Procedia – Social and Behavioral Sciences 1 (1) (2009) 1105–1109.

[32] KidsWatch – Easy Parental Control Software, http://www.kidswatch.com/ 2009.

[33] PC Tattletale – Helping to keep your kids safe online, http://www.pctattletale.com/ 2009.

[34] S. Nefti, M. Oussalah, Y. Rezgui, A modified fuzzy clustering for documents retrieval: application to document categorization, Journal of the Operational Research Society 60 (3) (March 2009) 384–394.

[35] Spector Pro, http://www.spectorsoft.com/ 2009.

[36] Sentry Parental Controls, http://www.sentryparentalcontrols.com/ 2009.

[37] R. Mihalcea, D. Moldovan, A method for word sense disambiguation of unrestricted text, In: Proc. of ACL '99, Maryland, NY, June 1999, pp. 152–158.

[38] R. Mihalcea, D. Moldovan, An iterative approach to word sense disambiguation, In: Proc. of Flairs, AAAI Press, Orlando, FL, May 2000, pp. 219–223.

[39] Spyware Doctor – PCTools, http://www.pctools.com/spyware-doctor/ 2009.

[40] Spyware terminator, http://www.spywareterminator.com/ 2009.

[41] WordNet, http://wordnet.princeton.edu/ 2009.

[42] Anonymous Proxy Server – Browser9, http://www.browser9.com/.

[43] V. Nastase, S. Szpakowicz, Word sense disambiguation in Roget's Thesaurus using WordNet, In: Proc. of the NAACL WordNet and Other Lexical Resources Workshop, June 2001, Pittsburgh.

[44] J. Cañas, A. Valerio, J. Lalinde-Pulido, M. Carvalho, M. Arguedas, Using WordNet for word sense disambiguation to support concept map construction, In: 10th International Symposium on String Processing and Information Retrieval, October 2003, Manaus, Brazil, 2003.

[45] Fintan J. Costello, Tony Veale, Simon Dunne, Using WordNet to automatically deduce relations between words in noun–noun compounds, In: Proceedings of the COLING/ACL on Main conference poster sessions, July 17–18, 2006, pp. 160–167, Sydney, Australia.


Dr Umar Manzoor holds a PhD in Computer Science from Salford University. He is an assistant professor at the National University of Computer and Emerging Sciences in Pakistan. His expertise is in Natural Language Processing, agent technology and Information Systems. He has published extensively in these areas.

Professor Samia Nefti-Meziani is the Head of the Research Centre for Autonomous Systems and Advanced Robotics at Salford University. She has expertise in Cognitive Robotics, Machine Learning, Swarm Intelligence, Fuzzy Data Mining, Fuzzy Control, Fuzzy Clustering, and Modelling. She has published extensively in the above areas. She has also successfully completed several EU and national research projects.

Professor Yacine Rezgui is the director of the BRE Institute of Sustainable Engineering at Cardiff University. He conducts research in the application of Information and Communication Technologies to the construction sector. He has completed seventeen UK and EC (Framework 4, 5, 6, 7 and eContent) research projects. He has published extensively in areas ranging from ontology engineering to virtual enterprises.
