ISSN Print: 2519-5395 ISSN Online: 2519-5409

Pakistan Journal of Computer and Information Systems

Volume 1, No. 2

2016

Published by: Pakistan Scientific and Technological Information Centre (PASTIC), Islamabad, Pakistan

Tel: +92-51-9248103 & 9248104; Fax: +92-51-9248113 Website: www.pastic.gov.pk

Pakistan Journal of Computer and Information Systems

This is the first issue of Pakistan Journal of Computer and Information Systems (PJCIS) published by Pakistan Scientific and Technological Information Centre (PASTIC). Pakistan Journal of Computer and Information Systems is a peer-reviewed journal of Computer Science and Engineering. It is an open access online journal published twice a year. The mission of the journal is to provide a forum for researchers to debate and discuss interdisciplinary issues in Computer Science, Computer System Engineering and Information Systems. It invites and welcomes contributions in all areas of Computer Science, Computer System Engineering and Information Systems.

Acknowledgement

Pakistan Scientific and Technological Information Centre (PASTIC) is grateful to the following researchers/subject experts for carrying out review of the manuscripts and providing their valuable and critical suggestions on the research papers published in this issue of Pakistan Journal of Computer and Information Systems (PJCIS). PASTIC also acknowledges their contributions in bringing out this issue.

Dr. Asad Ullah Shah, International Islamic University, Malaysia

Dr. Nouman Qadeer Soomro, Mehran University of Engineering & Technology, (MUET), SZAB Campus, Khairpur, Sindh

Dr. Ali Daud, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia

Dr. Syed Feroz Shah, Mehran University of Engineering & Technology (MUET), Jamshoro, Sindh

Dr. Muhammad Akbar, Effat University, Jeddah, Kingdom of Saudi Arabia

Muhammad Usman, Information Technology Section, PASTIC, Quaid-i-Azam University Campus, Islamabad.

Dr. Mohammad Altaf Mukati, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Karachi.

Pakistan Journal of Computer and Information Systems

Patron

Dr. Muhammad Ashraf, Chairman, Pakistan Science Foundation

Editor-in-Chief

Dr. Muhammad Akram Shaikh, Director General, PASTIC

Editors

Nageen Ainuddin

Muhammad Aqil Khan

Managing Editors

Saifullah Azim

Dr. Maryum Ibrar Shinwari

EDITORIAL ADVISORY BOARD

Dr. Muhammad Ilyas, Florida Atlantic University, Florida, USA

Dr. Khalid Mehmood, University of Dammam, Dammam, Kingdom of Saudi Arabia

Dr. Muhammad Yakoob Siyal, Nanyang Technological University, Singapore

Dr. Khalid Saleem, Quaid-i-Azam University, Islamabad, Pakistan

Dr. Kanwal Ameen, University of the Punjab, Lahore, Pakistan

Dr. Waseem Shahzad, National University of Computer and Emerging Sciences, Islamabad, Pakistan

Dr. Farooque Azam, National University of Sciences & Technology, Islamabad, Pakistan

Dr. Laiq Hassan, University of Engineering & Technology, Peshawar, Pakistan

Dr. Muhammad Younas, Oxford Brookes University, Oxford, United Kingdom

Dr. Anne Laurent, University of Montpellier-LIRMM, Montpellier, France

Dr. M. Shaheen Majid, Nanyang Technological University, Singapore

Dr. Nadeem Ahmad, SEECS, National University of Sciences & Technology, Islamabad, Pakistan

Dr. Amjad Farooq, University of Engineering & Technology, Lahore, Pakistan

Dr. B.S. Chowdhry, Mehran University of Engineering & Technology, Jamshoro, Pakistan

Dr. Jan Muhammad, Baluchistan University of Information Technology, Engineering & Management Sciences, Quetta, Pakistan

Dr. Muhammad Riaz Mughal, Mirpur University of Science & Technology, Mirpur (AJ & K)

Pakistan Scientific and Technological Information Centre (PASTIC), Islamabad, Pakistan Tel: +92-51-9248103 & 9248104 ; Fax: +92-51-9248113

www.pastic.gov.pk : e-mail: [email protected]

Subscription Annual: Rs. 1000/- US$ 50.00 Single Issue: Rs. 500/- US$ 25.00

Articles published in Pakistan Journal of Computer and Information Systems (PJCIS) do not represent the views of the Editors or Advisory Board. Authors are solely responsible for the opinion expressed and accuracy of data.

Aims and Scope

Pakistan Journal of Computer and Information Systems, a bi-annual publication of Pakistan Scientific and Technological Information Centre (PASTIC), Pakistan Science Foundation, aims to provide a platform for researchers and professionals of Computer Science & Engineering, Information and Communication Technologies (ICTs), Library & Information Science and Information Systems to share and disseminate their research findings and to publish their original and cutting-edge research. This is a peer-reviewed, open access journal intended to publish high-quality papers on theoretical developments as well as practical applications in all fields of Computer Science and Engineering, ICT and Information Science. The journal also aims to publish new attempts on emerging topics/areas, reviews and short communications. The scope encompasses research areas dealing with all aspects of computing and networking hardware, numerical analysis and mathematics, software programming, databases, solid and geometric modeling, computational geometry, reverse engineering, virtual environments and haptics, tolerance modeling and computational metrology, rapid prototyping, internet-aided design, information models, use of information and communication technologies for managing information, study of information systems, information processing, storage and retrieval, management information systems, etc. Review articles are also published in the journal.

Pakistan Journal of Computer and Information Systems

Contents

Volume 1 Number 2 2016

1. A Study of Issues and Challenges in Cloud Computing. Muhammad Rizwan Ahmad, Yasir Saleem, Sarfraz Asghar

2. Comparative Analysis of Machine Learning Algorithms for Binary Classification. Sarish Abid, Basharat Manzoor, Waqar Aslam, Safeena Razaq

3. Numerical Solution of Fisher’s Equation by Using Meshless Method of Lines. Hina Mujahid, Mehnaz

4. Numerical Approximation of Rapidly Oscillatory Bessel Integral Transforms. Sakhi Zaman, Siraj-Ul-Islam

5. Advance Persistent Threat Defense Techniques: A Review. Murtaza Ahmed Siddiqi, Aziz Mugheri, Kanwal Oad

PJCIS (2016), Vol. 1 No.2:1-13 A Study of Issues and Challenges

A Study of Issues and Challenges in Cloud Computing

MUHAMMAD RIZWAN AHMAD, YASIR SALEEM*, SARFRAZ ASGHAR

Department of Computer Science & Engineering, University of Engineering and Technology, Lahore

*Corresponding author’s email: [email protected]

Abstract

Information technology is growing rapidly and is increasingly changing every aspect of our lives. Cloud Computing is a growing technology for delivering services through the internet. It is a technology paradigm that helps businesses and individuals share various services in a consistent and cost-effective manner. In a cloud computing environment, one works with data and applications that are maintained and stored on shared machines in a web-based environment, rather than being physically located in the home of a user or in a corporate environment. This paper investigates the crucial threats and issues faced in cloud computing to provide a better understanding of them, along with a glimpse of the challenges.

Keywords: Cloud Computing, Application Programming Interface (API), Infrastructure, Virtual Machines

INTRODUCTION

Cloud computing is an informal term in the IT industry. The concept is very common nowadays and is beneficial not only for industry but also for clients. It describes computing as a cloud, i.e. a large number of computers connected with each other to communicate and share resources via a network (i.e. the internet) [1]. Cloud computing is Internet-based computing and the latest trend in the IT world. It implements distributed computing: in a cloud, we can communicate with, use and run different resources, such as applications and hardware resources, that are connected with each other at the same time. The term is also commonly used to refer to network-based services. These services run on real server hardware, which in turn hosts virtual hardware, i.e. software simulations running on one or more real machines.

Cloud Computing is a new concept and is considered an emerging technology in the domain of information technology for providing services to clients and servers through the cloud (Internet). It has a service-oriented architecture with different infrastructures. This architecture reduces resource overhead for the end user, provides flexibility, reduces the cost of ownership and offers various other services. The term is popular partly because of its marketing, as it provides services through client-side software to a large, worldwide group of users for sharing and communication [2-4].


The origin of the term cloud computing is unclear. The concept entered the IT market after the evolution and combination of different existing computing technologies [5, 6]. Cloud computing exhibits the following key characteristics: agility (accepting and responding to change), Application Programming Interface (API), scalability with elasticity, expenditure, machine and location independence, multi-tenancy [7], security, virtualization of hardware (machines), performance and maintenance.

Many problems are faced in cloud computing, such as threats and opportunities of the cloud, privacy of data and information, compliance, legality of data, open source (software), de facto standards, services security, information abuse, domination and privatization of the internet and services, governance of IT, storage of data, ambiguity of terminology and noisy neighbors. This paper elaborates on the above-mentioned threats and discusses the available solutions. Future opportunities are also discussed.

Objective/Goal

In this paper, our primary goal is to understand the concept of cloud architecture and its network-related services. The secondary goal is to study the issues and challenges that accompany Cloud Computing.

To achieve these goals, we must:

Understand the Cloud Computing concept and its different services.

Become familiar with Cloud Infrastructure types.

Know about Cloud Architecture types (Hosted and Bare-metal).

The main objective of cloud computing is that users can benefit from technology without needing deep knowledge of, or expertise in, each of the technologies involved. Cloud computing attracts users (clients) by cutting costs and by sparing them difficulties with IT resources [8]. A reliable and secure network connection is needed for this purpose, which involves the protection of data and its integrity, usability and security. To achieve network security, we must be aware of the threats that cause these issues.

Paper Overview

The paper is organized in four sections. The first section is the introduction. The next section, the literature review, gives background information relevant to the paper, such as Cloud Computing and its different infrastructure techniques; it also briefly outlines Cloud Architecture and its different types. The subsequent section gives an overview of Cloud Computing issues and challenges arising from constant change in technology. The last section briefly concludes the evaluation of networking with Cloud Computing and its issues and challenges.


LITERATURE REVIEW

In this section, we discuss the major concepts of Cloud Computing: its basics, its architecture and the emergence of the concept.

Owing to the growing demands expected of internet technologies and computer applications in the future, IBM and Google (two computing companies) brought the concept of cloud computing to the public in October 2007 [9]. Cloud computing has been defined from different perspectives, but it is still in the discovery stage [10]. The National Institute of Standards and Technology (NIST) model defines Cloud Computing services for various clients, which allow the sharing of many computing resources. NIST presents a range of basic services provided by cloud computing, which include software, platform and infrastructure [11].

Armbrust et al. define cloud computing in a way that highlights the importance of services: cloud computing refers both to the applications delivered as services and to the system software in the data center that provides them [10]. Clouds include both the hardware and the software systems in the data center [12, 13]. Buyya et al. characterize cloud computing by how it differs from the cluster and grid computing standards. Cloud computing cannot be taken as a simple combination of cluster and grid computing; it is a new generation of data centers that emphasizes virtual nodes in the system [14].

Vouk et al. argued that cloud computing will be the next development in on-demand IT services and products, which can be delivered through a service-oriented architecture (SOA) [6]. The relationship between Cloud Computing and SOA is described by Linthicum [12]. In principle, SOA is a technology that cloud computing uses to provide IT resources. On the basis of this previous research on Cloud Computing, researchers have given a clear definition of cloud computing.

Cloud computing is the exchange of services and information over both the Internet and intranets. Customers can decide which information or services they want to use, depending on their requirements. He also summarizes seven branches of cloud computing: storage, database, information, process, application, platform and integration as a service [5]. Miller suggests that cloud computing is a form of distributed computing that is more useful for sharing resources and for collaborating in work groups; it is task-centric and user-centric [9].

CLOUD COMPUTING

In science, cloud is a term for a large collection of objects that, from a distance, appears as a single cloud (i.e. virtualization) (Figure 1). It describes objects whose details cannot be inspected further from certain perspectives. Cloud computing is the advancement and adoption of existing computing technologies.


Figure 1: Cloud in General

In a cloud computing environment, applications, work, data storage and maintenance are all handled at the back end (i.e. on the web), on shared machines, rather than at home or in a corporate environment, for both individuals and businesses [15]. Cloud Computing means sharing application services over the internet and being able to access a wide range of services through greater use of the Internet. For example, web-based email can be accessed in real time from anywhere in the world on any machine with an Internet connection.

The term Cloud was used to symbolize the Internet. Initially, a cloud-like shape was the standard symbol for a network on telephony schematics; later, in 1994, it was used to represent the Internet in computer network diagrams [16]. The term became popular in 2006, when Amazon introduced the Elastic Compute Cloud.

The significance of cloud computing is reflected in its primary characteristics. The following five key characteristics aim to make clouds usable seamlessly and transparently:

1. On-demand service
2. Global network access
3. Measured service
4. Elasticity
5. Resource pooling (location independent)

CLOUD INFRASTRUCTURE

Cloud infrastructure is also known as the Deployment Model of a cloud. Cloud computing has different types of infrastructure (Figure 2):

1. Private Cloud
2. Public Cloud
3. Hybrid Cloud
4. Community Cloud
5. Distributed Cloud


Figure 2: Cloud Computing Infrastructure Types

1. Private Cloud

Private Cloud infrastructures are designed exclusively for one organization, whether managed by the organization itself or by a third party, and hosted either internally (by the organization) or externally (by a third party) [17]. Adopting one requires considerable involvement to virtualize the business and means reconsidering decisions about existing resources. Some security issues must be addressed to avoid severe risks. It is generally considered the safer model in terms of security, but also the more costly option.

2. Public Cloud

In a Public Cloud, services are open to everyone who is part of that network. There may be few or no differences between public and private cloud infrastructure, but security considerations differ when a user has to select services. The services made available by a provider to its network users may be either private or public. Normally, public cloud service providers have their own infrastructure and offer access to it only over the Internet (direct connectivity is not allowed) [18].

3. Hybrid Cloud

A Hybrid Cloud is a composition of two or more clouds (private, community or public). The clouds are connected to one another but remain distinct structures, so that the benefits of multiple structures are available together. It allows organizations to use publicly available resources to meet short-term requirements. It also facilitates the deployment of applications through Cloud Burst, but it lacks the flexibility, security and certainty of in-house applications.

4. Community Cloud

A Community Cloud is an infrastructure shared between multiple users (individuals or organizations) that have the same concerns (security, compliance, etc.).


5. Distributed Cloud

A Distributed Cloud connects multiple machines running at different locations through a single network. Examples are distributed computing platforms such as BOINC.
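The Cloud Burst behaviour mentioned for the Hybrid Cloud can be illustrated with a minimal Python sketch. It assumes a hypothetical private cloud with a fixed capacity and sends only the overflow to a public cloud; the function name `place_workload` and the capacity units are illustrative assumptions, not part of any provider's API.

```python
def place_workload(demand, private_capacity):
    """Split a demand (in arbitrary capacity units) between two clouds.

    Hypothetical helper: the private cloud is filled first, and only the
    excess "bursts" to the public cloud.
    """
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_private = min(demand, private_capacity)
    on_public = demand - on_private
    return {"private": on_private, "public": on_public}


normal_day = place_workload(demand=60, private_capacity=100)
peak_day = place_workload(demand=140, private_capacity=100)
print(normal_day)  # {'private': 60, 'public': 0}
print(peak_day)    # {'private': 100, 'public': 40}
```

On a normal day everything stays on the private cloud; only the short-term peak uses publicly available resources.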

CLOUD MODEL SERVICES

There are three service model architectures available in cloud computing. Figure 3 shows how these three architectures interact with each other and with the client. The three models are:

1. Software as a Service (SaaS)
2. Infrastructure as a Service (IaaS)
3. Platform as a Service (PaaS)

Figure 3: Cloud Model Services. Layers shown: Organization/User/Clients (web browser, apps, client terminals, etc.); SaaS (email, games, communication, CRM, etc.); PaaS (database, development tools, runtime execution, server, etc.); IaaS (storage, load balancing, networks, servers, virtual machines, etc.)

1) Infrastructure as a Service (IaaS)

Infrastructure as a Service (IaaS) is the base layer of cloud services: it gives clients access to server hardware in the form of virtual machines (VMs). It also provides storage, bandwidth and other essential computing resources to clients. The application resides on the virtual hardware with its virtual operating system (Figure 4). Some areas remain critical, e.g. whether the virtual hardware image can be trusted.


Figure 4: Infrastructure as a Service (IaaS) Architecture

2) Platform as a Service (PaaS)

Platform as a Service (PaaS) builds upon IaaS and provides computing infrastructure without any need to buy and manage it (Figure 5). It also gives clients access to software and services with which to develop applications (by providing programming environments) and to use them.

Figure 5: Platform as a Service (PaaS) Architecture

3) Software as a Service (SaaS)

IaaS and PaaS provide the underlying architecture for Software as a Service (SaaS). This service enables clients to access integrated software applications as an on-demand service. In this architecture, the client obtains and uses software components from different providers (Figure 6). Protecting the information and securing the connections for these composed services are major issues in SaaS.

Figure 6: Software as a Service (SaaS) Architecture
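The layering of the three service models can be summarized in a small Python sketch. The split of responsibilities below is a simplified assumption made for illustration (real offerings differ in detail), and the names `STACK`, `PROVIDER_TOP_LAYER` and `managed_by_client` are hypothetical.

```python
# Layers of a typical stack, from lowest to highest (simplified assumption).
STACK = ["hardware", "virtualization", "operating system",
         "runtime", "application"]

# Highest layer that the provider manages under each service model.
PROVIDER_TOP_LAYER = {
    "IaaS": "virtualization",  # client manages OS, runtime and application
    "PaaS": "runtime",         # client manages only the application
    "SaaS": "application",     # provider manages the whole stack
}


def managed_by_provider(model):
    """Return the layers the provider manages for a service model."""
    top = STACK.index(PROVIDER_TOP_LAYER[model])
    return STACK[: top + 1]


def managed_by_client(model):
    """Return the layers left to the client for a service model."""
    top = STACK.index(PROVIDER_TOP_LAYER[model])
    return STACK[top + 1:]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "client manages:", managed_by_client(model))
```

Under these assumptions the client's share shrinks from IaaS to SaaS, which matches the description of PaaS building upon IaaS and SaaS building upon both.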

CLOUD ARCHITECTURE

The Cloud Computing Architecture consists of on-premise resources (installed and managed locally by an individual or organization), cloud resources (such as SaaS), middleware (software that connects computers and devices to other applications), and software components and their locations. It usually involves multiple cloud components that communicate with each other over a loose coupling mechanism, for example a messaging queue [18].

Figure 7: General Cloud Architecture
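The loose coupling through a messaging queue described above can be sketched in Python. This is a minimal illustration, assuming two components in a single process with the standard library's `queue.Queue` standing in for a real messaging service; the function names and the job strings are hypothetical.

```python
import queue
import threading


def producer(q, jobs):
    """Front-end component: submits jobs without knowing who handles them."""
    for job in jobs:
        q.put(job)
    q.put(None)  # sentinel telling the consumer that no more jobs follow


def consumer(q, results):
    """Back-end component: processes whatever arrives on the queue."""
    while True:
        job = q.get()
        if job is None:
            break
        results.append(job.upper())  # stand-in for real processing


def run_pipeline(jobs):
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, jobs)
    worker.join()
    return results


print(run_pipeline(["resize image", "send email"]))
# ['RESIZE IMAGE', 'SEND EMAIL']
```

The producer and consumer share only the queue, so either side could be replaced or scaled without changing the other.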

Business parties must identify every individual requirement of their application. If an organization already has a cloud platform, it must understand the corresponding requirements for maintaining the existing cloud.

A cloud architecture has two major parts: a front end and a back end (Figures 7 and 8). The front end and back end are connected through a network, generally the Internet. Computer users are mostly at the front end of the architecture. The back end is the cloud itself, or data store, where large-scale data processing is done.

In Cloud Computing, the front end of the system contains the devices that clients use to access the network, or possibly a computer network that communicates with the cloud. Some application software is also needed to access the system, but the interface is not the same for all users of a cloud computing system. For example, email programs use existing web browsers such as Google Chrome or Microsoft's Internet Explorer. Other types of systems use their own dedicated applications, which provide network access to their clients.

Figure 8: High level Cloud Architecture

When clouds are combined, they make up a Cloud Computing System. Generally, each application has its own dedicated server. A central server handles system administration (of all the individual servers), client demand and traffic monitoring, and ensures that every system component runs effectively. Middleware allows the networked computers (the cloud) to communicate with each other [17]. To serve a large number of customers, cloud service providers need a huge amount of storage space to meet all demands.


ISSUES AND CHALLENGES

A - Issues

1. Attacks That Target Shared Channel

A virtual machine (VM) is a software implementation of a machine that runs its own OS. Multiple VMs with different OS environments can run simultaneously. In a shared cloud medium, data from different sources can be hosted on different VMs that are located on one independent physical server. Software running on one VM does not affect software running on another VM, which ensures maximum flexibility. A recent study showed that it is possible to map the internal cloud framework and locate where a particular virtual machine resides. These results may only be a proof of concept at this stage, but it is likely that cloud servers, being a central point of vulnerability, can be exploited by criminals.

2. Data and Service Availability

A major liability in a cloud computing environment is the failure of internet connectivity, as most organizations rely on internet access to reach their shared and collective data. In addition, if a vulnerability is found in a cloud environment, the business might cut all connections to the cloud service provider until the threat is eliminated.

3. Compatibility

Compatibility is another major issue in cloud computing. Different vendors provide different services that may not be compatible with each other, which makes it difficult for the end user to switch vendors. Constant changes and frequent improvements are likely in cloud computing, and businesses must keep themselves updated to ensure data integrity and security. These changes will affect both security and the software development life cycle.

B - Challenges

Owing to the extensive use of computers, storage access, data security and data communication have become highly important. The external communication of a cloud is similar to any other communication over the Internet [14]. Technologies that enforce specific control policies should be used to protect it [5]. Many problems have an adverse impact on cloud computing with respect to security.

Implementing a cloud computing infrastructure means placing highly critical data in the hands of a third party, so ensuring data security is important. Data should be encrypted at all times, with clearly defined tasks. The only way to ensure the confidentiality of encrypted data on a cloud storage server is for the user/client to administer the encryption process.


Privacy is one of the core issues underlying all of these challenges, and it includes the need to protect confidential information. The system must ensure data confidentiality, because companies doing large-scale business would prefer not to carry out data transactions through cloud servers if that involves interference by another system.

Data integrity means validating data and protecting it from being deleted or corrupted. It requires that only authorized users be able to access the data. No comprehensive practice guarantees data security, so in the end it comes down to trust among the users [18]. In cloud-based storage, data may be scattered across multiple servers and locations. The user loses control over his data and is unable to inspect the data links visually [19]. Cloud data must be accessible only to those who are authorized, which makes it critical to monitor who is accessing the data via the cloud. To ensure the integrity of user authentication, data access logs need to be maintained to verify that only authorized users are accessing the data.
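The two ideas in this paragraph, detecting corrupted data and keeping an access log, can be sketched together in Python. This is a minimal illustration rather than a scheme taken from the paper: it assumes the client keeps a secret key and an HMAC-SHA256 tag for each stored object, and the names `make_tag`, `read_object` and `access_log`, as well as the sample key and data, are hypothetical.

```python
import hashlib
import hmac

access_log = []  # (user, outcome) entries; a real system would persist these


def make_tag(key, data):
    """Compute an HMAC-SHA256 tag over the data with a client-held key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def read_object(user, authorized_users, key, data, tag):
    """Log the access attempt, then return the data only if it is intact."""
    if user not in authorized_users:
        access_log.append((user, "denied"))
        raise PermissionError(f"{user} is not authorized")
    if not hmac.compare_digest(make_tag(key, data), tag):
        access_log.append((user, "integrity failure"))
        raise ValueError("stored data does not match its tag")
    access_log.append((user, "granted"))
    return data


key = b"client-side secret key"  # placeholder value for the sketch
stored = b"quarterly report"
tag = make_tag(key, stored)      # kept by the client at upload time

print(read_object("alice", {"alice"}, key, stored, tag))
try:
    read_object("alice", {"alice"}, key, b"quarterly rep0rt", tag)
except ValueError as err:
    print("rejected:", err)
print(access_log)
```

Because the tag depends on a key that only the client holds, a change made to the stored bytes on the server side is detected when the data is read back.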

By using cloud computing services, one can easily access information stored on the shared medium and make it accessible to different services across the internet. An identity management system can help authenticate users and services on the basis of credentials. A major issue with this approach is the interoperability problem that may result from using different identity credentials and negotiation protocols. Current password-based authentication has several drawbacks and poses notable risks. An identity management system should be capable of protecting the private information of users and services.
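One of the risks of password-based authentication is that stored credentials can leak. The following Python sketch shows a common mitigation, salted and deliberately slow hashing with PBKDF2 from the standard library, so that an identity management system never stores the password itself. It is an illustrative assumption and not a method proposed in this paper; the iteration count and function names are hypothetical choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Derive a digest from the password; a fresh random salt is used if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored_digest = hash_password("correct horse")
print(check_password("correct horse", salt, stored_digest))  # True
print(check_password("wrong guess", salt, stored_digest))    # False
```

Only the salt and digest are stored, so the service can verify a credential without keeping the user's private information in readable form.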

CONCLUSION

The key motivation for writing this paper is to provide a glimpse of cloud computing as an emerging technology in the new era. It can be used to address tactical issues that the IT industry faces, such as resource availability and reliability, data center cost and the consistency of operational processes; it also implies a design paradigm for constructing computer software as a service, with reduced information technology overhead and great flexibility. Cloud computing is a growing technology paradigm on which most of the infrastructure and services industries are focusing to capture potential opportunities. This paper highlighted some of the issues faced by cloud computing and also discussed some challenges. It is important for cloud computing to have standardized security measures.

REFERENCES

[1] M. Carroll et al., "Securing Virtual and Cloud Environments," Cloud Computing and Services Science, I. Ivanov, M. v. Sinderen, and B. Shishkov, Eds.: Springer Science & Business Media, 2012, pp. 73-90. [Online]. doi: 10.1007/978-1-4614-2326-3

[2] AWS. (19th March 2013). Amazon Web Services; What is Cloud Computing? Available: https://aws.amazon.com/what-is-cloud-computing/


[3] R. Baburajan, (24th August 2011). "The Rising Cloud Storage Market Opportunity Strengthens Vendors" infoTECH, Available: it.tmcnet.com

[4] K. Oestreich, "Converged infrastructure," CTO Forum, vol. 15, Available: http://www.thectoforum.com/content/converged-infrastructure

[5] K. Sistanizadeh et al., "Universal access multimedia data network," US Patent US5790548 A, Aug. 4, 1998. Available: https://www.google.com/patents/US5790548.

[6] M. A. Vouk, "Cloud computing-Issues, Research and Implementations," J. Comput. Inf. Technol., vol. 16, no. 4, pp. 235-246, 2008. doi: 10.2498/cit.1001391

[7] A. Singh and K. Chatterjee, "Cloud Security Issues and Challenges," J. Netw. Comput. Appl., vol. 79, no. C, pp. 88-115, Feb. 2017. doi: 10.1016/j.jnca.2016.11.027

[8] J. Geelan, "Twenty one experts define cloud computing," Cloud Comput. J., vol. 4, pp. 1-5, 2009. Available: http://cloudcomputing.sys-con.com/node/612375

[9] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," in "Recommendations of the National Institute of Standards and Technology," Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology, 2011, vol. 145, pp. 7. Available: http://faculty.winthrop.edu/domanm/csci411/Handouts/NIST.pdf

[10] M. Armbrust et al., "A view of cloud computing," Commun. ACM, vol. 53, no. 4, pp. 50-58, April 2010. doi: 10.1145/1721654.1721672

[11] L. M. Vaquero et al., "A break in the clouds: Towards a cloud definition," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50-55, January 2009.

[12] D. S. Linthicum, Cloud Computing and SOA Convergence in your Enterprise: A Step-by-Step Guide. Pearson Education, 2009.

[13] M. Miller, Cloud computing: Web-based applications that change the way you work and collaborate online. Indianapolis, Indiana: Que publishing, July 2008.

[14] M. Ali et al., "Security in Cloud Computing: Opportunities and Challenges," Inf. Sci., vol. 305, pp. 357-383, 1 June 2015. doi: 10.1016/j.ins.2015.01.025

[15] F. Gens, "Defining Cloud Services and Cloud Computing," IDC exchange. 23 September 2008. 22 August 2010. Available: http://blogs.idc.com/ie/?p=190

[16] ENISA, "Cloud Computing: Benefits, Risks and Recommendations for Information Security," 2010, Available: https:// www.enisa.europa.eu/pub...../cloud....../at.../ full report

[17] D. de Oliveira et al., "Towards a Taxonomy for Cloud Computing from an E-Science Perspective," Cloud Computing: Principles, Systems and Applications, N. Antonopoulos and L. Gillam, Eds., London: Springer Science & Business Media, 2010, pp. 47-62. [Online]. doi: 10.1007/978-1-84996-241-4

[18] R. Chakraborty et al., "The Information Assurance Practices of Cloud Computing Vendors," IT Prof., vol. 12, no. 4, pp. 29-37, Jul 2010. doi: 10.1109/MITP.2010.44

[19] F. Zafar et al., "A survey of cloud computing data integrity schemes: Design challenges, taxonomy and future trends," Comput. & Secur., vol. 65, no. C, pp. 29-49, March 2017. doi: 10.1016/j.cose.2016.10.006

PJCIS (2016) Vol. 1, No. 2: 15-27 Comparative Analysis of Machine

Comparative Analysis of Machine Learning Algorithms for Binary Classification

SARISH ABID*, BASHARAT MANZOOR, WAQAR ASLAM, SAFEENA RAZAQ

Department of Computer Systems Engineering, Mirpur University of Science and Technology, Mirpur, AJK

*Corresponding author’s email: [email protected]

Abstract

Machine learning algorithms are applied in many domains to achieve classification tasks, and machine learning is applicable to several real-life problems. The aim of this paper is to make highly accurate predictions on test data sets using machine learning methods and to compare these methods in order to select an appropriate method for binary classification of a particular data set. Three machine learning methods are used in this research work: Artificial Neural Network (Multi-Layer Perceptron with Back Propagation Neural Network), Support Vector Machine and K-Nearest Neighbor. The data sets are taken from the UCI website. A comparative study is carried out to evaluate the performance of the classifiers using statistical measures, e.g. accuracy, specificity and sensitivity. These results are also compared with previous studies. Experimental outcomes show that the Artificial Neural Network method provides better performance, and it is strongly suggested that the Multi-Layer Perceptron with Back Propagation Neural Network method is reasonably effective for the task of binary classification, followed by Support Vector Machine and K-Nearest Neighbor.

Keywords: Artificial Neural Network, Classification algorithms, K-Nearest Neighbor, Machine Learning, Binary classification, Support Vector Machines

INTRODUCTION

Machine learning is a type of learning in which a system learns from input data and tries to improve its performance, which is quantified by suitable measures. The motivation for using machine learning is as follows. Many tasks require an adaptive system that can learn, e.g. handwriting recognition and speech recognition. Learning is also useful as an alternative to hand-coding a program. For example, to develop a program that plays chess, one can hand-code all the rules and conditions needed to play the game. Alternatively, one can provide the system with a database of chess games and their outcomes, and the system can apply learning methods to play a good game of chess without being told explicitly which move to make in which situation. This requires a large database from which the system can learn, but providing a database is usually easier than hand-coding the rules, so a lot of manual effort is saved if the system is able to learn by itself. Machine learning methods fall into several categories, e.g. supervised learning, unsupervised learning and reinforcement learning. In this research we use supervised learning, in which the system is given labeled training examples (inputs and outputs).

Classification, also called concept learning, consists of learning a description of a class of objects. This description is typically used to predict whether new objects belong to the class or not, and it has been of keen interest to researchers in computer engineering. In mammogram analysis, for example, we are given a mammogram and want to classify it as normal, cancerous or pre-cancerous. In document understanding, we are given a rectangular region from a scanned image and want to say whether it is a text region or a graphic region.

Numerous techniques have been used in the literature to classify SPECT heart, diabetes, banknote authentication, liver disorder and ionosphere radar-return data. For diabetes detection, Deng and Kasabov used ESOM with 10-fold cross-validation and attained a classification accuracy of 78.4% [1]. Aslam and Nandi used genetic programming (GP) and a variation of it and obtained a classification accuracy of 78.5±2.2% [2]. Ramana et al. evaluated the Naïve Bayes classifier, C4.5, the BPNN algorithm and SVM for the classification of liver patient datasets [3]. Gulia et al. also evaluated the Multi-Layer Perceptron, SVM, Random Forest and Bayesian Network classifiers on liver patient datasets [4]. Before feature selection, SVM gave the highest accuracy of these algorithms; with feature selection, Random Forest outperformed all other techniques with an accuracy of 71.8696%. Asadi et al. used a new Supervised Feed Forward Multi-layer Neural Network (SFFMNN) model [5]. On SPECTF Heart, the accuracy of Clip3 and Clip4 was 77%. Shao et al. obtained 81.3874.59% accuracy [6]. Radar target identification is a rarely explored domain. Samb et al. achieved a prediction rate of 86.29% on the ionosphere dataset [7]. Pujari and Gupta obtained a higher accuracy of 93.84% using an ensemble model with feature selection [8]. Mohamad et al. used an Artificial Neural Network with the back-propagation algorithm to classify the banknote authentication dataset [9]. Ghazvini et al. compared Naïve Bayes and Multilayer Perceptron classifiers on the Bank Notes dataset [10] and obtained accuracies of 87.43% and 95.21% respectively.

Durodola et al. used an Artificial Neural Network algorithm to predict damage caused by random fatigue loading and obtained very encouraging results [11]. To model roadway traffic noise, Hamad et al. used an ANN algorithm and obtained better performance than the existing methods in the literature [12]. Celebi et al. evaluated the exhaust emissions of an engine, using ANN to predict its sound pressure level and vibration [13]; the results showed that the generated model could estimate the parameters with high accuracy. Liu et al. used the SVM algorithm for multi-class sentiment classification and concluded that the method is significantly better than the others [14]. Cholette and Borghesani addressed the problem of estimating continuous boundaries between acceptable and unacceptable engineering design parameters in complex engineering applications, used the SVM algorithm for this purpose, and obtained very promising results [15]. To predict different types of road surface from tire cavity sound acquired under normal vehicle operation, Masino et al. applied the SVM algorithm, which is comprehensive and inexpensive and provides good accuracy [16]. Chen and Hao used both the SVM and KNN algorithms for stock market index prediction [17]. Mohammed et al. used KNN to solve the vehicle routing problem and obtained an optimal route [18]. Faziludeen and Sankaran used evidential K-nearest neighbour (EKNN) for ECG beat classification and showed that the EKNN system gives superior performance [19]. Munisami et al. used a KNN-based recognition system that identifies plants from images of their leaves and obtained an accuracy of 87% [20].

This research was conducted to examine and evaluate existing machine learning techniques and to find out which technique provides better results for which problem dataset. It presents a comparative analysis to help researchers solve complex real-world binary classification problems more efficiently. Complexity is also calculated and presented in terms of training time. The machine learning classifiers used in this research are evaluated on real-world classification problems: detection of diabetes, liver disease and heart disease, ionosphere data, and banknote authentication. Some problem datasets had no missing values, and none of the feature values was categorical. Although several comparisons of these algorithms are available in previous studies, to the best of our knowledge these algorithms are being used on these problem datasets for the first time. These problems will remain challenging in the near future. Many millions of people around the world suffer from diabetes, liver disorders and heart disease, and a large number of physicians will be needed in the immediate future if the rapid rise of diabetes continues. There is therefore a need to use classifier systems in medical diagnosis. If some attributes of a patient are available, a classifier system can determine whether that person is affected without requiring a physician. If the assessments made by physicians for past patients with the same conditions are stored, a classifier system can be developed that makes use of the stored conditions and assessments. This will benefit physicians greatly while retaining the significance of expert opinion in disease diagnosis. In addition, counterfeiting has a long history and will continue in the future. Everyone is affected by the counterfeiting of banknotes, and advances in reprographic technologies have increased the threat. A classifier system is therefore also needed for banknote authentication.

The machine learning algorithms used in this research are introduced briefly in the following sections.

MACHINE LEARNING ALGORITHMS

K-Nearest Neighbor rule

In 1951, the K-nearest neighbor (KNN) classification algorithm was introduced by Fix and Hodges [21]. This algorithm differs from most others in that it has no explicit training phase: the training set is simply stored and is used together with the test point at prediction time. The input is a set of training examples {xi, yi}, where xi is the set of attribute-value pairs of the ith instance and yi is its label. For classification, yi is the class label (e.g. ham or spam); for digit recognition, it is a value between 0 and 9. The test point x is the instance we want to classify. The algorithm computes the distance D(x, xi) from x to every training example xi, picks the k training examples xi1 to xik that are closest to x, looks at their labels yi1 to yik, and returns the label that is most frequent among them.
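The nearest-neighbor rule can be illustrated with a short sketch. This is not the MATLAB code used in the experiments; the function names `euclidean` and `knn_predict`, the toy ham/spam points and the choice k = 3 are assumptions made for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance D(a, b) between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, x, k=3):
    """Return the most frequent label among the k training points closest to x."""
    # Distance from the test point to every stored training example.
    scored = [(euclidean(xi, x), yi) for xi, yi in zip(train_x, train_y)]
    # Keep the k nearest examples (sorted by distance only).
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data, not taken from the paper.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_x, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_x, train_y, (5.1, 5.0), k=3))
```

Because nothing is fitted in advance, all the work happens at prediction time, which is consistent with the very short training times the paper reports for KNN.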

Support Vector Machines

In 1968, Vapnik and Chervonenkis introduced the concept of Support Vector Machines (SVMs) [22]. This algorithm can also be used for classification. Consider a linearly separable binary problem: a two-dimensional plane contains two classes of objects and we want to place a boundary between them. The goal of SVM is to find a hyperplane that separates all training vectors into the two classes. A hyperplane is represented by a normal vector and a scalar bias: the normal vector determines its orientation, while the bias controls its displacement from the origin. The margin is described by two parallel hyperplanes; changing the direction of the normal vector rotates the margin, and increasing or decreasing the bias shifts it. The margin is the distance between the separating hyperplane and the training points closest to it, and the best choice is the hyperplane that leaves the maximum margin from both classes [23].
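The description above corresponds to the standard hard-margin formulation, written out here as a hedged reference. The symbols are assumptions introduced for illustration and do not appear in the paper: w is the normal vector, b the bias, and (x_i, y_i) the training pairs with labels y_i in {-1, +1}.

```latex
% Separating hyperplane and the two margin hyperplanes
\mathbf{w}^{\top}\mathbf{x} + b = 0, \qquad \mathbf{w}^{\top}\mathbf{x} + b = \pm 1

% Width of the margin between the two margin hyperplanes
\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

% Maximizing the margin is equivalent to
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1, \quad i = 1, \dots, n
```

Rotating w changes the orientation of the margin and changing b shifts it, which matches the geometric picture given in the text.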

Artificial Neural Networks

A Neural Network (NN) is a biologically inspired system. McCulloch and Pitts are generally recognized as the designers of the first neural network, in 1943 [24]. The first learning rule for NNs was devised by Hebb in 1949 [25]. In 1969, Minsky and Papert published Perceptrons, in which they highlighted the computational limitations of the Perceptron unit [26]. This led to a virtual decline in NN research. Fortunately, the 1980s saw a re-emergence of interest in NNs, and many researchers came up with more complex architectures in the form of multi-layer networks, which overcame the limitations of the Perceptron unit. Today research on NNs is active and they are used in a variety of applications.

In an Artificial Neural Network (ANN), simple processing elements are connected to each other via weighted links. Inputs are fed to the input units, computations are performed in the units, and outputs are produced. NNs have a variety of applications: they have been used for recognizing handwritten letters, for predicting the quality of welding spots online, for identifying relevant documents within a corpus (a large number of documents), for visualizing high-dimensional spaces, and for tracking the position of robot arms online.

ANNs can be divided into three main types depending on how they partition the data into different classes: Multi-Layer Perceptron (MLP), Radial Basis Function Network (RBF) and Probabilistic Neural Network (PNN). Only the first type is used in this research; interested readers can find details of the other types in [27].


Back Propagation Neural Networks (BPNNs) are based on a supervised learning technique called back-propagation learning. The network is commonly called the Feed Forward Back Propagation Neural Network (FFBPNN), and architecturally it is a Multi-Layer Perceptron. The BPNN fascinated researchers and revealed the true potential of NNs: it opened up research opportunities in numerous fields of science, engineering and statistics, and it is computationally efficient. On the other hand, the BPNN has been called a 'black box', because the rules it learns cannot be interpreted easily, and because it has a fixed algorithmic operation but no fixed topology (number of layers and neurons). Despite these drawbacks, the BPNN is relatively accurate and easy to work with compared with other neural networks.
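The idea of back-propagation learning in a multi-layer perceptron can be sketched in a few lines of pure Python. This is an illustrative toy, not the authors' MATLAB network: the class name `TinyMLP`, the XOR data, the hidden layer size of 3, the learning rate of 0.5, the sigmoid activation and the squared-error loss are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the input weights of one hidden unit; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # Output weights, one per hidden unit; the last entry is the output bias.
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Feed-forward pass: returns hidden activations and the network output."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
             for row in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.w_out[-1])
        return h, y

    def train_step(self, x, target, lr):
        """One back-propagation update for the loss 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        # Error signal at the output unit.
        delta_out = (y - target) * y * (1.0 - y)
        # Error signals propagated back to the hidden units (using pre-update weights).
        delta_hidden = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        # Gradient-descent updates of output and hidden weights.
        for j in range(len(h)):
            self.w_out[j] -= lr * delta_out * h[j]
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i in range(len(x)):
                row[i] -= lr * delta_hidden[j] * x[i]
            row[-1] -= lr * delta_hidden[j]


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy binary problem (XOR), assumed for illustration.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

net = TinyMLP(n_in=2, n_hidden=3, seed=0)
loss_before = mean_squared_error(net, XOR)
for _ in range(2000):
    for x, t in XOR:
        net.train_step(x, t, lr=0.5)
loss_after = mean_squared_error(net, XOR)
print(loss_before, loss_after)
```

The hidden layer size is the only topology choice here, which reflects the point made in the text that the algorithm is fixed while the topology is not.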

EXPERIMENT

This section first describes the datasets used in the experimental work and then explains the experiments conducted to solve the problems.

Dataset Description

In a supervised machine learning task, the data are represented as a table of examples, or instances. Each instance is described by a fixed number of measurements, or features, along with a label that denotes its class. Features, also called attributes, are of two types: nominal and numeric. Nominal attributes are unordered categories; numeric attributes are real numbers. Applying a machine learning algorithm requires two sets of examples: training examples and test examples. The training set is used to produce the learned concept descriptions, and the test set is used to evaluate their accuracy. Class labels are withheld in the testing phase: when the algorithm is applied to a test example, it outputs a predicted class label. In this paper, all the problem datasets are taken from the UCI Machine Learning Repository of Blake and Merz [28]; details are given in Table 1.

Table 1: Dataset Details

| Dataset Name | Total Samples | Training Samples (70%) | Testing Samples (30%) | Features |
|---|---|---|---|---|
| Ionosphere | 351 | 246 | 105 | 34 |
| PIMA | 768 | 538 | 230 | 8 |
| SPECT Heart | 267 | 187 | 80 | 44 |
| Banknote Authentication | 1372 | 960 | 412 | 4 |
| ILPD | 583 | 409 | 175 | 10 |
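A 70:30 split of the kind summarized in Table 1 can be sketched as follows. The paper does not say how samples were assigned to the two sets, so the random shuffling, the fixed seed, the rounding rule and the function name `split_70_30` are assumptions for illustration.

```python
import random


def split_70_30(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and testing parts."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(samples)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for a dataset with 351 instances (the sum of the Ionosphere split sizes).
train, test = split_70_30(list(range(351)))
print(len(train), len(test))
```

With this rounding rule the split sizes agree with the Ionosphere, PIMA, SPECT Heart and Banknote Authentication rows of Table 1; the ILPD row differs by one sample, so the authors' exact rule may have been different.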



Performance Measurement

Three statistical measures are used to assess the performance of each classification method: specificity, sensitivity and accuracy. These measures are defined in terms of true negative (TN), true positive (TP), false negative (FN) and false positive (FP) cases. Suppose a group of people is tested for some disease. A true positive is a person whose test result is positive and who has the disease. A false negative is a person who has the disease but whose test result is negative. A true negative is a person whose test result is negative and who does not have the disease. Finally, a false positive is a healthy person whose test result is positive. The FN, FP, TN and TP cases are shown in Table 2.

Table 2: Confusion Matrix for Actual and Predicted Cases

| | P' (Predicted) | N' (Predicted) |
|---|---|---|
| P (Actual) | True Positive | False Negative |
| N (Actual) | False Positive | True Negative |

Specificity

Specificity is the capability of the system to predict correctly the cases that do not have the condition of interest. In short, it measures the proportion of actual negatives that are correctly identified. Specificity can be calculated using the equation below:

SPEC = Negative hits / Total negatives = TN / (FP + TN) (1)

Sensitivity

Sensitivity is the capability of the system to predict correctly the cases that do have the condition of interest. In short, it measures the proportion of actual positives that are correctly identified. Sensitivity can be calculated using the equation below:

SENS = Positive hits / Total positives = TP / (FN + TP) (2)

Classification Accuracy

Classification accuracy measures the proportion of correct predictions over both the positive and the negative inputs. It depends on the class distribution of the data set, which can lead to incorrect conclusions about system performance. Classification accuracy can be calculated using the equation below:

ACC = Total hits / Total number of entries in the set = (TP + TN) / (P + N) (3)
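Equations (1) to (3) can be checked on a small worked example. The sketch below is illustrative: the function names and the ten hypothetical labels are assumptions, with 1 standing for the positive class and 0 for the negative class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, TN, FP for binary labels; `positive` marks the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def specificity(tn, fp):
    return tn / (fp + tn)                     # Eq. (1)


def sensitivity(tp, fn):
    return tp / (fn + tp)                     # Eq. (2)


def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)    # Eq. (3), with P = TP + FN and N = TN + FP


# Hypothetical labels: 4 actual positives followed by 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fn, tn, fp = confusion_counts(actual, predicted)
print(tp, fn, tn, fp)              # 3 1 4 2
print(sensitivity(tp, fn))         # 0.75
print(accuracy(tp, fn, tn, fp))    # 0.7
```

In this example the accuracy (0.7) hides the gap between sensitivity (0.75) and specificity (about 0.67), which is why the paper reports all three measures rather than accuracy alone.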

RESULTS AND DISCUSSION

The experimental work was carried out on a Core i3 machine with 2 GB RAM running Windows, using MATLAB R2011. Each dataset was divided in a 70:30 ratio into training and testing data, respectively. The training data are used to train the classification algorithms, and the testing data are used to measure their proficiency. The outputs of the classification algorithms are compared with the true classes to count the true positives, true negatives, false positives and false negatives, and these counts are used to build the confusion matrix [29]. Each cell of the matrix contains the raw number of samples classified for the corresponding combination of desired and actual model output. For each classification algorithm, all results are calculated over 100 runs, and the training time is averaged over these runs. For ANN, we checked the results for different hidden layer sizes. For the K-nearest neighbor algorithm, we used Euclidean distance. For the support vector machine, we checked the results for three kernel functions (polynomial, RBF and MLP) with varying parameter values: the polynomial order, the value of sigma in RBF, and the MLP parameters.
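The paper does not state the exact kernel forms. As a hedged reference, the three kernels named above are commonly written as follows, where p is the polynomial order, sigma the RBF width, and p_1, p_2 the parameters of the MLP (sigmoid) kernel. These symbol names are assumptions introduced for illustration.

```latex
K_{\text{poly}}(\mathbf{x}, \mathbf{z}) = \left(1 + \mathbf{x}^{\top}\mathbf{z}\right)^{p}
\qquad
K_{\text{RBF}}(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{z} \rVert^{2}}{2\sigma^{2}}\right)
\qquad
K_{\text{MLP}}(\mathbf{x}, \mathbf{z}) = \tanh\!\left(p_{1}\, \mathbf{x}^{\top}\mathbf{z} + p_{2}\right)
```

Each kernel replaces the inner product in the linear SVM, so varying p, sigma, or p_1 and p_2 changes the shape of the decision boundary that the classifier can represent.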

Comparative Analysis

Tables 3, 4 and 5 show the results of classifiers:

Table 3: ANN Results

| Dataset Name | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Ionosphere | 100 | 100 | 100 |
| PIMA | 91.73 | 87.7 | 94.28 |
| SPECT Heart | 100 | 100 | 100 |
| Banknote Authentication | 100 | 100 | 100 |
| ILPD | 90.85 | 93.79 | 82.60 |

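Table 4: SVM Results

| Dataset Name | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Ionosphere | 92.38 | 97 | 87 |
| PIMA | 75.21 | 62 | 81 |
| SPECT Heart | 81.25 | 77 | 90 |
| Banknote Authentication | 100 | 100 | 100 |
| ILPD | 66.66 | 75 | 45 |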


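Table 5: KNN Results

| Dataset Name | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Ionosphere | 78.095 | 60 | 92 |
| PIMA | 78.26 | 48 | 92 |
| SPECT Heart | 77.50 | 100 | 14 |
| Banknote Authentication | 100 | 100 | 100 |
| ILPD | 72 | 90 | 27 |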

Tables 3, 4 and 5 show that all classifiers achieve 100% accuracy on the banknote authentication dataset, which is consistent with previous findings [9]. This may be because the small feature set contains all the information required to discriminate between genuine and forged banknotes. In addition to the banknote authentication dataset, ANN also achieves 100% accuracy on the SPECT Heart and Ionosphere datasets.

SVM is the second best for the SPECT Heart and Ionosphere datasets, achieving accuracies of 81.25% and 92.38% respectively, while KNN achieves only 77.5% and 78.095% on the same datasets. Both datasets have a large number of features, and the results show that ANN handles such datasets easily while SVM and KNN do not.

ANN is also the best classifier for the PIMA and ILPD datasets, outperforming the other two classifiers. Of those two, KNN performs better than SVM. The lower performance of all the classifiers on the PIMA and ILPD datasets is attributed to the fact that these algorithms do not handle missing values effectively.

Moreover, ANN not only achieves better accuracy than the other classifiers, it also provides a desirable balance between sensitivity and specificity, which the other classifiers fail to maintain. These results indicate that ANN is the best classifier for all these datasets.

General Comparison

An overall comparison of the classifiers is presented as bar graphs in Figures 1-3. According to this comparison, ANN is the best classifier for these binary classification problems in terms of accuracy, sensitivity and specificity, irrespective of the particular problem dataset. Thus we can say that ANN is the best classifier for binary classification problems.



Figure 1: Accuracy Comparison for ANN, SVM and KNN (bar chart, 0 to 100%, for the Ionosphere, PIMA, SPECT Heart, Banknote and ILPD datasets)

Figure 2: Sensitivity Comparison for ANN, SVM and KNN (bar chart, 0 to 100%, per dataset)

Figure 3: Specificity Comparison for ANN, SVM and KNN (bar chart, 0 to 100%, per dataset)

Table 6 presents the complexity of the classifiers in terms of training time. The results show that the classifiers used in this research are not complex. Based on training time, ANN is complex and KNN is the simplest of these classifiers, while SVM shows very high complexity only for the liver patient dataset (ILPD); for the other datasets its complexity is low.

Table 6: Complexity Comparison

| Dataset | Average time for ANN (s) | Average time for SVM (s) | Average time for KNN (s) |
|---|---|---|---|
| Ionosphere | 0.56 | 0.138 | 0.01 |
| PIMA Indian Diabetes | 4.87 | 0.30 | 0.027 |
| SPECT Heart | 1.36 | 0.08 | 0.01 |
| Banknote Authentication | 0.5 | 0.39 | 0.022 |
| ILPD | 1.02 | 4.55 | 0.024 |

Other methods have been used to classify these five datasets in the past, and we compare our results with those reported in the literature. Table 7 gives the classification accuracies of previous methods, where classification accuracy is the percentage of test instances correctly classified.

Table 7: Comparison with Previous Work

Previous work:

| Dataset | Algorithm | Accuracy (%) | Ref. No |
|---|---|---|---|
| PIMA Indian Diabetes | GP | 78.5 | [2] |
| ILPD | Random forest | 71.86 | [4] |
| SPECT Heart | Clip 3,4 | 77 | [5] |
| Ionosphere | Ensemble | 93.8 | [8] |
| Banknote Authentication | Naive Bayes | 87.9 | [10] |

Our results, accuracy (%):

| Dataset | ANN | SVM | KNN |
|---|---|---|---|
| PIMA Indian Diabetes | 91.73 | 75.21 | 78.26 |
| ILPD | 90.85 | 66.66 | 72 |
| SPECT Heart | 100 | 81.25 | 77.50 |
| Ionosphere | 100 | 92.38 | 78.095 |
| Banknote Authentication | 100 | 100 | 100 |

It is clear from Table 7 that the KNN algorithm produced results similar to those previously reported for the PIMA Indian Diabetes, Indian Liver Patient and SPECT Heart datasets. The SVM algorithm also produced similar results, but only for the ionosphere dataset. Apart from this, the ANN results are much better than those produced by the SVM and KNN algorithms.

CONCLUSION

Across several experiments, the SVM and KNN algorithms show variation in results for different problem datasets owing to dataset size and attributes. The algorithm that performs best in terms of sensitivity and accuracy on a problem dataset is considered the best classification algorithm for that dataset. From the results, it can be concluded that ANN is suitable for the given tasks and is the best classifier for all the datasets. Overall, ANN achieved remarkable performance with the highest accuracies, followed by SVM and KNN. Since ANN performs better in all respects, it is recommended for binary classification irrespective of the problem dataset.

As far as complexity is concerned, Table 6 shows that the average training time is at most about five seconds for ANN, less than five seconds for SVM, and well under one second for KNN. KNN is therefore the fastest and is not as complex as ANN and SVM. The complexity of ANN increases with hidden layer size, which is why it takes more time to train. Overall, the training time of these classifiers is at most about five seconds, which is negligible.

RECOMMENDATIONS

For future work, given its excellent performance on all five datasets in this research, ANN can be applied to other binary classification problems to test and strengthen our conclusion. The response to parameters is not yet clear and can be explored further. While working with ANN, we found that each dataset performs differently for a given hidden layer size, so we cannot say which size is good for all datasets; this area can also be explored further. SVM is good for binary classification. We used three kernel functions in this research, so other kernel functions can be explored. Likewise, Euclidean distance is used in our study, and other distance measures can be used in future research.

REFERENCES

[1] D. Deng and N. Kasabov, "On-line pattern analysis by evolving self-organizing maps," in Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES 2001), 2001, pp. 46-51. doi: 10.1016/S0925-2312(02)00599-4

[2] M. W. Aslam and A. K. Nandi, "Detection of diabetes using genetic programming," in 18th European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, August 23-27, 2010, pp. 1184-1188



[3] B. V. Ramana et al., "A Critical study of selected classification algorithms for liver disease diagnosis," Int. J. Database Manag. Syst., vol. 3, no. 2, pp. 101-114, May 2011. doi: 10.5121/ijdms.2011.3207

[4] A. Gulia et al., "Liver patient classification using intelligent techniques," Int. J. Comput. Sci. Inf. Technol., vol. 5, no. 4, pp. 5110-5115, 2014. doi: 10.1.1.443.8662

[5] R. Asadi et al., "New Supervised Multi Layer Feed Forward Neural Network Model to Accelerate Classification with High Accuracy," Eur. J. Sci. Res., vol. 33, no. 1, pp. 163-178, 2009. doi: 10.1.1.454.8820

[6] Y. H. Shao et al., "Least squares recursive projection twin support vector machine for classification," Pattern Recognit., vol. 45, pp. 2299-2307, 2012.

[7] M. L. Samb et al., "A novel RFE-SVM-based feature selection approach for classification," Int. J. Adv. Sci. Technol., vol. 43, pp. 27-36, 2012. doi: 10.1.1.641.826

[8] P. Pujari and J. B. Gupta, "Improving classification accuracy by using feature selection and ensemble model," Int. J. Soft Comput. Eng., vol. 2, no. 2, pp. 380-386, 2012. doi: 10.1.1.650.2904

[9] N. S. Mohamad et al., "Banknote authentication using artificial neural network," Sci. Int., vol. 26, no. 5, pp. 1865-1868, 2014. Available: https://www.researchgate.net/publication/279205560

[10] A. Ghazvini et al., "Comparative analysis of algorithms in supervised classification: A case study of bank notes dataset," Int. J. Comput. Trends Technol., vol. 17, no. 1, pp. 39-43, 2014. doi: 10.14445/22312803/IJCTT-V17P109

[11] J. F. Durodola et al., "A pattern recognition artificial neural network method for random fatigue loading life prediction," Int. J. Fatigue, vol. 99, pp. 55-67, June 2017. doi: 10.1016/j.ijfatigue.2017.02.003

[12] K. Hamad et al., "Modeling roadway traffic noise in a hot climate using artificial neural networks," Transp. Res. Part D Transp. Environ., vol. 53, pp. 161-177, June 2017. doi: 10.1016/j.trd.2017.04.014

[13] K. Celebi et al., "Experimental and artificial neural network approach of noise and vibration characteristic of an unmodified diesel engine fuelled with conventional diesel, and biodiesel blends with natural gas addition," Fuel, vol. 197, pp. 159-173, June 2017. doi: 10.1016/j.fuel.2017.01.113

[14] Y. Liu et al., "A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm," Inf. Sci., vol. 394-395, pp. 38-52, July 2017. doi: 10.1016/j.ins.2017.02.016

[15] M. E. Cholette and P. Borghesani, "Using support vector machines for the computationally efficient identification of acceptable design parameters in computer-aided engineering applications," Expert Syst. Appl., vol. 81, pp. 39-52, 15 September 2017. doi: 10.1016/j.eswa.2017.03.050



[16] J. Masino et al., "Road surface prediction from acoustical measurements in the tire cavity using support vector machine," Appl. Acoust., vol. 125, pp. 41-48, 2017.

[17] Y. Chen and Y. Hao, "A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction," Expert Syst. Appl., vol. 80, pp. 340-355, 2017. doi: 10.1016/j.eswa.2017.02.044

[18] M. A. Mohammed et al., "Solving vehicle routing problem by using improved K-nearest neighbor algorithm for best solution," J. Comput. Sci., 19 April 2017. doi: 10.1016/j.jocs.2017.04.012

[19] S. Faziludeen and P. Sankaran, "ECG beat classification using evidential K-Nearest Neighbours," Procedia Comput. Sci., vol. 89, pp. 499-505, 2016. doi: 10.1016/j.procs.2016.06.106

[20] T. Munisami et al., "Plant leaf recognition using shape features and colour histogram with k-nearest neighbour classifiers," Procedia Comput. Sci., vol. 58, pp. 740-747, 2015. doi: 10.1016/j.procs.2015.08.095

[21] E. Fix and J. Hodges, "Discriminatory analysis-nonparametric discrimination: consistency properties," Technical report, USAF School of Aviation Medicine, 1951. doi: 10.2307/1403797

[22] V. N. Vapnik and A. Y. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Measures of Complexity: Springer, 2015, pp. 11-30. [Online]. doi: 10.1007/978-3-319-21852-6_3

[23] M. W. Aslam, "Pattern recognition using genetic programming for classification of diabetes and modulation data," Doctoral dissertation, University of Liverpool, 2013. doi: 10.1.1.427.1195

[24] W. S. McCulloch and W. Pitts, "A Logical Calculus of the Idea Immanent in Nervous Activity," Bull. Math. Biophys., vol. 5, pp. 115-133, 1943. doi: 10.1007/BF02478259

[25] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, New York: Wiley & Sons, 1949. Available: https://books.google.com.pk/books?id=ddB4AgAAQBAJ

[26] M. L. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, 2nd ed. Cambridge: MIT Press, 1972. Available: https://books.google.com.pk/books?id=U-O9BHlTvJIC

[27] S. S. Haykin, Neural Networks: A Comprehensive Foundation: Macmillan, 1994. Available: https://books.google.com.pk/books?id=M5abQgAACAAJ

[28] C. L. Blake and C. J. Merz. UCI repository of Machine Learning Databases [Online]. Available: http://archive.ics.uci.edu/ml/

[29] J. Han et al., Data Mining: Concepts and Techniques: Elsevier Science & Technology, 2011. Available: https://books.google.com.pk/books?id=pQws07tdpjoC

PJCIS (2016), Vol.1 No.2: 29-42 Numerical Solution of


Numerical Solution of Fisher’s Equation by Using Meshless Method of Lines

HINA MUJAHID*, MEHNAZ

Department of Mathematics, Shaheed Benazir Bhutto Women University, Peshawar 25000.

*Corresponding author’s e-mail: [email protected]

Abstract

Many problems in science and engineering are modeled by partial differential equations (PDEs). Fisher’s equation, a nonlinear reaction-diffusion equation, models various problems in ecology, biology, and mass and heat transfer. This paper is concerned with the development of a meshless method of lines (MOL) for solving Fisher’s equation. The method is applied in two steps. In the first step, the space derivatives are approximated by different radial basis functions, which converts the PDE into a system of ordinary differential equations (ODEs). In the second step, this system is solved by the Runge-Kutta method of order 4 (RK4). Finally, the $L_2$, $L_\infty$ and root mean square (RMS) error norms are used to assess the behavior of the method. The proposed method is compared with some methods available in the literature.

Keywords: Multi Quadric (MQ), Inverse Multi Quadric (IMQ), Gaussian (GA), Inverse Quadric (IQ), Runge Kutta method of order 4 (RK4).

INTRODUCTION

Nonlinear phenomena play an essential role in applied mathematics and physics. For PDEs, nonlinear control problems are among the most mathematically challenging in the field of distributed parameter systems. Fisher’s equation, a nonlinear evolution equation, was first introduced by Fisher to model the advance of a mutant gene in an infinite domain [1]. Furthermore, Fisher’s equation has been used as a basis for modeling the spatial spread of genes, chemical wave propagation, branching Brownian motion processes, flame propagation, nuclear reactor theory [2-5], the spread of invasive species [6], bacteria [7], epidemics [8], and phenomena in many other disciplines. Fisher’s equation is defined by

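$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2} + \alpha u(1-u), \qquad (1)$$

where $t$ denotes time and $x \in (-\infty, \infty)$ denotes position. The reactive coefficient $\alpha$ and the diffusion coefficient $D$ parameterize the reaction and diffusion processes.

In this paper we consider the following form of Fisher’s equation,

$$u_t - u_{xx} - \alpha u(1-u) = 0, \qquad (2)$$

where $\alpha$ is an arbitrary constant.

The exact solution of Eq. (2) is given in [2],

$$u(x,t) = \frac{1}{\left(1 + e^{\sqrt{\alpha/6}\,x - (5\alpha/6)\,t}\right)^{2}}. \qquad (3)$$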

Many properties of Fisher’s equation as a typical nonlinear reaction-diffusion system have been studied by many authors, including the singular property and the travelling wave behaviour [2, 9-12]. Excellent summaries of Fisher’s equation are provided by Kawahara & Tanaka [13], Baraznik & Tyson [14] and Larson [15]. An analytical study of Fisher’s equation using the Adomian decomposition method is given by Wazwaz & Gorguis [2]. For the generalized Fisher’s equation, an exact and explicit solitary wave solution was presented by Wang [16]. Numerical solutions of Fisher’s equation were not present in the literature until 1974, when Gazdag & Canosa [17] presented the first numerical solution, using a pseudo-spectral approach. Since then, Fisher’s equation has been solved numerically by many researchers. Implicit and explicit finite difference algorithms were presented by Twizell et al. [18] and Parekh & Puri [19]. Carey & Shen [20] used a least-squares finite element method. Rizwan-Uddin [21] compared nodal integral and non-standard finite difference schemes, and Tang & Weber [22] proposed a Galerkin finite element method. Hagstrom and Keller [23] developed asymptotic boundary conditions using a centered finite difference algorithm. Mickens [24] proposed a best finite-difference scheme for Fisher’s equation, Olmos and Shizgal [25] proposed a pseudo-spectral method, and Qiu & Sloan [26] used a moving mesh method. Fisher’s equation was also studied by Mittal et al. [27] using a wavelet Galerkin method.

MOL is an efficient technique for the numerical solution of PDEs. The main idea of the method is to discretize the space derivatives while keeping the time derivative continuous. The German mathematician Erich Rothe introduced MOL in 1930 [28]. He applied it to parabolic equations, but it can be used more broadly. MOL is regarded as a special case of the finite difference method (FDM), but it is more effective than FDM because of its accuracy and lower computational cost.

We first discretize the given PDE in the space variable by approximating the spatial derivatives by RBFs or FDM, and then solve the resulting system of ODEs with any ODE solver. In recent years, several PDEs have been solved by MOL, including Burgers-type equations [29], the generalized Kuramoto-Sivashinsky equation [30], the KdV equation [31] and the KdV equation for small times [32]. For the stability and convergence of MOL, see [33-35].



FORMULATION OF METHOD OF LINES

In this section we find the numerical solution of Equation (2) by applying MOL. Using RBFs, the problem is discretized in the space variable, which converts the given PDE into a system of ODEs that can be solved by any appropriate ODE solver. In this work we use the RK4 method.

RBF collocation

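This section is concerned with MOL-RBF interpolation. We use RBFs to interpolate the approximate solution of the problem. Let $\tilde{u}$ be the RBF approximation to $u$, the solution of the given PDE. We divide the problem domain into $n$ nodes $x_1, x_2, \ldots, x_n \in \Omega \cup \partial\Omega$, where $\Omega$ is the interior and $\partial\Omega$ is the boundary of the domain, with $x_1, x_n \in \partial\Omega$ and $x_2, x_3, \ldots, x_{n-1} \in \Omega$. The RBF approximation of $u(x)$ is given by

$$\tilde{u}(x) = \sum_{j=1}^{n} k_j \varphi_j(x) = k_1\varphi_1(x) + k_2\varphi_2(x) + \cdots + k_n\varphi_n(x), \qquad \tilde{u}(x) = \boldsymbol{\Phi}^T(x)\,\mathbf{k}, \tag{4}$$

where $\boldsymbol{\Phi}(x) = [\varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x)]^T$ and $\mathbf{k} = [k_1, k_2, \ldots, k_n]^T$. Here $\varphi_j$ denotes the RBFs and $k_j$ are the unknown constants.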

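Let $u_j$ denote the approximate solution at the $j$th node, i.e. $\tilde{u}(x_j) = u_j$. Then from Equation (4),

$$u_1 = \boldsymbol{\Phi}^T(x_1)\,\mathbf{k}, \quad u_2 = \boldsymbol{\Phi}^T(x_2)\,\mathbf{k}, \quad \ldots, \quad u_n = \boldsymbol{\Phi}^T(x_n)\,\mathbf{k},$$

which can be written in matrix form as

$$\mathbf{B}\,\mathbf{k} = \mathbf{u}, \tag{5}$$

where

$$\mathbf{B} = \begin{pmatrix} \varphi_1(x_1) & \varphi_2(x_1) & \cdots & \varphi_n(x_1) \\ \varphi_1(x_2) & \varphi_2(x_2) & \cdots & \varphi_n(x_2) \\ \vdots & \vdots & & \vdots \\ \varphi_1(x_n) & \varphi_2(x_n) & \cdots & \varphi_n(x_n) \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{pmatrix} \quad \text{and} \quad \mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$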

In the above equation, the matrix $\mathbf{B}$ is called the interpolation (or collocation) matrix; its entries are the RBFs evaluated at the nodes. From Equation (5) we have $\mathbf{k} = \mathbf{B}^{-1}\mathbf{u}$. Substituting this value in Equation (4), we get

$$\tilde{u}(x) = \boldsymbol{\Phi}^T(x)\,\mathbf{B}^{-1}\mathbf{u}, \qquad \tilde{u}(x) = \mathbf{P}(x)\,\mathbf{u}, \tag{6}$$

where $\boldsymbol{\Phi}(x) = [\varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x)]^T$, $\mathbf{u} = [u_1, u_2, \ldots, u_n]^T$ and $\mathbf{P}(x) = \boldsymbol{\Phi}^T(x)\,\mathbf{B}^{-1}$.

Whether the collocation matrix $\mathbf{B}$ is singular depends on the choice of RBF. The shape parameter $c$ should be chosen large enough that at least two columns of $\mathbf{B}$ differ appreciably. However, large values of $c$ give lower accuracy than small values, while small values of $c$ make the collocation matrix ill-conditioned. The shape parameter therefore has a strong effect on the condition number [36]. Determining the best value of $c$ is still an open problem, and several methods have been proposed. The simplest and most popular is the brute-force method, in which the Max-error is plotted for different values of $c$ and the value giving the least Max-error is taken as the best shape parameter. This technique is used to find the optimum value of $c$ in [30] and [31]. Other methods for finding an optimal value of $c$ are given in [37-41].
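The interpolation step in Equations (4)-(6) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the paper does not give the explicit form of its RBFs, so the MQ form $\varphi(r) = \sqrt{1 + (cr)^2}$, the value $c = 5$, the test function $\sin(\pi x)$ and all function names are hypothetical choices. The small Gaussian-elimination routine stands in for any linear solver.

```python
import math

def mq(r, c):
    """Assumed MQ form (not stated in the paper): phi(r) = sqrt(1 + (c*r)^2)."""
    return math.sqrt(1.0 + (c * r) ** 2)

def mq_xx(r, c):
    """Second x-derivative of the assumed MQ: c^2 / phi^3."""
    return c ** 2 / mq(r, c) ** 3

def solve(A, b):
    """Solve A k = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    k = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * k[j] for j in range(i + 1, n))
        k[i] = (M[i][n] - s) / M[i][i]
    return k

def collocation_matrix(nodes, c):
    """B[i][j] = phi_j(x_i), as in Equation (5)."""
    return [[mq(xi - xj, c) for xj in nodes] for xi in nodes]

def rbf_eval(x, nodes, k, c, basis=mq):
    """Evaluate sum_j k_j * basis(x - x_j); basis=mq_xx gives the second derivative."""
    return sum(kj * basis(x - xj, c) for kj, xj in zip(k, nodes))

if __name__ == "__main__":
    n, c = 11, 5.0                      # 11 nodes on [0, 1], i.e. h = 0.1
    nodes = [i / (n - 1) for i in range(n)]
    u = [math.sin(math.pi * x) for x in nodes]
    k = solve(collocation_matrix(nodes, c), u)   # k = B^{-1} u
    # Largest mismatch between the interpolant and the nodal data.
    print(max(abs(rbf_eval(x, nodes, k, c) - ui) for x, ui in zip(nodes, u)))
    # Approximate u_xx at x = 0.5 from the same coefficients.
    print(rbf_eval(0.5, nodes, k, c, basis=mq_xx))
```

Solving $\mathbf{B}\mathbf{k} = \mathbf{u}$ and then evaluating the basis (or its derivatives) is equivalent to applying the row vector $\mathbf{P}(x)$ of Equation (6) without forming $\mathbf{B}^{-1}$ explicitly.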

Application of MOL-RBF to Fisher's Equation

In this section, we approximate the unknown solution $u(x)$ as a linear combination of $n$ RBFs and apply MOL to find the numerical solution of Fisher's equation. We consider Fisher's equation

$$u_t - u_{xx} - \alpha u(1-u) = 0, \tag{7}$$

where $\alpha$ is an arbitrary constant, with the initial condition

$$u(x,0) = u_0(x), \quad a \le x \le b, \tag{8}$$

and the boundary conditions

$$u(a,t) = \gamma_1(t), \qquad u(b,t) = \gamma_2(t). \tag{9}$$

Applying Equation (6) to Equation (7), we get the following collocated form of Equation (7):

$$\frac{du_j}{dt} - \mathbf{P}_{xx}(x_j)\,\mathbf{u} - \alpha u_j(1-u_j) = 0, \quad j = 1, 2, \ldots, n, \tag{10}$$

where $u_j = u_j(t)$ and $\mathbf{P}_{xx}(x_j) = [P^{1}_{xx}(x_j), P^{2}_{xx}(x_j), \ldots, P^{n}_{xx}(x_j)]$, with

$$P^{k}_{xx}(x_j) = \frac{\partial^2}{\partial x^2} P_k(x_j).$$

To write the system (10) in vector form, let $U = [u_1, u_2, \ldots, u_n]^T$ and $P_{xx} = [P^{k}_{xx}(x_j)]_{n \times n}$.

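Then Equation (10) can be written as

$$\frac{dU}{dt} - P_{xx}U - \alpha\, U * (1 - U) = 0, \tag{11}$$

where $*$ denotes componentwise multiplication. Equation (11) can be written in compact form as

$$\frac{dU}{dt} = F(U), \tag{12}$$

where

$$F(U) = P_{xx}U + \alpha\, U * (1 - U). \tag{13}$$

The initial condition can be written as

$$U(t_0) = U_0 = [u_0(x_1), u_0(x_2), \ldots, u_0(x_n)]^T, \tag{14}$$

and the boundary conditions give

$$u_1(t) = \gamma_1(t), \qquad u_n(t) = \gamma_2(t). \tag{15}$$

We now solve the resulting system of ODEs (12) with the initial condition (14) by the RK4 method. The RK4 algorithm for Equation (12) is

$$U^{m+1} = U^{m} + \frac{\Delta t}{6}\,(K_1 + 2K_2 + 2K_3 + K_4), \tag{16}$$

where

$$K_1 = F(U^m), \quad K_2 = F\!\left(U^m + \frac{\Delta t}{2}K_1\right), \quad K_3 = F\!\left(U^m + \frac{\Delta t}{2}K_2\right), \quad K_4 = F(U^m + \Delta t\,K_3).$$

Here $U^m$ is the approximate solution at the $m$th time level and $\Delta t$ is the time step.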
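The time-stepping part, Equations (13) and (16), can be sketched in plain Python as below. The names `fisher_rhs`, `rk4_step` and `integrate` are hypothetical. In the demonstration the differentiation matrix is replaced by a zero placeholder, which is an assumption made only so that the result can be compared with the closed-form logistic solution; in the actual method `Pxx` would be the RBF second-derivative matrix, with the boundary values (15) imposed after each step.

```python
import math

def fisher_rhs(U, Pxx, alpha):
    """F(U) = Pxx U + alpha * U * (1 - U), Equation (13), componentwise product."""
    n = len(U)
    return [sum(Pxx[i][j] * U[j] for j in range(n)) + alpha * U[i] * (1.0 - U[i])
            for i in range(n)]

def rk4_step(F, U, dt):
    """One step of the classical RK4 scheme, Equation (16)."""
    K1 = F(U)
    K2 = F([u + 0.5 * dt * k for u, k in zip(U, K1)])
    K3 = F([u + 0.5 * dt * k for u, k in zip(U, K2)])
    K4 = F([u + dt * k for u, k in zip(U, K3)])
    return [u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
            for u, k1, k2, k3, k4 in zip(U, K1, K2, K3, K4)]

def integrate(F, U0, dt, steps):
    """Advance U0 by a fixed number of RK4 steps."""
    U = list(U0)
    for _ in range(steps):
        U = rk4_step(F, U, dt)
    return U

if __name__ == "__main__":
    alpha, dt, steps = 6.0, 0.001, 100
    # Placeholder: a zero matrix switches diffusion off, leaving du/dt = alpha*u*(1-u).
    Pxx = [[0.0, 0.0], [0.0, 0.0]]
    U = integrate(lambda V: fisher_rhs(V, Pxx, alpha), [0.1, 0.5], dt, steps)
    t = dt * steps
    exact = [u0 * math.exp(alpha * t) / (1 - u0 + u0 * math.exp(alpha * t))
             for u0 in (0.1, 0.5)]
    print(max(abs(a - b) for a, b in zip(U, exact)))
```

Keeping `F` as a callable means the same `rk4_step` can be reused unchanged once a real differentiation matrix is supplied.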

RESULTS

In this section we give a numerical example to show the accuracy of our method, and compare our method with the differential quadrature method (DQM).

1.1 Test Problem 1

We consider Fisher's equation

$$u_t - u_{xx} - \alpha u(1-u) = 0, \tag{17}$$

where $\alpha$ is an arbitrary constant. The exact solution of Equation (17) is given by

$$u(x,t) = \frac{1}{\left(1 + e^{\sqrt{\alpha/6}\,x - \frac{5}{6}\alpha t}\right)^{2}}. \tag{18}$$

The initial condition and boundary conditions are

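$$u(x,0) = \frac{1}{\left(1 + e^{\sqrt{\alpha/6}\,x}\right)^{2}}, \tag{19}$$

$$u(a,t) = \frac{1}{\left(1 + e^{\sqrt{\alpha/6}\,a - \frac{5}{6}\alpha t}\right)^{2}}, \tag{20}$$

$$u(b,t) = \frac{1}{\left(1 + e^{\sqrt{\alpha/6}\,b - \frac{5}{6}\alpha t}\right)^{2}}. \tag{21}$$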

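Numerical calculations are performed over the domain $[0,1] \times [0,1]$ with step size $h = 0.1$ and time step $\Delta t = 0.001$, with $\alpha = 6$. Different types of RBFs are used: MQ, IMQ, GA and IQ. Table 1 reports the error norms used to assess the behavior of MOL, which are defined as follows:

$$L_\infty = \max_{i} \left| u_i - \tilde{u}_i \right|, \tag{22}$$

$$L_2 = \sqrt{h \sum_{i=1}^{n} \left( u_i - \tilde{u}_i \right)^2}, \tag{23}$$

$$L_{rms} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( u_i - \tilde{u}_i \right)^2}, \tag{24}$$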

where $u$ and $\tilde{u}$ are the exact and numerical solutions, respectively. From Table 1 it can easily be observed that MQ, IMQ and GA give more accurate results than IQ. We have taken $c = 1$, $1.4$, $2.2$ and $0.34$ for MQ, IMQ, GA and IQ, respectively. Table 2 compares MOL with DQM; it shows that the MQ, IMQ and GA results are more accurate than those of DQM, whereas IQ is less accurate than DQM.
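The three error norms defined above are straightforward to compute. The following minimal Python sketch assumes the exact and numerical solutions are given as equal-length lists of nodal values; the function name `error_norms` is an illustrative choice.

```python
import math

def error_norms(exact, numeric, h):
    """Return (L_inf, L_2, L_rms) for nodal values on a grid with spacing h."""
    diffs = [abs(u - v) for u, v in zip(exact, numeric)]
    n = len(diffs)
    sq = sum(d * d for d in diffs)
    l_inf = max(diffs)
    l_2 = math.sqrt(h * sq)      # discrete L2 norm weighted by the step size
    l_rms = math.sqrt(sq / n)    # root mean square error
    return l_inf, l_2, l_rms

if __name__ == "__main__":
    print(error_norms([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], h=0.5))
```

Note that $L_2$ and $L_{rms}$ differ only in the weight ($h$ versus $1/n$) applied to the sum of squared errors.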

In this work, we have applied the brute-force technique to find the best value of $c$. Figure 1 shows that the least Max-error for MQ occurs at $c = 1.1$, and Figure 2 shows that the least Max-error for IMQ occurs at $c = 1.4$. Similarly, Figure 3 shows that the least Max-error for GA occurs at $c = 2.2$, and Figure 4 shows that the Max-error for IQ attains its minimum at $c = 0.34$.
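The brute-force search amounts to scanning a grid of candidate shape parameters and keeping the one with the smallest Max-error. A minimal sketch follows; `brute_force_shape` is a hypothetical name, and `mock_error` is a made-up stand-in for "solve the PDE with shape parameter c and return the $L_\infty$ error", used here only so the example runs on its own.

```python
def brute_force_shape(candidates, max_error):
    """Return (c, error) for the candidate with the smallest max-error."""
    errors = [(max_error(c), c) for c in candidates]
    best_err, best_c = min(errors)
    return best_c, best_err

if __name__ == "__main__":
    # Hypothetical error curve with its minimum at c = 1.1 (illustration only).
    mock_error = lambda c: (c - 1.1) ** 2 + 1e-7
    grid = [round(0.1 * i, 1) for i in range(1, 31)]   # c = 0.1, 0.2, ..., 3.0
    print(brute_force_shape(grid, mock_error))
```

In practice each call to `max_error` is a full MOL solve, so the cost of the search grows linearly with the number of candidates.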

We now turn to the figures comparing the numerical and exact solutions. In Figure 5, the graph of the MQ numerical solution nearly coincides with that of the exact solution. The same holds for IMQ and GA in Figures 6 and 7, respectively, but the graph for IQ in Figure 8 differs slightly from the others.

Table 1: $L_\infty$, $L_2$ and $L_{rms}$ error norms for different RBFs

t     RBF   L_inf        L_2          L_rms
0.1   MQ    2.6452E-07   1.3234E-07   1.3234E-07
0.1   IMQ   3.5363E-07   2.1788E-07   2.1788E-07
0.1   GA    7.5121E-07   2.8072E-07   2.8072E-07
0.1   IQ    5.5756E-03   2.9324E-03   2.9324E-03
0.2   MQ    4.7441E-07   2.9961E-07   2.9961E-07
0.2   IMQ   4.0007E-07   2.5327E-07   2.5327E-07
0.2   GA    8.7103E-07   3.3578E-07   3.3578E-07
0.2   IQ    2.2767E-02   1.1850E-02   1.1850E-02
0.3   MQ    7.3297E-07   4.9349E-07   4.9349E-07
0.3   IMQ   4.6189E-07   2.4497E-07   2.4497E-07
0.3   GA    9.3713E-07   3.6373E-07   3.6373E-07
0.3   IQ    5.3162E-02   3.4769E-02   3.4769E-02



Figure 1: $L_\infty$ error norm for different values of $c$ using MQ

Figure 2: $L_\infty$ error norm for different values of $c$ using IMQ

Figure 3: $L_\infty$ error norm for different values of $c$ using GA

Figure 4: $L_\infty$ error norm for different values of $c$ using IQ

Figure 5: Comparison of numerical and exact solution using MQ

Figure 6: Comparison of numerical and exact solution using IMQ

Figure 7: Comparison of numerical and exact solution using GA

Figure 8: Comparison of numerical and exact solution using IQ

Table 2: Comparison of MOL-RBF and DQM

x      t     MQ (c=0.8)   IMQ (c=01)   GA (c=4.5)   IQ (c=0.01)   Exact Solution   DQM [42]
0.25   0.5   0.818389     0.818393     0.818399     0.826779      0.818393         0.81843
0.25   1.0   0.982915     0.982919     0.982926     0.999993      0.982919         0.98292
0.25   2.0   0.999881     0.999883     0.999889     1.000312      0.999883         0.99988
0.25   5.0   0.999999     0.999999     1.000004     1.000338      1.000000         1.00000
0.50   0.5   0.775800     0.775803     0.775811     0.769864      0.775803         0.77585
0.50   1.0   0.978144     0.978147     0.978155     0.985615      0.978147         0.97815
0.50   2.0   0.999849     0.999850     0.999857     1.000253      0.999850         0.99985
0.50   5.0   0.999999     0.999999     1.000002     1.000290      1.000000         1.00000
0.75   0.5   0.725819     0.725823     0.725832     0.697997      0.725824         0.72588
0.75   1.0   0.972068     0.972071     0.972080     0.979299      0.972071         0.92208
0.75   2.0   0.999806     0.999808     0.999815     1.000333      0.999808         0.99981
0.75   5.0   1.000000     0.999999     1.000000     1.000386      1.000000         1.00000

CONCLUSION

In this paper, we applied MOL to Fisher's equation. Different types of RBFs were used to approximate the solution of the governing equation. Compared with DQM, the MQ, IMQ and GA variants give more accurate results, while IQ is less accurate.


REFERENCES

[1] R. A. Fisher, "The wave of advance of advantageous genes," Ann. Eugen., vol. 7, no. 4, pp. 355-369, 1937. Available: http://jxshix.people.wm.edu/2009-harbin-course/classic/Fisher-1937.pdf

[2] A.-M. Wazwaz and A. Gorguis, "An analytic study of Fisher's equation by using Adomian decomposition method," Appl. Math. Comput., vol. 154, no. 3, pp. 609-620, 2004. doi: 10.1016/S0096-3003(03)00738-0

[3] N. F. Britton, Reaction-diffusion equations and their applications to biology. Academic Press, 1986.

[4] L. Debnath, Nonlinear partial differential equations for scientists and engineers: Birkhäuser Basel, 2011. Available. doi: 10.1007/978-0-8176-8265-1.

[5] J. D. Murray, Mathematical Biology, New York: Springer, 1996, p. 553. Available. doi: 10.1007/b98868.

[6] M. G. Neubert and I. M. Parker, "Projecting rates of spread for invasive species," Risk Anal., vol. 24, no. 4, pp. 817-831, 2004. doi: 10.1111/j.0272-4332.2004.00481.x

[7] V. M. Kenkre, "Results from variants of the Fisher equation in the study of epidemics and bacteria," Phys. A Stat. Mech. Appl., vol. 342, no. 1, pp. 242-248, 2004. doi: 10.1016/j.physa.2004.04.084

[8] T. Sardar et al., "A mathematical model of dengue transmission with memory," Commun. Nonlinear Sci. Numer. Simul., vol. 22, no. 1–3, pp. 511-525, 2015. doi: 10.1016/j.cnsns.2014.08.009

[9] Z. Feng, "Traveling wave behavior for a generalized Fisher equation," Chaos, Solitons & Fractals, vol. 38, no. 2, pp. 481-488, 2008. doi: 10.1016/j.chaos.2006.11.031

[10] B.-Y. Guo and Z.-X. Chen, "Analytic solutions of the Fisher equation," J. Phys. A. Math. Gen., vol. 24, no. 3, p. 645, 1991.

[11] N. A. Kudryashov, "Exact solitary waves of the Fisher equation," Physics Letters A, vol. 342, no. 1, pp. 99-106, 2005. doi: 10.1016/j.physleta.2005.05.025

[12] A.-M. Wazwaz, "The tanh method for traveling wave solutions of nonlinear equations," Appl. Math. Comput., vol. 154, no. 3, pp. 713-723, 2004. doi: 10.1016/S0096-3003(03)00745-8

[13] T. Kawahara and M. Tanaka, "Interactions of traveling fronts: An exact solution of a nonlinear diffusion equation," Physics Letters A, vol. 97, no. 8, pp. 311-314, 1983. doi: 10.1016/0375-9601(83)90648-5



[14] P. K. Brazhnik and J. J. Tyson, "On Traveling Wave Solutions of Fisher's Equation in Two Spatial Dimensions," SIAM J. Appl. Math., vol. 60, no. 2, pp. 371-391, 1999. doi: 10.1137/S0036139997325497

[15] D. A. Larson, "Transient bounds and time-asymptotic behavior of solutions to nonlinear equations of Fisher type," SIAM J. Appl. Math., vol. 34, no. 1, pp. 93-104, 1978. doi: 10.1137/0134008

[16] X. Y. Wang, "Exact and explicit solitary wave solutions for the generalised Fisher equation," Physics Letters A, vol. 131, no. 4-5, pp. 277-279, 1988. doi: 10.1016/0375-9601(88)90027-8

[17] J. Gazdag and J. Canosa, "Numerical solution of Fisher's equation," J. Appl. Probab., vol. 11, no. 03, pp. 445-457, 1974. doi: 10.2307/3212689

[18] E. H. Twizell et al., "Chaos-free numerical solutions of reaction-diffusion equations," Proc. R. Soc. A Math. Phys. Eng. Sci., vol. 430, no. 1880, pp. 541-576, 1990. doi: 10.1098/rspa.1990.0106

[19] N. Parekh and S. Puri, "A new numerical scheme for the Fisher equation," J. Phys. A. Math. Gen., vol. 23, no. 21, p. L1085, 1990. Available: http://stacks.iop.org/0305-4470/23/i=21/a=00

[20] G. F. Carey and Y. Shen, "Least-squares finite element approximation of Fisher's reaction--diffusion equation," Numer. Methods Partial Differ. Equ., vol. 11, no. 2, pp. 175-186, 1995. doi: 10.1002/num.1690110206

[21] R. Uddin, "Comparison of the nodal integral method and nonstandard finite-difference schemes for the Fisher equation," SIAM J. Sci. Comput., vol. 22, no. 6, pp. 1926-1942, 2001. doi: 10.1137/S1064827597325463

[22] S. Tang and R. O. Weber, "Numerical study of Fisher's equation by a Petrov-Galerkin finite element method," J. Aust. Math. Soc. Ser. B. Appl. Math., vol. 33, no. 01, pp. 27-38, 1991. doi: 10.1016/0895-7177(94)90118-X

[23] T. Hagstrom and H. B. Keller, "The Numerical calculation of traveling wave solutions of nonlinear parabolic equations," SIAM J. Sci. Stat. Comput., vol. 7, no. 3, pp. 978-988, 1986. doi: 10.1137/0907065

[24] R. E. Mickens, "A best finite-difference scheme for the fisher equation," Numer. Methods Partial Differ. Equ., vol. 10, no. 5, pp. 581-585, 1994. doi: 10.1002/num.1690100505

[25] D. Olmos and B. D. Shizgal, "A pseudospectral method of solution of Fisher's equation," J. Comput. Appl. Math., vol. 193, no. 1, pp. 219-242, 2006. doi: 10.1016/j.cam.2005.06.028



[26] Y. Qiu and D. M. Sloan, "Numerical solution of Fisher's equation using a moving mesh method," J. Comput. Phys., vol. 146, no. 2, pp. 726-746, 1998. doi: 10.1006/jcph.1998.6081

[27] R. C. Mittal and S. Kumar, "Numerical study of Fisher's equation by wavelet Galerkin method," Int. J. Comput. Math., vol. 83, no. 3, pp. 287-298, 2006. doi: 10.1080/00207160600717758

[28] E. Rothe, "Zweidimensionale parabolische randwertaufgaben als grenzfall eindimensionaler randwertaufgaben," Math. Ann., vol. 102, no. 1, pp. 650-670, 1930. Available: https://eudml.org/doc/159400

[29] A. Ali et al., "A numerical meshless technique for the solution of some burgers' type equations," World Appl. Sci. J., vol. 14, no. 12, pp. 1792-1798, 2011.

[30] S. Haq et al., "Meshless method of lines for the numerical solution of generalized Kuramoto-Sivashinsky equation," Appl. Math. Comput., vol. 217, no. 6, pp. 2404-2413, 2010. Available: https://www.idosi.org/wasj/wasj14(12)11/4.pdf

[31] Q. Shen, "A meshless method of lines for the numerical solution of KdV equation using radial basis functions," Eng. Anal. Bound. Elem., vol. 33, no. 10, pp. 1171-1180, 2009. doi: 10.1016/j.enganabound.2009.04.008

[32] A. Özdeş and E. N. Aksan, "The method of lines solution of the Korteweg-de Vries equation for small times," Int. J. Contemp. Math. Sci., vol. 1, pp. 639-650, 2006.

[33] W. Zong-Min, "Radial Basis Function Scattered Data Interpolation and the Meshless Method of Numerical Solution of PDEs J," Chinese J. Eng. Math., vol. 19, no. 2, pp. 1-12, 2002.

[34] Z.-M. Wu and R. Schaback, "Local error estimates for radial basis function interpolation of scattered data," IMA J. Numer. Anal., vol. 13, no. 1, pp. 13-27, 1993. doi: 10.1.1.45.4136

[35] S. C. Reddy and L. N. Trefethen, "Stability of the method of lines," Numer. Math., vol. 62, no. 1, pp. 235-267, 1992. doi: 10.1.1.210.2612

[36] G.-R. Liu and Y. Gu, "A point interpolation method for two-dimensional solids," Int. J. Numer. Methods Eng., vol. 50, no. 4, pp. 937-951, 2001. doi: 10.1002/1097-0207(20010210)50:4<937::AID-NME62>3.0.CO;2-X

[37] M. R. Dubal, "Construction of three-dimensional black-hole initial data via multiquadrics," Phys. Rev. D, Part. fields, vol. 45, no. 4, p. 1178, 1992. doi: 10.1103/PhysRevD.45.1178

[38] S. Rippa, "An algorithm for selecting a good value for the parameter c in radial basis function interpolation," Adv. Comput. Math., vol. 11, no. 2, pp. 193-210, 1999. doi: 10.1023/A:1018975909870



[39] C. S. Huang et al., "Error estimate, optimal shape factor, and high precision computation of multiquadric collocation method," Eng. Anal. Bound. Elem., vol. 31, no. 7, pp. 614-623, 2007. doi: 10.1016/j.enganabound.2006.11.011

[40] J. G. Wang and G. R. Liu, "On the optimal shape parameters of radial basis functions used for 2-D meshless methods," Comput. Methods Appl. Mech. Eng., vol. 191, no. 23, pp. 2611-2630, 2002. doi: 10.1016/S0045-7825(01)00419-4

[41] J. Yoon, "Spectral approximation orders of radial basis function interpolation on the Sobolev space," SIAM J. Math. Anal., vol. 33, no. 4, pp. 946-958, 2001. doi: 10.1137/S0036141000373811

[42] R. C. C. Mittal and R. A. M. Jiwari, "Numerical study of Fisher's equation by using differential quadrature method," Int. J. Inf. Syst. Sci., vol. 5, no. 1, pp. 143-160, 2009. Available: https://www.researchgate.net/publication/259639568_Numerical_study_of_Fisher's_equation_by_using_differential_quadrature_method

PJCIS (2016) Vol. 1 No. 2 : 43-52 Numerical Approximation of


Numerical Approximation of Rapidly Oscillatory Bessel Integral Transforms

SAKHI ZAMAN*, SIRAJ-UL-ISLAM

Department of Basic Sciences & Islamiat, University of Engineering & Technology, Peshawar, Pakistan *Corresponding author’s email: [email protected]

Abstract

We present a new Levin-type procedure, based on Gaussian radial basis functions, for the evaluation of rapidly oscillating integrals that contain the Bessel function of the first kind $J_v(\omega x)$. Multi-resolution quadrature rules based on hybrid and Haar functions are also applied to Bessel oscillatory integrals. Numerical test problems are solved to verify the accuracy and efficiency of the new methods.

Keywords: Rapidly oscillatory integrand, Bessel function of the first kind, Gaussian RBF, Hybrid and Haar functions.

INTRODUCTION

Bessel oscillatory integrals have applications in many areas of science and engineering, such as astronomy, optics, electromagnetics, seismology and image processing [1, 2]. In the present paper, we evaluate Bessel-type oscillatory integrals of the form

$$I[g,\kappa] = \int_a^b g(x)\,J_v(\omega x)\,dx, \tag{1}$$

where $g(x)$ is a non-oscillatory smooth function and $J_v(\omega x)$ is the Bessel function of the first kind of order $v > 0$. The frequency $\omega$ of the integrand is also denoted by $\kappa$.

Many accurate methods have been developed for numerical evaluation of the integrals of the form (1) such as Levin collocation method [3-5], generalized quadrature rule [6-9], Homotopy perturbation method [1, 10] and some more. Levin [3] proposed a new approach for numerical evaluation of Bessel type of oscillatory integrals of the form (1). In the same paper, he extended the method to the solution of integrals with Bessel-trigonometric and square of the Bessel oscillatory integrands. In the next paper [4], Levin calculated some theoretical error bounds for the method given in [3].

Xiang [5] investigated some new theoretical error bounds for the method reported in [3] with asymptotic order of convergence 푂 휅 .



In this paper, we use collocation with the Gaussian RBF as the basis function (GRBF) instead of monomials [3, 5]. The asymptotic order of convergence of the proposed GRBF method is $O(\kappa^{-7/2})$. Multi-resolution quadrature rules based on hybrid and Haar functions [11] are also used to evaluate integrals of the form (1).

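GAUSSIAN RBF BASED QUADRATURE

In this procedure, a new technique is proposed to evaluate a class of oscillatory integrals of the form

$$\int_c^d g(x)\,\Upsilon(\kappa,x)\,dx = \int_c^d \mathbf{G}(x)\cdot\boldsymbol{\Upsilon}(\kappa,x)\,dx, \quad 0 \le c \le x \le d, \tag{2}$$

where $\mathbf{G}(x)$ and $\boldsymbol{\Upsilon}(\kappa,x)$ are vectors of non-oscillatory and oscillatory functions, respectively. The derivative of $\boldsymbol{\Upsilon}(\kappa,x)$ satisfies $\boldsymbol{\Upsilon}'(\kappa,x) = B(\kappa,x)\,\boldsymbol{\Upsilon}(\kappa,x)$, where $B(\kappa,x)$ is an $n \times n$ matrix of non-oscillatory functions.

An approximate solution $\mathbf{S}(x)$ with components $S^{[j]}(x) = \sum_{i=1}^{m} w_i^{[j]} \varphi_i(x)$, $j = 1, 2, \ldots, n$, is required to satisfy the following ODE:

$$£[\mathbf{S}(x)] = \mathbf{G}(x), \quad 0 \le c \le x \le d, \tag{3}$$

where

$$£[\mathbf{S}(x)] = \mathbf{S}'(x) + B^{T}(\kappa,x)\,\mathbf{S}(x).$$

The unknown coefficients $w_i^{[j]}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, are determined by the interpolation conditions

$$£[\mathbf{S}(x_k)] = \mathbf{G}(x_k), \quad k = 1, 2, \ldots, m. \tag{4}$$

The integral (2) can then be evaluated as

$$\mathrm{GRBF} = \int_c^d \left[\mathbf{S}'(x) + B^{T}(\kappa,x)\,\mathbf{S}(x)\right]\cdot\boldsymbol{\Upsilon}(\kappa,x)\,dx = \int_c^d \frac{d}{dx}\left[\mathbf{S}(x)\cdot\boldsymbol{\Upsilon}(\kappa,x)\right]dx = \mathbf{S}(d)\cdot\boldsymbol{\Upsilon}(\kappa,d) - \mathbf{S}(c)\cdot\boldsymbol{\Upsilon}(\kappa,c).$$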
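The reduction of the integral to two boundary terms rests on the product rule. A short derivation, assuming only the relation $\boldsymbol{\Upsilon}' = B\,\boldsymbol{\Upsilon}$ stated above, is sketched here:

```latex
\begin{aligned}
\frac{d}{dx}\bigl[\mathbf{S}\cdot\boldsymbol{\Upsilon}\bigr]
  &= \mathbf{S}'\cdot\boldsymbol{\Upsilon} + \mathbf{S}\cdot\boldsymbol{\Upsilon}' \\
  &= \mathbf{S}'\cdot\boldsymbol{\Upsilon} + \mathbf{S}\cdot\bigl(B\,\boldsymbol{\Upsilon}\bigr) \\
  &= \bigl(\mathbf{S}' + B^{T}\mathbf{S}\bigr)\cdot\boldsymbol{\Upsilon}.
\end{aligned}
```

Hence, if $\mathbf{S}' + B^{T}\mathbf{S}$ equals $\mathbf{G}$, the integrand $\mathbf{G}\cdot\boldsymbol{\Upsilon}$ is an exact derivative and the integral equals $\mathbf{S}\cdot\boldsymbol{\Upsilon}$ evaluated at the endpoints. The transpose appears when $B$ is moved across the dot product, and it is this transposed form that matches the coupled system (5).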

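In particular, to compute the integral $I[g,\kappa] = \int_a^b g(x)\,J_v(\omega x)\,dx$ with $\omega = \kappa$, we take

$$B(\kappa,x) = \begin{pmatrix} \dfrac{v-1}{x} & -\kappa \\ \kappa & -\dfrac{v}{x} \end{pmatrix}, \qquad \boldsymbol{\Upsilon}(\kappa,x) = \begin{pmatrix} J_{v-1}(\kappa x) \\ J_v(\kappa x) \end{pmatrix} \qquad \text{and} \qquad \mathbf{G}(x) = \begin{pmatrix} 0 \\ g(x) \end{pmatrix}.$$

In this case, the approximate solution $S^{[j]}(x) = \sum_{i=1}^{m} w_i^{[j]} \varphi_i(x)$, $j = 1, 2$, is required to satisfy the conditions (4), which determine the unknown coefficients $w_i^{[j]}$, $j = 1, 2$, $i = 1, 2, \ldots, m$.

On substituting $B(\kappa,x)$, $\mathbf{S}(x)$ and $\mathbf{G}(x)$ in (4), we obtain the system of coupled equations

$$\sum_{i=1}^{m}\left[\varphi_i'(x_k) + \frac{v-1}{x_k}\,\varphi_i(x_k)\right] w_i^{[1]} + \kappa \sum_{i=1}^{m} \varphi_i(x_k)\, w_i^{[2]} = 0,$$

$$-\kappa \sum_{i=1}^{m} \varphi_i(x_k)\, w_i^{[1]} + \sum_{i=1}^{m}\left[\varphi_i'(x_k) - \frac{v}{x_k}\,\varphi_i(x_k)\right] w_i^{[2]} = g(x_k), \quad k = 1, 2, \ldots, m. \tag{5}$$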

System (5) can then be written in matrix form as

$$\mathbf{A}\boldsymbol{\beta} = \mathbf{G},$$

where

$$\boldsymbol{\beta} = \begin{pmatrix} w^{[1]} \\ w^{[2]} \end{pmatrix}, \qquad \mathbf{G} = \begin{pmatrix} 0 \\ g \end{pmatrix},$$

$\mathbf{A}$ is a $2m \times 2m$ square matrix, and $\boldsymbol{\beta}$ and $\mathbf{G}$ are column vectors of order $2m \times 1$. An accurate approximate solution of the ODE (3) is the aim of this paper. For this purpose, the Gaussian RBF $\varphi(r,c)$ is used as the basis function. It is defined by

$$\varphi(r,c) = e^{-c r^2}, \tag{6}$$

with derivative

$$\varphi'(r,c) = -2rc\,e^{-c r^2},$$

where $c$ is the shape parameter of the Gaussian RBF, $r = x - x^c$, and the $x^c$ are the centers of the Gaussian RBF. Both the accuracy and the condition number of system (5) depend on the value of $c$, and choosing an optimal shape parameter is still an open problem. In this paper, the algorithm of [12] is used to select $c$; in this algorithm, the value of $c$ changes with the nodal points as well as with the frequency parameter $\kappa$. We use 0 and 3 as the lower and upper bounds for $c$ in the algorithm.

The system of equations (5) is then solved for the unknown coefficient vector $\boldsymbol{\beta}$, which gives the approximate solution $\mathbf{S}(x)$.



QUADRATURE BASED ON HYBRID AND HAAR FUNCTIONS

In this section, the multi-resolution quadrature rules based on hybrid functions (HFQ) and Haar wavelets (HWQ) are briefly described. Detailed descriptions and proofs of the formulae for HFQ and HWQ are given in [11, 13].

In this paper we use the hybrid function based quadrature of order $m = 8$ (HFQ8) for evaluating integrals of the form $I[f] = \int_a^b f(x)\,dx$. The formula for HFQ8 is

$$\begin{aligned} \mathrm{HFQ8} = \frac{h}{241920} \sum_{k=1}^{n} \Bigl[\, & 295627\, f\bigl(a + \tfrac{h}{2}(16k-15)\bigr) + 71329\, f\bigl(a + \tfrac{h}{2}(16k-13)\bigr) \\ & + 471771\, f\bigl(a + \tfrac{h}{2}(16k-11)\bigr) + 128953\, f\bigl(a + \tfrac{h}{2}(16k-9)\bigr) \\ & + 128953\, f\bigl(a + \tfrac{h}{2}(16k-7)\bigr) + 471771\, f\bigl(a + \tfrac{h}{2}(16k-5)\bigr) \\ & + 71329\, f\bigl(a + \tfrac{h}{2}(16k-3)\bigr) + 295627\, f\bigl(a + \tfrac{h}{2}(16k-1)\bigr) \Bigr], \end{aligned} \tag{7}$$

where $h = \frac{b-a}{8n}$. Similarly, the Haar wavelet based quadrature for computing the integral $I[f] = \int_a^b f(x)\,dx$ is given by

$$\mathrm{HWQ} = h \sum_{k=1}^{N} f(x_k) = h \sum_{k=1}^{N} f\bigl(a + h(k - 0.5)\bigr), \tag{8}$$

where $h = \frac{b-a}{N}$ and $N = 2M$. Note: in the case of Bessel-type oscillatory integrals, we take $f(x) = g(x)\,J_v(\omega x)$.
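Both rules are simple weighted sums and can be sketched in a few lines of Python. The function names `hfq8` and `hwq` are illustrative. The prefactor $h/241920$ with $h = (b-a)/(8n)$ is the normalisation under which the eight weights sum to the block width $8h$, which is an assumption about the original formula that the polynomial checks below are meant to support.

```python
# Weights of the 8-point hybrid-function rule, in node order within one block.
W8 = (295627, 71329, 471771, 128953, 128953, 471771, 71329, 295627)

def hfq8(f, a, b, n):
    """Hybrid-function quadrature of order 8 on n blocks, formula (7)."""
    h = (b - a) / (8 * n)
    total = 0.0
    for k in range(1, n + 1):
        for i, w in enumerate(W8):
            # Nodes a + (h/2)(16k - 15), a + (h/2)(16k - 13), ..., a + (h/2)(16k - 1)
            total += w * f(a + 0.5 * h * (16 * k - 15 + 2 * i))
    return h * total / 241920.0

def hwq(f, a, b, N):
    """Haar-wavelet quadrature, formula (8): a midpoint sum on N cells."""
    h = (b - a) / N
    return h * sum(f(a + h * (k - 0.5)) for k in range(1, N + 1))

if __name__ == "__main__":
    print(hfq8(lambda x: x ** 4, 0.0, 1.0, 1))   # compare with 1/5
    print(hwq(lambda x: x ** 2, 0.0, 1.0, 64))   # compare with 1/3
```

For a Bessel-type integrand one would pass `f = lambda x: g(x) * J(v, omega * x)` with a Bessel routine of one's choice; none is included here because the Python standard library does not provide one.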

CONVERGENCE

Some theoretical results for convergence of the proposed methods are calculated. First, we consider the error bounds of the multi resolution methods HFQ and HWQ:



Quadrature based on hybrid and Haar functions

If a = 0, b = 1, n = 4 and ℎ = , then the error bound of formula (7) is calculated as:

|퐸푟푟표푟| = ℎ

4.54푓( )(휉), (9)

Where 휉 ∈ [푎, 푏].

Similarly, for the integral 퐼 [푓 ] = ∫ 푓(푥)푑푥 and 2M = 4, then the error bound for the HWQ is defined as:

|퐸푟푟표푟| = ℎ6 푓′′(휂), (10)

where 휂 ∈ [푎,푏].

Gaussian RBF based quadrature

Theorem 1. Let 퐵(휅, 푥) = ( 퐴(휅, 푥)) exists and 퐺(푥),퐴(휅,푥), 휰(휅, 푥) ∈ 퐶 [푎,푏]. Also 퐵 (휅,푥), 퐵 (휅,푥) and their 2m derivatives are uniformly bounded on [a, b], then the error bound for computing the integral (1) by the Gaussian RBF based quadrature rule is given by

$$E(\kappa) = \left| I(g) - \mathrm{GRBF} \right| = O\!\left(\frac{1}{\kappa^{7/2}}\right).$$

NUMERICAL RESULTS

In this section, some test cases are considered to verify the accuracy and efficiency of the proposed methods. The real solution of the test problem is obtained from MAPLE 15 [14]. Results in terms of absolute errors (Error) are obtained.

Test problem 1. Consider the computation of the integral [1]

퐼 [푓, 휅] = 1

1 + 푥 퐽 (휔푥)푑푥,

by the Gaussian RBF based quadrature and the multi-resolution quadrature rules HFQ and HWQ. Numerical results for varying frequency parameter obtained from the three methods are shown in Figure 1. The accuracy of the proposed GRBF method improves as the frequency $\kappa$ increases, while the multi-resolution methods HFQ and HWQ fail to retain the desired accuracy. The multi-resolution methods reach the desired accuracy only on finer grids, which is computationally expensive, as shown in Figure 2. Both figures show that the new GRBF method retains the desired accuracy for high frequencies on coarser grids. Figure 3 shows that GRBF is accurate, with asymptotic order of convergence $O(\kappa^{-7/2})$, for a small number of nodes, i.e. $m = 10$. The oscillatory behavior of the integrand is shown in Figure 4 for $\kappa = 1000$. Finally, the new method is tested for high frequency values. According to the results in Table 1, the GRBF method is accurate and efficient with a small number of nodal points.

Table 1: Absolute error and CPU time (in parenthesis) produced by the GRBF

휿 m = 10 m = 20 m = 30

10 1:2418푒 (0.0209s)

1:5043푒 (0:0981s)

8:2876푒 (0:1539s)

10 3:5427 푒 (0:0316s)

3:2348푒 (0:0917s)

4:2022푒 (0:2088s)

10 1:1548푒 (0:0267s)

3:9864푒 (0:1022s)

2:8363푒 (0:1742s)

10 3:5506푒 (0:0404s)

7:3052푒 (0:0969s)

2:8113푒 (0:1289s)

Figure 1: Absolute error of HFQ, HWQ and GRBF for m = 10

Figure 2: Absolute error of HFQ, HWQ and GRBF for $\kappa = 1000$

Figure 3: Absolute error scaled by $\kappa^{7/2}$ of GRBF for m = 10

Figure 4: Oscillatory behavior of the integrand for $\kappa = 1000$

CONCLUSION

In this paper, a collocation method based on the Gaussian RBF and the multi-resolution quadrature rules HFQ and HWQ are used for the numerical evaluation of Bessel-type oscillatory integrals. Theoretical error bounds for the methods are given. The test problem shows that GRBF retains its accuracy at high frequencies with few nodes, whereas HFQ and HWQ require finer grids.

Nomenclature Box:

Symbols Description

RBF Radial basis functions 푆(x) Approximate value of S 푣 Order of the first kind of Bessel function c Shape parameter of the RBF interpolation

푐 , 푐 Lower and upper bounds for optimal value of the shape parameter 훶(휅,푥) A vector of Bessel oscillatory functions O(휅) Asymptotic order of convergence in terms of Frequency parameter 푤 , 푠 Unknown coefficients

GRBF Gaussian RBF based quadrature HFQ Hybrid functions based quadrature HWQ Haar wavelets based quadrature



REFERENCES

[1] R. Chen, "Numerical Approximations to Integrals with a Highly Oscillatory Bessel Kernel," Appl. Numer. Math., vol. 62, no. 5, pp. 636-648, 2012. doi: 10.1016/j.apnum.2012.01.009

[2] Z. Xu and S. Xiang, "On the evaluation of highly oscillatory finite Hankel transform using special functions," Numer. Algorithms, vol. 72, no. 1, pp. 37-56, 2016. doi: 10.1007/s11075-015-0033-3

[3] D. Levin, "Fast integration of rapidly oscillatory functions," J. Comput. Appl. Math., vol. 67, no. 1, pp. 95-101, 1996. doi: 10.1016/0377-0427(94)00118-9

[4] D. Levin, "Analysis of a collocation method for integrating rapidly oscillatory functions," J. Comput. Appl. Math., vol. 78, no. 1, pp. 131-138, 1997. doi: 10.1016/S0377-0427(96)00137-9

[5] S. Xiang, "Numerical analysis of a fast integration method for highly oscillatory functions," BIT Numer. Math., vol. 47, no. 2, pp. 469-482, 2007. doi: 10.1007/s10543-007-0127-y

[6] K. C. Chung et al., "A method to generate generalized quadrature rules for oscillatory integrals," Appl. Numer. Math., vol. 34, no. 1, pp. 85-93, 2000. doi: 10.1016/S0168-9274(99)00033-1

[7] G. A. Evans and K. C. Chung, "Some theoretical aspects of generalised quadrature methods," J. Complex., vol. 19, no. 3, pp. 272-285, 2003. doi: 10.1016/S0885-064X(03)00004-9

[8] G. A. Evans and J. R. Webster, "A high order, progressive method for the evaluation of irregular oscillatory integrals," Appl. Numer. Math., vol. 23, no. 2, pp. 205-218, 1997. doi: 10.1016/S0168-9274(96)00058-X

[9] S. Xiang and W. Gui, "On generalized quadrature rules for fast oscillatory integrals," Appl. Math. Comput., vol. 197, no. 1, pp. 60-75, 2008. doi: 10.1016/j.amc.2007.07.052

[10] R. Chen and X. Liang, "Asymptotic expansions of Bessel, Anger and Weber transformations," J. Math. Anal. Appl., vol. 372, no. 2, pp. 377-389, 2010. doi: 10.1016/j.jmaa.2010.07.012

[11] I. Aziz et al., "Quadrature rules for numerical integration based on Haar wavelets and hybrid functions," Comput. Math. with Appl., vol. 61, no. 9, pp. 2770-2781, 2011. doi: 10.1016/j.camwa.2011.03.043

[12] Siraj-ul-Islam and S. Zaman, "New quadrature rules for highly oscillatory integrals with stationary points," J. Comput. Appl. Math., vol. 278, pp. 75-89, 2015. doi: 10.1016/j.cam.2014.09.019



[13] Siraj-ul-Islam et al., "A comparative study of numerical integration based on Haar wavelets and hybrid functions," Comput. Math. with Appl., vol. 59, no. 6, pp. 2026-2036, 2010. doi: 10.1016/j.camwa.2009.12.005

[14] B. W. Char et al., Maple V Language Reference Manual: Springer US, 1991. Available: http://link.springer.com/10.1007/978-1-4615-7386-9.

PJCIS (2016), Vol. 1, No. 2: 53-65 Advance Persistent Threat


Advance Persistent Threat Defense Techniques: A Review

MURTAZA AHMED SIDDIQI*, AZIZ MUGHERI AND KANWAL OAD

Department of Computer Science SZABIST Larkana Corresponding author’s email: [email protected]

Abstract

The internet is evolving rapidly in the information age, and with it the significance of privacy and security has become a key concern. This growing concern is not limited to the high-value data of multinational organizations and governments; it also affects ordinary users. During the last few years, there have been a number of network breaches aimed at espionage or sabotage that used an advanced and lethal methodology known as the Advanced Persistent Threat (APT). In view of the damage done by such attacks, this literature review is intended to give readers a thorough understanding of an APT attack and its common phases. Later sections of the paper highlight the security methods currently in use or proposed by researchers and security organizations to counter APT attacks. Statistical data on known APT attacks conducted over the last few years is also included to give readers a clear idea of the devastation caused by APT attacks. The paper ends with conclusions and future work, including crucial steps that can be employed to fight APT attacks. The data analysed in this paper is drawn from annual reports published by well-known security groups and from reports released by organizations that have been targets or victims of APT attacks.

Keywords: Advance Persistent Threats (APT), Security, Internet, Hacking,

Malware.

INTRODUCTION

The war between hackers and security experts has been going on since the birth of the internet. With the passage of time the internet has become a necessity of everyday life, connecting everyone everywhere regardless of geographical boundaries. For internet users, however, privacy and security have become a great concern. With the rapid escalation in sophisticated hacking techniques, even a minor security flaw can have severe consequences. As industries move towards an IT revolution and day-to-day dependency on IT increases, the security concerns associated with IT are on the rise. Cyber espionage, security breaches and privacy issues are escalating in frequency and are becoming more complex, persistent and difficult to intercept [1]. Studies estimate that such cyber security breaches cost nearly $400 billion worldwide every year [2].



In view of recent incidents and the financial losses caused by APT and similar attacks, a number of researchers, government institutions and corporate organizations are questioning the security techniques currently used to protect sensitive data [3]. Studies indicate that cyber attacks on high-value targets are highly covert, tenacious and challenging to detect; such attacks are usually initiated with a very stealthy approach to avoid raising any suspicion, as in the Stuxnet [4] and Aurora [5] cases.

ADVANCED PERSISTENT THREATS

Attack patterns similar to Stuxnet and Aurora are categorized as Advanced Persistent Threats (APT) [6]. Table 1 lists some of the well-known APT attacks. An APT attack cannot be treated casually on any account. Typically, an APT attack can cost an organization nearly $5.5 million to counter it and investigate the damage done [7]. APT attacks can be separated into a number of stages, each of which carries out a specific task, making them difficult to detect and counter [8]. One of the most common misunderstandings among users is that traditional security tools and defence methods are enough to counter an APT attack. Such attacks are persistent, strategically conducted, well funded, sophisticated and carried out with a stealthy approach. As far as persistence, stealth and careful planning are concerned, researchers and security experts have come across APT attacks that lasted from one month to 28 months while maintaining a stealthy approach to accomplish the desired task [9]. To keep a low profile and avoid suspicion, an APT can even switch its operational mode from observing or active to sleep mode. APT attackers can also adapt and modify their approach based on the security barriers deployed within the network [10]. Table 1 highlights the effectiveness of APT attacks by showing some of the reported incidents of successful security breaches in some of the most protected networks belonging to both the private and government sectors.

Table 1: Known APT attacks from the last few years (2007-2015) [11-14].

| Attack | Entry Method | Date | Classification |
| --- | --- | --- | --- |
| Aurora Operation | Malware | 2007 | Espionage |
| Stuxnet | Malware | 2009 | Sabotage |
| Energetic | Malware | 2011 | Espionage |
| RSA Breach | 0-day, malware | 2011 | Espionage |
| DigiNotar | Compromised network access | 2011 | Sabotage |
| Luckycat | Spear phishing emails, 0-day, malware | 2011 | Espionage |
| Flame | Malware | 2012 | Espionage |
| Shamoon | Malware | 2012 | Sabotage |
| Operation Ke3chang | Malware | 2010-2013 | Espionage |
| Operation SnowMan | Water hole attack (weakness in vfw.org) | 2014 | Unknown (suspected to be Espionage) |
| Heartbleed | Malware | 2014 | Espionage |
| Darkhotel | Malware, spear phishing, 0-day | 2014-2015 | Espionage |



APT Phases

As mentioned in the previous section, an APT attack consists of a number of phases [8]. These phases can generally be classified as follows:

i. Reconnaissance: The assailant observes the target network and gathers as much information about it as possible.

ii. Infiltration: Based on the information collected in the first step, the assailant attempts to discover a weak link through which to launch the attack and gain access to the network.

iii. Discovery: After the target network is successfully breached, the assailant runs network discovery protocols to locate the target data, understand the security implemented on the network and determine how the desired task can be achieved without raising suspicion.

iv. Capture: Once the target and the way to reach it are identified, the assailant initiates the attack phase and tries to accomplish the desired task with maximum stealth to avoid suspicion and the risk of getting caught.

v. Escape: After completing the objective and acquiring the desired data, the assailant makes an escape from the network. To make things worse, an APT attacker tries not to leave any digital footprints on the victim network, making it difficult for the investigation team to recognize the intrusion procedure, the objective achieved or harm done, and the exit method.
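The ordering of the five phases can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the event names and the mapping from events to phases are hypothetical and are not taken from the paper or from any detection product.

```python
from enum import IntEnum


class AptPhase(IntEnum):
    """The five phases listed above, numbered in the order they occur."""
    RECONNAISSANCE = 1
    INFILTRATION = 2
    DISCOVERY = 3
    CAPTURE = 4
    ESCAPE = 5


# Hypothetical mapping from observed event types to the phase they suggest.
# The event names are illustrative assumptions only.
EVENT_TO_PHASE = {
    "external_port_scan": AptPhase.RECONNAISSANCE,
    "spear_phishing_email": AptPhase.INFILTRATION,
    "internal_network_scan": AptPhase.DISCOVERY,
    "bulk_data_transfer": AptPhase.CAPTURE,
    "log_deletion": AptPhase.ESCAPE,
}


def furthest_phase(events):
    """Return the latest phase suggested by the observed events, or None."""
    phases = [EVENT_TO_PHASE[e] for e in events if e in EVENT_TO_PHASE]
    return max(phases) if phases else None


observed = ["external_port_scan", "spear_phishing_email", "internal_network_scan"]
print(furthest_phase(observed).name)  # DISCOVERY
```

Treating the phases as an ordered type makes it easy to ask how far an intrusion appears to have progressed, which is the kind of information an investigation team needs when traces are sparse.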

EXISTING TOOLS AND METHODS TO COUNTER APT ATTACKS

APT attacks are becoming more sophisticated and stronger with the passage of time, so countering them requires stronger and more advanced defense techniques. A number of common techniques, such as firewalls, IDS, IPS, botnet protection, sandboxing, web and email protection, web application firewalls and antivirus software, are currently among the best available defenses against an APT attack. Unfortunately, with the rapid advancement of APT attacks, these tools are not sufficiently effective on their own, and more advanced techniques are needed, such as a layer-based defense mechanism [15].

Defense in Depth/Multi-Layer Defense System

Defense in depth is a multi-layer defense method originally based on a concept from military discipline. The basic idea is to apply protective systems at multiple layers of the network, making it much more secure than a single-layer security mechanism. Defense in depth not only protects the system from an APT attack but can also provide valuable information on the attack and the attacker. Such information can assist in tracking the attacker and help in minimizing the damage done by an APT attack [16]. This is very useful, as it is normally difficult not only to detect an APT attack but also to track the damage it has done. Table 2 illustrates a layer-based logical approach to a Defense in Depth system.

Table 2: Defense in Depth against APT [16]

| Layers | Defense Methods |
| --- | --- |
| Identity and Access | Identity and Access Management |
| Physical and Environmental | Physical and Environmental Security |
| Network | 1. Intrusion detection and prevention system; 2. VOIP security; 3. Network segmentation and firewall; 4. Web and mail content inspection; 5. Secure remote access; 6. Data encryption; 7. Network access control |
| Operating System | Operating System Security |
| Application | Application Firewall |
| Data Base | Database Security |

Since the Defense in Depth approach is a combination of security tools, as shown in Table 3, it provides a comprehensive security method against new and emerging APT attacks. Defense in Depth takes a general approach of defending all assets while taking into consideration their interconnections and dependencies, and it deploys available resources in an effective layer-based monitoring and protection system, minimizing the business's exposure to cyber security risks. A comprehensive layout of the tools and techniques used by Defense in Depth is shown in Table 3 [17].



Table 3: Defense in Depth Strategy Elements [17]

| Defense in Depth Strategy Element | Components |
| --- | --- |
| Risk Management Program | 1. Identify Threats; 2. Characterize Risk; 3. Maintain Asset Inventory |
| Cyber Security Architecture | 1. Standards/Recommendations; 2. Policy; 3. Procedures |
| Physical Security | 1. Field Electronics Locked Down; 2. Control Centre Access Controls; 3. Remote Site Video, Access Controls, Barriers |
| ICS Network Architecture | 1. Common Architectural Zones; 2. Demilitarized Zones (DMZ); 3. Virtual LANs |
| ICS Network Perimeter Security | 1. Firewalls/One-Way Diodes; 2. Remote Access & Authentication; 3. Jump Servers/Hosts |
| Host Security | 1. Patch and Vulnerability Management; 2. Field Devices; 3. Virtual Machines |
| Security Monitoring | 1. Intrusion Detection Systems; 2. Security Audit Logging; 3. Security Incident and Event Monitoring |
| Vendor Management | 1. Supply Chain Management; 2. Managed Services/Outsourcing; 3. Leveraging Cloud Services |
| The Human Element | 1. Policies; 2. Procedures; 3. Training and Awareness |

It is quite clear from Table 2 and Table 3 that the Defense in Depth technique is structurally similar to an APT attack: an APT attack combines different tools at different phases, which makes it difficult to detect and counter, and Defense in Depth uses a similarly holistic approach to shield the system against APT attacks. Defense in Depth is not a single mechanism but a combination of people, technology, standard security procedures, operations and organizational awareness of APT tactics. The main objective of the Defense in Depth technique is to maximize the chances of avoiding an APT attack and to provide a comprehensive recovery and tracking method in case an APT succeeds in accessing the organization's network [18].
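The layered idea can be sketched in a few lines of Python. This is a minimal illustration, not the model from [16] or [17]: the `Layer` type, the three example checks and the request fields are all hypothetical. It shows the two properties described above, namely that a request must pass every layer, and that each layer leaves a record that can later help track an attack.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Layer:
    name: str
    allows: Callable[[Dict[str, object]], bool]  # True if the request may pass


def evaluate(request, layers):
    """Pass a request through every layer in order.

    Returns (allowed, log). The log records each layer's verdict so that a
    blocked or suspicious request can be traced afterwards.
    """
    log = []
    for layer in layers:
        ok = layer.allows(request)
        log.append((layer.name, "pass" if ok else "block"))
        if not ok:
            return False, log
    return True, log


# Hypothetical checks, loosely named after rows of Table 2.
layers = [
    Layer("Identity and Access", lambda r: r.get("authenticated") is True),
    Layer("Network", lambda r: r.get("port") in {443}),
    Layer("Application", lambda r: "<script>" not in str(r.get("payload", ""))),
]

request = {"authenticated": True, "port": 443, "payload": "<script>alert(1)</script>"}
allowed, log = evaluate(request, layers)
print(allowed)  # False
print(log)
```

In this example the request passes the first two layers and is stopped by the third, so a single-layer defense at the identity or network level alone would have let it through.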



Defense Techniques Proposed in Research Papers and Security Organizations

A layer-based approach is not the only measure to counter APT attacks; security experts and security organizations have proposed other comprehensive methods as well. Table 4 contains some of the methods suggested by prominent security organizations to counter APT attacks.

Table 4: APT defense methods suggested by well-known security organizations.

Paper: Advanced Persistent Threats: A Symantec Perspective [19]

Attack method: According to the paper, an APT attack is divided into four phases: intrusion into a network, discovery of the target, capture of the target data and exit from the network. The attacks are initiated through email phishing or malware hidden behind advertising clicks, followed by finding vulnerabilities on the network, capturing the desired data and removing the traces.

Defense method: According to the paper, such attacks can be prevented by:

1- Implementing heuristics-based security tools such as antivirus software and firewalls.

2- Closely monitoring and filtering incoming and outbound traffic.

3- Implementing data encryption and using a VPN.

Paper: Defending Against Advanced Persistent Threats: Strategies for a New Era of Attacks [20]

Attack method: The paper is based on a study conducted by CA Technologies. According to the study, APT attacks are typically conducted against multinational companies with high-value data or a high stock market value. The authors describe an APT attacker as someone who is always looking for a vulnerability in order to infiltrate a network. After a successful infiltration, the attacker runs discovery protocols to learn the details of the network, such as port information obtained by port scanning or traffic routes. Once this information is collected, the attacker searches for the desired data without raising any suspicion. As soon as the target is identified, the attacker captures the data and exits the network. In a successful APT attack, companies with shared account management policies suffer greater losses. Further, a compromised network can provide the attacker with system logs and users' passwords, and in some cases the attacker leaves a backdoor in the network, which can be a nightmare for the company.

Defense method: The authors propose the following methods to counter an APT attack:

1- Block any unused ports.

2- Encrypt data with at least 128-bit encryption and MD5 hashing.

3- Maintain log files for auditing purposes and check them from time to time for any suspicious activity.

4- Implement end-to-end antivirus to secure sessions.

5- Define firewall policies well to counter attacks from internal and external sources.

Studies show that a large number of cyber attacks are conducted with inside assistance; to avoid such incidents the following steps can be very useful:

1- Identities should be securely saved on different virtual networks, and user privileges over resources should be assigned very carefully.

2- Accounts belonging to employees who are no longer part of the organization should be deactivated immediately.

Paper: Countering the Advanced Persistent Threat Challenge with Deep Discovery [14]

Attack method: The data used in this paper was collected by Trend Micro. The paper likens an APT attack on an organization to the destruction of both its network and the organization itself. It also indicates that an APT attack can easily breach traditional defense systems such as firewalls or antivirus software. The description of an APT attacker in this paper is much the same as in the previous papers. Once the attacker acquires information about the target network and a point of entry, the assailant attacks the network with every tool at their disposal in order to establish a communication structure that can relay information back to the attacker. Once this is achieved, the infiltrator searches the network for the target data, which is then acquired and sent back to the attacker's headquarters. In the end, the attacker makes an escape without leaving any trace.

Defense method: The authors suggest the following methods to prevent such attacks:

1- Using digital certificates (SSL/SSH) to avoid downloading any infected file or being redirected by the browser to harmful sites.

2- Implementing a sandbox strategy at the server's end, so that files are properly simulated before utilization, updating or implementation.

3- Using heuristic tools and algorithms to blacklist IPs based on suspicious behavior, routines and subroutines.

4- Properly configuring and installing security devices, checkpoints and tools. Such practices play a core role in building an effective and strong wall between high-value data and the attacker.

5- Implementing separate firewalls on every server.

6- Using highly effective data encryption techniques, so that even if data is compromised it will not be an easy task to decipher it.

Paper: The Study of APT Attack Stage Model [21]

Attack method: According to the paper, an APT attack can be divided into four stages: the preparation stage, access stage, resident stage and harvest stage. Despite the different names, the concept of the stages of an APT attack is the same as described in the previous papers. The initial stages consist of direct or indirect information gathering, such as port scanning, vulnerability scanning, and the use of search engines along with advanced crawlers and custom developments to obtain network information. The infiltration stage is then executed based on the gathered information. The network can be accessed in numerous ways, such as spear phishing emails, social engineering or a water hole attack. Once inside the network, the attacker intends to establish a command and control mechanism to search for the desired data, acquire it and then exit the network. According to studies, attacks based on social engineering, water hole and direct approaches are highly effective. 19% of APT cases use zero-day vulnerabilities, and almost 70% exploit vulnerabilities.

Defense method: To prevent an APT attack, the authors of this paper suggest the following:

1- The first step in countering an APT is educating the staff on how to avoid giving any internal assistance to an APT attacker. Staff should know how to prevent data leaks through social media or social gatherings and should not share authentication information with anyone.

2- Proper attention to internal network security is very important.

3- To reduce exposure to zero-day exploits and to vulnerabilities in outdated software, users must keep the system and software updated.

4- Firewalls must be configured properly so that only authentic traffic is allowed to enter and exit the network.

5- Antivirus software should be installed not only on the core systems (servers) but also on every local host, with updated virus definitions.

6- An IDS (Intrusion Detection System) should be implemented that can alert the authorities to any kind of suspicious activity.

7- Strict IT policies conforming to standards should be implemented for the organization's employees.

8- Deploying tools such as honeypot systems within the network to detect any suspicious or malicious activity can be very effective.

9- Using a sandbox to check any suspicious file before allowing it on the network or installing it can also be very useful.

Security experts must carry out a risk assessment if an APT attack has been successfully conducted or is in progress on the network, so that the damage done or expected can be determined for recovery and response purposes.
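Two of the recommendations in Table 4, checking log files for suspicious activity and blacklisting IPs on suspicious behavior, can be illustrated together. The sketch below is an assumption-laden example rather than the method of any cited paper: the log format, the `LOGIN_FAILED` event name and the threshold of three failures are all hypothetical.

```python
from collections import Counter

# Hypothetical log format: "<timestamp> <source-ip> <event>"
LOG_LINES = [
    "2016-05-01T10:00:01 10.0.0.5 LOGIN_FAILED",
    "2016-05-01T10:00:03 10.0.0.5 LOGIN_FAILED",
    "2016-05-01T10:00:04 10.0.0.9 LOGIN_OK",
    "2016-05-01T10:00:06 10.0.0.5 LOGIN_FAILED",
    "2016-05-01T10:00:09 10.0.0.7 LOGIN_FAILED",
]


def suspicious_ips(lines, threshold=3):
    """Return the set of source IPs with at least `threshold` failed logins."""
    failures = Counter()
    for line in lines:
        parts = line.split()
        # Skip malformed lines instead of failing the whole audit.
        if len(parts) == 3 and parts[2] == "LOGIN_FAILED":
            failures[parts[1]] += 1
    return {ip for ip, count in failures.items() if count >= threshold}


print(suspicious_ips(LOG_LINES))  # {'10.0.0.5'}
```

A fixed threshold like this is easy to evade by a patient attacker who spreads attempts over time, which is consistent with the paper's point that APT attacks are designed to stay below the level at which simple tools raise an alarm.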

Behavior Analysis Tools to Counter APT Attacks

Earlier sections discussed comprehensive approaches that can be used to safeguard against APT attacks. This section discusses tools that can be used to detect an APT attack. Table 5 lists tools that can be used effectively to detect an APT attack on a network. These tools rely on two basic analytical methods to detect an abnormality: they can detect an APT based on the behaviour of the network and on the code of applications running within the network.

Table 5: Tools to perform coding and behavioural analysis to detect an APT attack [22]

| Tool | Description | Analysis Method |
| --- | --- | --- |
| Autoruns | Provides a list of auto-start file locations. | Behavioural |
| Process Monitor | Logs changes in any registry, file, process, thread etc. | Behavioural |
| ListDLLs | Provides a list of DLL files on the system. | Behavioural |
| TCPview | Monitors or logs active end-to-end TCP/UDP connections. | Behavioural |
| VMMap | Provides details of virtual and physical memory utilized by any program. | Behavioural |
| Capture-Bat | Honeypot services at the client end, to log and monitor any attack. | Behavioural |
| Wireshark | Packet sniffer and network protocol analyser. | Behavioural |
| REMnux | Linux-based utility to analyse any malware and to reverse-engineer it. | Behavioural/Coding |
| FileInsight | Utility software to display a file in both text and hexadecimal format. | Coding |

Such tools can be very effective, but in a live and active network it becomes very difficult to identify an abnormality, especially when the network has thousands of nodes and users with a huge number of application data transactions per second.
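Behavioural analysis of the kind listed in Table 5 usually comes down to comparing what is observed now against a known-good baseline. The following sketch shows that comparison in its simplest form. It does not call any of the tools in Table 5; the paths are made-up examples standing in for the kind of auto-start listing such a tool might produce.

```python
def diff_snapshot(baseline, current):
    """Compare two sets of observed items and report what changed."""
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }


# Hypothetical auto-start entries recorded on a known-clean host.
baseline = {
    r"C:\Program Files\AV\av.exe",
    r"C:\Windows\System32\svc.exe",
}
# Entries observed later on the same host.
current = {
    r"C:\Program Files\AV\av.exe",
    r"C:\Windows\System32\svc.exe",
    r"C:\Users\Public\updater.exe",
}

changes = diff_snapshot(baseline, current)
print(changes["added"])
```

On a single host the diff is short enough to review by hand. Across thousands of nodes the same comparison produces a large volume of benign changes, which is the scaling problem described above.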



RECENT APT ATTACKS

APT attacks are still escalating, despite all the research and the security measures being implemented. Table 6 summarizes figures from recent reports by Symantec [23], Kaspersky Security [24, 25] and McAfee [26-28].

Table 6: Reported attacks during last few years

Symantec [23]

| Year | 2013 | 2014 | 2015 |
| --- | --- | --- | --- |
| Total Breaches | 253 | 312 | 318 |
| Mobile Vulnerabilities | 127 | 168 | 528 |
| Zero-Day Vulnerabilities | 23 | 24 | 54 |

Kaspersky Security [24, 25]

| Year | 2014 | 2015 | 2016 |
| --- | --- | --- | --- |
| Attempts to launch malware capable of stealing money | 1,911,266 | 1,966,324 (2.8% higher than in 2014) | 1,198,264 |
| Number of users attacked by Trojan-Ransom malware | 128,132 (Oct to Dec 2014) | 337,205 (July to Sept 2015) | Not available |
| Number of users attacked by encryptors (Trojan-Ransom encryptor malware) | 120,840 | 179,209 | 821,865 |

McAfee [26-28]

| Year | 2013 | 2014 | 2015 | 2016 |
| --- | --- | --- | --- | --- |
| Total malware attacks | 190,000,000 | 340,000,000 | 430,000,000 | 650,000,000 |
| Total ransomware | 1,500,000 | 2,200,000 | 4,900,000 | 8,700,000 |
| Total rootkit malware | 1,220,000 | 1,500,000 | 1,700,000 | Not available |
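The trend in Table 6 is easier to read as year-over-year change. The short Python sketch below computes it for two of the Symantec rows; the values are copied from the table, while the helper name `percent_change` is simply an illustrative choice.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Values copied from Table 6 (Symantec [23]).
zero_day = {2013: 23, 2014: 24, 2015: 54}
mobile = {2013: 127, 2014: 168, 2015: 528}

for name, series in (("zero-day", zero_day), ("mobile", mobile)):
    years = sorted(series)
    for prev, cur in zip(years, years[1:]):
        change = percent_change(series[prev], series[cur])
        print(f"{name} {prev}->{cur}: {change:+.1f}%")
```

Between 2014 and 2015 the reported zero-day vulnerabilities rose from 24 to 54, an increase of 125%, and mobile vulnerabilities rose from 168 to 528, roughly a threefold increase.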



CONCLUSION

Despite the availability of defense methods against APTs, the high percentage of successful APT attacks clearly indicates that much remains to be done. This paper has highlighted the basic attack patterns of an APT and the defense methods being deployed to counter them, and it is evident that current defense methods are not fully equipped to counter such highly organized attacks. Among the APT defense methods discussed in the paper, some could be quite effective in the near future. For example, if organizations reported network breaches with a complete analysis of the attack, it could help security experts propose much better defense mechanisms to counter such attacks in future. However, there are a number of reasons why most organizations fail to report security breaches. These may include concern for the organization's reputation, or at times organizations are not even aware that their network has been compromised, which is quite alarming. Layer-based defense methods and security systems with automated capabilities to identify false flags or to detect, analyze and counter malicious activity can play a vital role in the coming years. However, a lot of work is needed for such systems to be highly efficient, as a system with a high level of checks and balances might degrade the overall performance of a network. As the world moves towards the Internet of Things (IoT), security is a critical area that cannot be neglected.

REFERENCES

[1] Symantec, "Internet Security Threat Report," 2014, Available: http://www.symantec.com/security_response/publications/threatreport.jsp, Accessed on: May 2015

[2] McAfee, "Net losses: Estimating the global cost of cybercrime," 2014, Available: http://csis.org/files/attachments/140609_rp economic impact cybercrime report.pdf, Accessed on: May, 2015

[3] Crashing the system. (2014, July 12) The Economist. Available: http://www.economist.com/news/special-report/21606419-howprotect-critical-infrastructure-cyber-attacks-crashing-system

[4] M. Kenney, "Cyber-terrorism in a post-stuxnet world," Orbis, vol. 59, no. 1, pp. 111-128, 2015. doi: 10.1016/S1353-4858(11)70086-1

[5] M. Zeller, "Myth or reality -Does the Aurora vulnerability pose a risk to my generator?," in 64th Annual Conference for Protective Relay Engineers, 2011, pp. 130-136. doi: 10.1109/CPRE.2011.6035612

[6] C. Tankard, "Advanced Persistent threats and how to monitor and deter them," Network Security, vol. 2011, no. 8, pp. 16-19, 2011.

[7] Ponemon Institute, "2011 Cost of Data Breach Study: United States," 2012, Available: http://www.ponemon.org/local/upload/file/2011_US_CODB_FINAL_5.pdf



[8] M. A. Siddiqi and N. Ghani, "Critical Analysis on Advanced Persistent Threats," Int. J. Comput. Appl., vol. 141, no. 13, pp. 46-50, 2016. Available: www.ijcaonline.org/archives/volume141/number13/siddiqi-2016-ijca-909784.pdf

[9] D. Alperovitch, "Revealed: Operation Shady RAT," 2011, Available: https://www.mcafee.com/us/resources/white-papers/wp-operation-shady-rat.pdf

[10] NIST, "Managing Information Security Risk Organization, Mission, and Information System View " Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology 2011, doi: 10.6028/NIST.SP.800-39.

[11] GReAT. Darkhotel's attacks in 2015. Available: https://securelist.com/blog/research/71713/darkhotels-attacks-in-2015/

[12] A. Redondo-Hernández et al., "Detection of Advanced Persistent Threats Using System and Attack Intelligence," in EMERGING 2015: The Seventh International Conference on Emerging Networks and Systems Intelligence, 2015, pp. 91-94

[13] P. Chen et al., "A Study on Advanced Persistent Threats," vol. 8735, Communications and Multimedia Security. CMS 2014. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2014, pp. 63-72. [Online]. doi: 10.1007/978-3-662-44885-4_5

[14] Trend Micro, "Countering the Advanced Persistent Threat Challenge with Deep Discovery," April 2013, Available: https://www.trendmicro.de/cloud-content/us/pdfs/business/white-papers/wp_deepdiscovery.pdf

[15] B. Hudson, "Advanced Persistent Threats: Detection, Protection and Prevention," Sophos Ltd., US, February 2014, Available: http://resources.idgenterprise.com/original/AST-0112935_sophos-advanced-persistent-threats-detection-protection-prevention.pdf

[16] J. V. Chandra et al., "Intelligence Based Defense System to Protect from Advanced Persistent Threat by Means of Social Engineering on Social Cloud Platform," Indian J. Sci. Technol., vol. 8, no. 28, p. 1, 2015.

[17] A. Shamim et al., "Layered defense in depth model for it organizations," in ICCET’ 2014: 2nd International Conference on Innovations in Engineering and Technology, 2014, Penang, Malaysia

[18] Industrial Control Systems Cyber Emergency Response Team, "Recommended practise: Improving industrial control systems cybersecurity with defense-in-depth strategies," Department of Homeland Security, Control Systems Security Program, National Cyber Security Division 2009, Available: https://ics-cert.us-cert.gov/sites/default/files/recommended_practices/NCCIC_ICS-CERT_Defense_in_Depth_2016_S508C.pdf

[19] Symantec, "Advanced Persistent Threats: A Symantec Perspective," Symantec Corporation 2011, Available: https://www.symantec.com/content/en/us/enterprise/white_papers/b-advanced_persistent_threats_WP_21215957.en-us.pdf



[20] CA Technologies, "Defending Against Advanced Persistent Threats : Strategies for a New Era of Attacks Security Threats As We Know Them Are Changing," 2014, Available: http://www.valleytalk.org/wp-content/uploads/2015/05/defending-against-advanced-persistent-threats.pdf

[21] M. Li et al., "The study of APT Attack Stage Model: 978-1-5090-0806-3/16/$31.00," in IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 2016, pp. 1-5

[22] F. Li, "A Detailed Analysis of an Advanced Persistent Threat Malware," The SANS Institute, 2011, Available: https://www.sans.org/reading-room/whitepapers/malicious/detailed-analysis-advanced-persistent-threat-malware-33814

[23] Symantec, "Internet Security Threat Report," April 2016, vol. 21 Available: https://resource.elq.symantec.com/LP=2899, Accessed on: May 2015

[24] M. Garnaeva, J. V. D. Wiel, D. Makrushin, A. Ivanov, Y. Namestnikov, "Kaspersky Security Bulletin," in "Overall statistics for 2015," Kaspersky Lab. December 15, 2015, Available: https://securelist.com/files/2015/12/Kaspersky-Security-Bulletin-2015_FINAL_EN.pdf

[25] D. Emm et al., "IT Threat Evolution in Q3 2016," Kaspersky 2016, Available: https://cdn.press.kaspersky.com/files/2015/10/IT_threat_evolution_Q2_2015_ENG.pdf

[26] McAfee Labs, "Threats Report," September 2016, Available: https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf

[27] McAfee Labs, "Threats Report," August 2015, Available: https://www.mcafee.com/hk/resources/reports/rp-quarterly-threats-aug-2015.pdf

[28] McAfee Labs, "Threats Report," August 2017, Available: https://www.mcafee.com/hk/resources/reports/rp-quarterly-threats-aug-2015.pdf

Guidelines For Authors

Pakistan Journal of Computer and Information Systems (PJCIS) is an official journal of PASTIC meant for students and professionals of Computer Science & Engineering, Information & Communication Technologies (ICTs), Information Systems, Library and Information Science. This biannual open access journal is aimed at publishing high quality papers on theoretical developments as well as practical applications in all above cited fields. The journal also aims to publish new attempts on emerging topics/areas, original research papers, reviews and short communications.

Manuscript format: Manuscript should be in English, typed using font Times New Roman and double spaced throughout with 0.5 inches indentation and margins of at least 1.0 inch in Microsoft Word format (doc or docx) or LaTeX. The manuscripts should be compiled in the following order: Abstract, Keywords, Introduction, Materials and Methods, Results, Discussion, Conclusion, Recommendations, Acknowledgement and References. Follow the format guidelines provided in the following table:

| Type | Characteristics | Letter case |
| --- | --- | --- |
| Title | Font size 16, bold, single line spacing, centered | Sentence case |
| Author names | Font size 12, bold, centered, single line spacing | Uppercase |
| Institutional affiliations | Font size 10, centered, single line spacing | Sentence case |
| All headings | Font size 16, bold, centralized and numbered | Uppercase |
| Body text | Font size 12 and justified | Sentence case |
| Captions (Table/Figures) | Font size 12 and bold | Sentence case |

Title: This should contain title of the articles by capitalizing initial letter of each main word. Avoid abbreviations in the title of the manuscript.

Author’s Name: Title should be followed by complete author names, e-mail addresses, landline, fax and cell numbers. In case of more than one author with different Institutions, use numbering in superscripts (1, 2, 3 etc) to separate the author's affiliation. Do not use any other symbol such as *, † or ‡. For multiple authors, use comma as separator.

Author’s Affiliation: Mark the affiliations with the respective numbering in superscripts (1, 2, 3 etc) used in the authors field. PJCIS encourages the listing of authors’ Open Researcher and Contributor Identification (ORCID).

Corresponding author: One author should be identified as the corresponding author by putting asterisk after the name as superscript.

Abstracts: Abstract must be comprehensive and self explanatory stating brief methodology, important findings and conclusion of the study. It should not exceed 300 words.

Keywords: Three to eight keywords depicting the article should be provided below the abstracts separated by comma.

Introduction: The introduction should contain all the background information a reader needs to understand the rest of the author’s paper. This means that all important concepts should be explained and all important terms defined. Introduction must contain the comprehensive literature review on the problem to be addressed along with concise statement of the problem, objectives and valid hypotheses to be tested in the study.

Material and Methods: An adequate account of details about the procedures involved should be provided in a concise manner.

Results & Discussion: Results should be clear, concise and presented in a logical sequence in the text along with, tables, figures and other illustrations. If necessary, subheadings can be used in this section. Discussion should be logical and results must be discussed in the light of previous relevant studies justifying the findings. If necessary, it may be split into separate "Results" and "Discussion" sections.

Recommendations: Research gap may be identified and recommendations for future research should be given here.

Acknowledgement: In a brief statement, acknowledge assistance provided by people, institutions and financing.

Abbreviations: Abbreviations used should be elaborated at least first time in the text inside parenthesis e.g., VPN (Virtual Private Network).

Tables/Figures: Caption must have descriptive headings and should be understandable without reference to the text and should include numbers [e.g., Figure 1: Summary of quantitative data analysis]. Put citation at the end of the table/figure or in the text, if it has been copied from already published work. Photographic prints must be of high resolution. Figures will appear black & white in the printed version however, they will appear coloured in the full text PDF version available on the website. Figures and tables should be placed exactly where they are to appear within the text. Figures not correctly sized will be returned to the author for reformatting.

References: All the references must be complete and accurate. When referring to a reference in the text of the document, put the number of the reference in square brackets e.g., [1]. All references in the bibliography should be listed in citation order of the authors. Always try to give the URL or DOI in case of electronic/digital sources along with the access date. References should follow IEEE style; details can be downloaded from the journal's home page: http://www.pastic.gov.pk/pjcis.aspx

Publication Ethics Policy: For submission to the Pakistan Journal of Computer and Information Systems, please follow the publication ethics listed below:

The research work included in the manuscript is original.

All authors have made a significant contribution to the conception, design, execution, analysis or interpretation of the reported study.

All authors have read the manuscript and agree to its submission to the Pakistan Journal of Computer and Information Systems.

This manuscript has not been submitted or published elsewhere in any form except in the form of a dissertation, abstract, or oral or poster presentation.

All the data has been extracted from the original research study and no part has been copied from elsewhere.

All the copied material has been properly cited.

Journal Review Policy: PJCIS follows a double-blind peer review policy. All manuscripts are initially evaluated by the internal editorial committee of PASTIC. Low-quality manuscripts that do not meet the journal’s scope or have a similarity index above 20% are rejected and returned to the authors. Initially accepted manuscripts are forwarded to two reviewers (one local and one foreign). The maximum time limit for review is six weeks. After the reviewers’ reports are submitted, the final decision on the manuscript is made by the chief editor and communicated to the corresponding author by email. All dates, i.e., submission and acceptance, will be clearly printed on the manuscript.

Submission Policy: Before submission, make sure that all authors have read the publication ethics policy for PJCIS and agree to follow it. There are no submission or processing charges for publishing in the Pakistan Journal of Computer and Information Systems. A manuscript complete in all respects should be submitted to [email protected] or [email protected]

Editorial Office: Correspondence should be made at the following address:

Editor in Chief

Prof. Dr. Muhammad Akram Shaikh

Director General

Pakistan Scientific & Technological Information Centre (PASTIC)

Quaid-e-Azam University (QAU) Campus

Islamabad

Please send all inquiries to [email protected]
