Ensemble Based Categorization and Adaptive Learning Model for Malware Detection

Muhammad Najmi bin Ahmad Zabidi

IAS 2011, Universiti Teknikal Melaka (UTEM)

6th December 2011

Information Assurance and Security Conf (IAS 2011)

Sections: Problems, Solution, Model Illustration

Presented at IAS 2011, Malacca, Malaysia

About

• PhD student at Universiti Teknologi Malaysia, Skudai

• Employed by International Islamic University Malaysia, Gombak

Overview

• Malware detection is considered "undecidable" [Cohen, 1986]

• This means that 100 percent detection for all time is impossible

• But there is still room to push detection accuracy as high as possible

Problem 1 - Features

• Malware detection depends on features to generate signatures

• Some features may be redundant, which makes computation more expensive

• Some features may be weak or not relevant

• The strong features alone may be enough, so the weaker ones can be discarded

• The feature set can be reduced with a dimension reduction method
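
The pruning idea in these bullets can be sketched as follows. This is only a minimal illustration, assuming each sample is a binary vector of API-call indicators; the feature names and the toy matrix are hypothetical, and the rule used here (drop constant columns and exact duplicate columns) is a simple stand-in for a dimension reduction method, not the method used in this work.

```python
def prune_features(names, rows):
    """Drop columns that are constant (weak) or that duplicate an earlier column (redundant)."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    seen = set()
    kept = []
    for name, col in zip(names, columns):
        if len(set(col)) < 2:   # same value in every sample: carries no information
            continue
        if col in seen:         # identical to a column already kept: redundant
            continue
        seen.add(col)
        kept.append(name)
    return kept


# Hypothetical toy data: rows are samples, columns are API-call indicators.
names = ["CreateMutex", "RegOpenKey", "RegSet", "FindWindow"]
rows = [
    (1, 1, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (0, 1, 0, 1),
]
kept = prune_features(names, rows)
print(kept)
```

In this toy matrix RegOpenKey is present in every sample and RegSet repeats CreateMutex exactly, so only two of the four columns survive.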

Problem 2 - Classification of Software

• Classification here refers to distinguishing between malicious, suspicious and benign software

• The aim is to reduce false positives and false negatives, and to increase precision
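
For reference, the quantities named above can be computed from predicted and true labels as in the sketch below. The label strings and the toy predictions are hypothetical; "malicious" is treated as the positive class and everything else as negative, which is a simplifying assumption for a three-way labelling.

```python
def detection_metrics(y_true, y_pred, positive="malicious"):
    """Count confusion-matrix cells for one positive class and derive common rates."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # benign flagged as malware
    fn = sum(t == positive and p != positive for t, p in pairs)  # malware that was missed
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical toy labels: four malicious and four benign samples.
y_true = ["malicious"] * 4 + ["benign"] * 4
y_pred = ["malicious", "malicious", "malicious", "benign",
          "malicious", "benign", "benign", "benign"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```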

Problem 3 - Tackling new malware

• Unknown malware is the problem

• No prior knowledge is available for it

• This suggests unsupervised categorization
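
One way to picture unsupervised categorization without prior labels is a simple leader clustering over API-call sets, sketched below. The slides do not name a clustering algorithm, so the choice of leader clustering, the Jaccard similarity measure, the 0.5 threshold and the sample data are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_cluster(samples, threshold=0.5):
    """Put each sample in the first cluster whose leader is similar enough, else start a new one."""
    clusters = []  # list of (leader_call_set, [member names])
    for name, calls in samples:
        for leader, members in clusters:
            if jaccard(calls, leader) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((calls, [name]))
    return [members for _, members in clusters]


# Hypothetical unlabeled samples described by the API calls they use.
samples = [
    ("s1", {"CreateMutex", "RegSet", "gethostbyname"}),
    ("s2", {"CreateMutex", "RegSet", "RegCreate"}),
    ("s3", {"FindWindow", "IsDebuggerPresent"}),
    ("s4", {"FindWindow", "IsDebuggerPresent", "OutputDebugString"}),
]
groups = leader_cluster(samples)
print(groups)
```

No labels are used at any point: the groups emerge only from how much the call sets overlap.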

Related work on malware detection

Statistics-based:

• [Chouchane et al., 2007, Saudi et al., 2010, Merkel et al., 2010]

Data mining and machine learning:

• [Sun et al., 2010, Komashinskiy and Kotenko, 2009, Komashinskiy and Kotenko, 2010]

• [Elovici et al., 2007, Gavrilut et al., 2009, Firdausi et al., 2010, Golovko et al., 2010]

Solutions

Feature Selection

• Use feature selection to reduce processing overhead
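
As an illustration of feature selection, the sketch below ranks binary features by information gain and keeps the top k. Information gain is one common scoring criterion and is an assumption here, since the slides do not say which criterion is used; the feature names, labels and k = 2 are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for v, lab in zip(feature_values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: one column of 0/1 indicators per API call.
labels = ["mal", "mal", "ben", "ben"]
features = {
    "CreateMutex": [1, 1, 0, 0],  # separates the two classes perfectly
    "FindWindow": [1, 0, 1, 0],   # tells nothing about the class
    "RegSet": [1, 1, 1, 0],       # partially informative
}
ranked = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
top_k = ranked[:2]
print(top_k)
```

Only the selected columns would then be passed to the later phases, which is where the saving in processing overhead comes from.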

Categorization and Ensemble

• First use a generic classifier to separate malware from non-malware

• Then use a specific classifier to separate malware by its special traits (trojan, worm, virus)

• Supervised categorization is needed to classify known malware features

• Recent literature uses the term semi-supervised learning for "assisted" unsupervised categorization

• Ensemble classification helps, since a weak base learner can be boosted

• Unsupervised categorization (clustering) is needed to categorize unknown malware
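
The remark that a weak base learner can be boosted can be illustrated with a small AdaBoost-style sketch. AdaBoost is named here as one well-known boosting scheme and is an assumption, not necessarily the scheme intended in this work. The toy data (three hypothetical binary indicators, labelled +1 when at least two are present) is chosen so that no single-feature stump is right on every sample.

```python
from itertools import product
from math import exp, log


def train_adaboost(X, y, rounds=3):
    """X: tuples of 0/1 features; y: +1 (malware) or -1 (benign). Returns weighted stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature_index, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                # Stump: predict +polarity when feature j is present, else -polarity.
                preds = [polarity if x[j] else -polarity for x in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Raise the weight of misclassified samples, then renormalize.
        weights = [w * exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted sum of stump outputs."""
    score = sum(a * (pol if x[j] else -pol) for a, j, pol in model)
    return 1 if score >= 0 else -1


# Toy data: every combination of three indicators.
X = list(product((0, 1), repeat=3))
y = [1 if sum(x) >= 2 else -1 for x in X]
model = train_adaboost(X, y, rounds=3)
single = sum(predict(model[:1], x) == t for x, t in zip(X, y))
boosted = sum(predict(model, x) == t for x, t in zip(X, y))
print(single, boosted)
```

On this toy set the first stump alone classifies 6 of 8 samples correctly, while the weighted combination of three stumps classifies all 8.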

Adaptive Learning

• Use adaptive learning so that new, previously unknown malware can be learned as known, and can therefore be discarded at an early phase

Suggestion of Model

Phase 1, Phase 2, Phase 3, Phase 4

Phase 1

P1 Descriptions

• Preprocessing includes extracting API calls, or any other useful information, from the malware binaries

• Feature selection is performed in this phase
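
A rough sketch of this preprocessing step is to pull printable strings out of a binary, in the spirit of the Unix `strings` tool. This is an assumption about how API names might be recovered; it only finds names stored as plain ASCII text, and the byte blob below is a made-up stand-in for a real file.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return printable ASCII runs of at least min_len bytes found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Hypothetical blob: short runs such as "MZ" and "ab" fall below min_len and are skipped.
blob = b"\x00\x01MZ\x90\x00KERNEL32.CreateProcess\x00\xff\xfeab\x00gethostbyname\x00"
found = extract_strings(blob)
print(found)
```

Packed or obfuscated samples may hide these strings, so a scan like this is at best a first pass.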

Features

• Features, in this case, are API calls:
  • The fewer API calls that need to be used, the better
  • A dimension reduction method is used to handle this

• As future work, we are considering adding entropy analysis of the packed binary body, in addition to the API call profiling
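
The entropy analysis mentioned as future work could start from the Shannon entropy of the byte values, sketched below. Treating high entropy as a hint of packing or encryption is a common heuristic and an assumption here; the slides do not specify how the entropy would be computed or used.

```python
from collections import Counter
from math import log2


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (one repeated value) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * log2(total / n) for n in Counter(data).values())


low = byte_entropy(b"A" * 1024)              # a single repeated byte
high = byte_entropy(bytes(range(256)) * 4)   # every byte value equally frequent
print(low, high)
```

Compressed or encrypted bodies tend to sit near the upper end of this range, while plain code and text usually score lower.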

Interesting API calls

• CreateMutex
• NtasdfCreateFile
• call shell32
• advapi32.RegOpenKey
• KERNEL32.CreateProcess
• shdocvw
• gethostbyname
• advapi32.RegCreate
• advapi32.RegSet
• http://
• OutputDebugString
• FindWindow
• IsDebuggerPresent
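
A watch list like this can be turned into a fixed-length binary feature vector, as in the sketch below. The watch list is a subset of the names above; the substring-matching rule and the observed strings are assumptions made for illustration.

```python
# Subset of the indicators listed on this slide.
WATCHLIST = [
    "CreateMutex",
    "RegOpenKey",
    "CreateProcess",
    "gethostbyname",
    "OutputDebugString",
    "FindWindow",
    "IsDebuggerPresent",
]


def to_feature_vector(strings, watchlist=WATCHLIST):
    """One 0/1 entry per watched name: 1 if any observed string contains that name."""
    return [int(any(name in s for s in strings)) for name in watchlist]


# Hypothetical strings recovered from one sample.
observed = ["KERNEL32.CreateProcess", "advapi32.RegOpenKeyExA", "IsDebuggerPresent"]
vector = to_feature_vector(observed)
print(vector)
```

Substring matching lets a name such as RegOpenKey cover its variants (for example RegOpenKeyExA), at the cost of occasional over-matching.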

Phase 2

P2 Descriptions

• Malware is first categorized according to traits common to generic malware

• Next, it is categorized by the symptoms specific to each class of malware (worm, trojan, virus)

• A sample may combine the traits of all classes, but usually one feature is dominant
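
The two steps described here, followed by picking the dominant class, might look like the rule-based sketch below. The trait sets, the hit threshold and the class-to-trait mapping are entirely hypothetical; in the proposed model these decisions would come from trained classifiers rather than fixed sets.

```python
# Hypothetical trait sets, for illustration only.
GENERIC_TRAITS = {"CreateMutex", "IsDebuggerPresent", "RegSet"}
CLASS_TRAITS = {
    "worm": {"gethostbyname", "http://"},
    "trojan": {"CreateProcess", "RegCreate"},
    "virus": {"CreateFile", "FindWindow"},
}


def generic_stage(calls, min_hits=2):
    """Step 1: does the sample show enough generic malware traits?"""
    return len(calls & GENERIC_TRAITS) >= min_hits


def specific_stage(calls):
    """Step 2: choose the class with the most matching traits (the dominant one)."""
    scores = {cls: len(calls & traits) for cls, traits in CLASS_TRAITS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def categorize(calls):
    if not generic_stage(calls):
        return "benign"
    return specific_stage(calls)


mixed = categorize({"CreateMutex", "RegSet", "gethostbyname", "http://", "CreateProcess"})
clean = categorize({"FindWindow"})
novel = categorize({"CreateMutex", "IsDebuggerPresent"})
print(mixed, clean, novel)
```

The first sample carries both worm and trojan traits, but the worm traits dominate, so it is reported as a worm.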

Phase 3

P3 Descriptions

• Use ensemble-based classification built from weak learners

• Many weak learners, combined by voting, can give more accurate results

• If a sample falls into an unknown class, it goes into the clustering phase
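
Voting with a fallback for unknown samples can be sketched as follows. The three weak learners are hypothetical one-indicator rules, and the agreement threshold (more than half of the votes) is an assumption; the slides do not specify how "unknown" is decided.

```python
from collections import Counter


def vote(predictions, min_agreement=0.5):
    """Majority vote; 'unknown' when no label wins more than min_agreement of the votes."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count / len(predictions) > min_agreement else "unknown"


# Hypothetical weak learners: each one looks at a single indicator.
weak_learners = [
    lambda calls: "worm" if "gethostbyname" in calls else "trojan",
    lambda calls: "worm" if "http://" in calls else "virus",
    lambda calls: "trojan" if "CreateProcess" in calls else "worm",
]


def classify(calls):
    return vote([learner(calls) for learner in weak_learners])


agreed = classify({"gethostbyname", "http://"})  # all three learners say worm
majority = classify({"CreateProcess"})           # two of three say trojan
split = classify({"FindWindow"})                 # three different answers
print(agreed, majority, split)
```

A sample that comes back as "unknown" is the kind that would be handed on to the clustering phase.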

P4 Descriptions

• A signature is created if the malware is new

• The new signature is added to the current categorization

• This shortens the detection cycle for the next malware sample
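
The loop described here (create a signature for new malware, store it, and recognize it early next time) can be sketched as below. Hashing the sorted API-call set with SHA-256 is an assumption chosen for simplicity, not the signature scheme of this work, and the category label is hypothetical.

```python
import hashlib


def make_signature(calls):
    """Hash the sorted API-call names so the same set always yields the same signature."""
    canonical = "\n".join(sorted(calls)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class SignatureStore:
    """Known signatures and the category each one was assigned."""

    def __init__(self):
        self.known = {}  # signature -> category label

    def check(self, calls):
        """Early exit: the stored category, or None for a sample not seen before."""
        return self.known.get(make_signature(calls))

    def learn(self, calls, category):
        """Add a new signature to the current categorization."""
        self.known[make_signature(calls)] = category


store = SignatureStore()
sample = {"CreateMutex", "RegSet", "gethostbyname"}
first = store.check(sample)        # None: the sample must go through all phases
store.learn(sample, "cluster-7")   # hypothetical label assigned by the later phases
second = store.check(sample)       # now recognized immediately
print(first, second)
```

An exact hash only matches identical call sets, so a practical system would likely need a more tolerant signature; the sketch shows only the learn-then-skip cycle.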

The Dataset

In malware research there is no standard dataset, unlike the intrusion detection area, which usually relies on the KDD/MIT Lincoln datasets.

• We obtained malware samples from CyberSecurity Malaysia (CSM): about 2 GB of malware files, amounting to around 30,000 malware binaries

• We have to build our own dataset in order to extract the features

• This is considered preprocessing work

Conclusion

• A soft computing approach could assist in malware detection

• Feature selection could help minimize feature processing

• Ensemble methods could help improve malware categorization

• Adaptive learning could help avoid redundant retraining in the next n iterations

Bibliography

Chouchane, M. R., Walenstein, A., and Lakhotia, A. (2007). Statistical signatures for fast filtering of instruction-substituting metamorphic malware. In Proceedings of the 2007 ACM workshop on Recurring malcode, WORM ’07, pages 31-37, New York, NY, USA. ACM.

Cohen, F. B. (1986). Computer viruses. PhD thesis, Los Angeles, CA, USA. AAI0559804.

Elovici, Y., Shabtai, A., Moskovitch, R., Tahan, G., and Glezer, C. (2007). Applying machine learning techniques for detection of malicious code in network traffic. In Hertzberg, J., Beetz, M., and Englert, R., editors, KI 2007: Advances in Artificial Intelligence, volume 4667 of Lecture Notes in Computer Science, pages 44-50. Springer Berlin / Heidelberg.

Firdausi, I., Lim, C., Erwin, A., and Nugroho, A. S. (2010). Analysis of machine learning techniques used in behavior-based malware detection. Advances in Computing, Control, and Telecommunication Technologies, International Conference on, 0:201-203.

Gavrilut, D., Cimpoesu, M., Anton, D., and Ciortuz, L. (2009). Malware detection using machine learning. In Proc. Int. Multiconference Computer Science and Information Technology IMCSIT ’09, pages 735-741.

Golovko, V., Bezobrazov, S., Kachurka, P., and Vaitsekhovich, L. (2010). Neural network and artificial immune systems for malware and network intrusion detection. In Koronacki, J., Ras, Z., Wierzchon, S., and Kacprzyk, J., editors, Advances in Machine Learning II, volume 263 of Studies in Computational Intelligence, pages 485-513. Springer Berlin / Heidelberg.

Komashinskiy, D. and Kotenko, I. (2009). Integrated usage of data mining methods for malware detection. In Cartwright, W., Gartner, G., Meng, L., and Peterson, M. P., editors, Information Fusion and Geographic Information Systems, Lecture Notes in Geoinformation and Cartography, pages 343-357. Springer Berlin Heidelberg.

Komashinskiy, D. and Kotenko, I. (2010). Malware detection by data mining techniques based on positionally dependent features. In Proc. 18th Euromicro Int Parallel, Distributed and Network-Based Processing (PDP) Conf, pages 617-623.

Merkel, R., Hoppe, T., Kraetzer, C., and Dittmann, J. (2010). Statistical detection of malicious PE-executables for fast offline analysis. In De Decker, B. and Schaumüller-Bichl, I., editors, Communications and Multimedia Security, volume 6109 of Lecture Notes in Computer Science, pages 93-105. Springer Berlin / Heidelberg.

Saudi, M., Cullen, A., and Woodward, M. (2010). Statistical Analysis in Evaluating STAKCERT Infection, Activation and Payload Methods. In Proceedings of the World Congress on Engineering, volume 1.

Sun, X., Huang, Q., Zhu, Y., and Guo, N. (2010). Mining distinguishing patterns based on malware traces. In Proc. 3rd IEEE Int Computer Science and Information Technology (ICCSIT) Conf, volume 2, pages 677-681.
