Using GSP data mining algorithm to detect malicious flows in Lawrence Berkeley National Laboratory FTP Amir Razmjou

Data mining cyber security


Page 1: Data mining   cyber security

Using GSP data mining algorithm to detect malicious flows in Lawrence Berkeley National Laboratory FTP

Amir Razmjou

Page 2: Data mining   cyber security

Pattern-based Techniques and Today’s Cybersecurity Challenges

• Protocol specifications evolve more rapidly.
• Vendor-specific, closed-standard protocols.
• Verifying network traffic against protocol specifications does not always catch attacks that abuse legitimate protocol features:
  – XML XXE attacks
  – FTP bounce attacks
• Unknown attacks.
• Abnormality detection must account for changes in user interactions.

Page 3: Data mining   cyber security

Sequential Pattern Mining

• It is similar to frequent itemset mining, but takes the ordering of items into account.
• Sequential pattern mining is useful in many applications:
  – Customer shopping sequences
  – Medical treatments, natural disasters (e.g., earthquakes), science & eng. processes, stocks and markets, etc.
• Useful for extracting knowledge from semi-structured data (e.g., XML).

Page 4: Data mining   cyber security

What is sequence database and sequential pattern mining

• A sequence database consists of ordered elements or events where each element is an unordered set of items.

SID   sequences
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>

TID   itemsets
10    a, b, d
20    a, c, d
30    a, d, e
40    b, e, f
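
To make the definition concrete, here is a minimal sketch of how the support of a candidate pattern could be counted over the sequence database shown on this slide. The helper names (`parse_sequence`, `contains`, `support`) and the candidate pattern `<(ab)c>` are illustrative assumptions, not part of the original slides; GSP itself also generates and prunes candidates, which is not shown here.

```python
def parse_sequence(text):
    """Parse notation such as '<a(abc)(ac)d(cf)>' into a list of item sets.

    Hypothetical helper: assumes single-character items, with parentheses
    grouping the items that belong to the same element.
    """
    elements, group = [], None
    for ch in text.strip("<>"):
        if ch == "(":
            group = set()
        elif ch == ")":
            elements.append(group)
            group = None
        elif group is not None:
            group.add(ch)
        else:
            elements.append({ch})
    return elements


def contains(sequence, pattern):
    """True if every pattern element is a subset of a distinct sequence
    element, with the order of elements preserved (greedy left-to-right)."""
    position = 0
    for element in sequence:
        if position < len(pattern) and pattern[position] <= element:
            position += 1
    return position == len(pattern)


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database.values() if contains(seq, pattern))


# The sequence database from the slide, keyed by SID.
database = {
    10: parse_sequence("<a(abc)(ac)d(cf)>"),
    20: parse_sequence("<(ad)c(bc)(ae)>"),
    30: parse_sequence("<(ef)(ab)(df)cb>"),
    40: parse_sequence("<eg(af)cbc>"),
}

print(support(database, parse_sequence("<(ab)c>")))  # 2 (SIDs 10 and 30)
```

With a minimum support of 2, `<(ab)c>` would count as a sequential pattern in this database, because the order of elements matters but the order of items inside an element does not.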

Page 5: Data mining   cyber security

Sequential Shopping Cart

[Table: example shopping-cart sequence database. Columns are Sequence 1 to Sequence 4; rows are transactions. Items include biscuits, snack, baking needs, frozen foods, salads, cake, fruit, chickens, beef, pet food, lamb, vegetables, electrical and brushware.]

Page 6: Data mining   cyber security

Sample FTP Flow

Welcome to Microsoft FTP Server 3.4
USER anonymous
331 Guest login ok, send your complete e-mail address as password.
PASS <password>
230 Guest login ok, access restrictions apply.
TYPE I
200 Type set to I.
CWD xfig
250 CWD command successful.
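
The sample flow alternates client commands with three-digit server reply codes. The sketch below shows one way such a transcript could be reduced to (command, code) pairs. The function name and the two regular expressions are assumptions made for illustration; the slides do not describe the actual preprocessing.

```python
import re

# Assumption: a client command is 3-4 uppercase letters at the start of a line.
COMMAND_RE = re.compile(r"^([A-Z]{3,4})(?:\s|$)")
# Assumption: a server reply starts with a three-digit code.
REPLY_RE = re.compile(r"^(\d{3})(?:[ -]|$)")


def pair_commands_with_codes(lines):
    """Pair each client command with the first reply code that follows it.

    Lines that are neither a command nor a reply (such as the banner in the
    sample flow) are ignored.
    """
    pairs = []
    pending = None  # last command still waiting for its reply
    for line in lines:
        line = line.strip()
        reply = REPLY_RE.match(line)
        command = COMMAND_RE.match(line)
        if reply and pending is not None:
            pairs.append((pending, reply.group(1)))
            pending = None
        elif command:
            pending = command.group(1)
    return pairs


flow = [
    "Welcome to Microsoft FTP Server 3.4",
    "USER anonymous",
    "331 Guest login ok, send your complete e-mail address as password.",
    "PASS <password>",
    "230 Guest login ok, access restrictions apply.",
    "TYPE I",
    "200 Type set to I.",
    "CWD xfig",
    "250 CWD command successful.",
]

print(pair_commands_with_codes(flow))
# [('USER', '331'), ('PASS', '230'), ('TYPE', '200'), ('CWD', '250')]
```

Command arguments and reply text are dropped here, so only the command name and the numeric code remain.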

Page 7: Data mining   cyber security

Data Preparation

Page 8: Data mining   cyber security

Resulting Dataset

Source               Destination      APP Signature  COMMAND  CODE
4.251.189.14:33257   131.243.1.10:21  custom1        USER     331
4.251.189.14:33257   131.243.1.10:21  custom1        PASS     230
4.251.189.14:33257   131.243.1.10:21  custom1        REST     350
4.251.189.14:33257   131.243.1.10:21  custom1        TYPE     200
4.251.189.14:33257   131.243.1.10:21  custom1        CWD      250
4.251.189.14:33257   131.243.1.10:21  custom1        TYPE     200
140.114.97.25:33983  131.243.1.10:21  custom1        USER     331
140.114.97.25:33983  131.243.1.10:21  custom1        PASS     230
140.114.97.25:33983  131.243.1.10:21  custom1        SYST     215
140.114.97.25:33983  131.243.1.10:21  custom1        CWD      550
53.55.176.50:10011   131.243.1.10:21  custom1        USER     331
53.55.176.50:10011   131.243.1.10:21  custom1        PASS     230
53.55.176.50:10011   131.243.1.10:21  custom1        FEAT     500
53.55.176.50:10011   131.243.1.10:21  custom1        SYST     215
53.55.176.50:10011   131.243.1.10:21  custom1        PWD      257
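
Before a sequence miner can run on these rows, they have to be grouped into one sequence per flow. The following sketch assumes that a flow is identified by its (source, destination) pair and that each row becomes one element holding the command and its reply code; both choices are assumptions for illustration, and `rows_to_sequences` is a hypothetical helper.

```python
from collections import defaultdict


def rows_to_sequences(rows):
    """Group dataset rows into one ordered sequence of item sets per flow.

    Each row is (source, destination, signature, command, code). Row order
    is preserved within a flow; the signature column is not used here.
    """
    sequences = defaultdict(list)
    for source, destination, _signature, command, code in rows:
        sequences[(source, destination)].append(frozenset({command, code}))
    return dict(sequences)


# A few rows taken from the dataset on this slide.
rows = [
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "USER", "331"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "PASS", "230"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "SYST", "215"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "CWD", "550"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "USER", "331"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "PASS", "230"),
]

sequences = rows_to_sequences(rows)
for flow_id, sequence in sequences.items():
    print(flow_id, [sorted(element) for element in sequence])
```

Treating the command and its code as items of the same element lets a miner report patterns that fix only one of the two, such as an element containing a command with any reply code.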

Page 9: Data mining   cyber security

Result Sequence Rules

[1]  <{USER}{PASS,230}{TYPE,200}{PASV,227}{RETR,150}>      6391
[2]  <{USER}{PASS,230}{TYPE,200}{SIZE,213}{RETR,150}>      4853
[3]  <{USER,331}{PASS}{TYPE,200}{PASV,227}{RETR,150}>      6391
[4]  <{USER,331}{PASS}{TYPE,200}{SIZE,213}{RETR,150}>      4853
[5]  <{USER,331}{PASS,230}{CWD,250}{TYPE,200}{150}>        4872
[6]  <{USER,331}{PASS,230}{TYPE}{PASV,227}{RETR,150}>      6391
[7]  <{USER,331}{PASS,230}{TYPE}{SIZE,213}{RETR,150}>      4853
[8]  <{USER,331}{PASS,230}{TYPE,200}{PASV}{RETR,150}>      6392
[9]  <{USER,331}{PASS,230}{TYPE,200}{PASV,227}{RETR}>      7927
[10] <{USER,331}{PASS,230}{TYPE,200}{PASV,227}{150}>       8342
[11] <{USER,331}{PASS,230}{TYPE,200}{SIZE}{RETR,150}>      5062
[12] <{USER,331}{PASS,230}{TYPE,200}{SIZE,213}{RETR}>      4893
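
One way to use such rules is to flag flows that match none of them. The sketch below makes that idea concrete under stated assumptions: a flow counts as matched if it contains at least one rule as an ordered subsequence, only rules [9] and [12] are loaded, and the `download` flow is a made-up example. The slides do not specify the exact matching criterion, so treat this as an illustration rather than the author's procedure.

```python
def contains(flow, rule):
    """True if each rule element is a subset of a distinct flow element,
    in the same order (greedy left-to-right matching)."""
    position = 0
    for element in flow:
        if position < len(rule) and rule[position] <= element:
            position += 1
    return position == len(rule)


def is_matched(flow, rules):
    """Assumption: a flow is 'matched' if it contains at least one rule."""
    return any(contains(flow, rule) for rule in rules)


def to_flow(pairs):
    """Turn (command, code) pairs into a sequence of item sets."""
    return [{command, code} for command, code in pairs]


# Rules [9] and [12] from the slide, written as lists of item sets.
RULES = [
    [{"USER", "331"}, {"PASS", "230"}, {"TYPE", "200"}, {"PASV", "227"}, {"RETR"}],
    [{"USER", "331"}, {"PASS", "230"}, {"TYPE", "200"}, {"SIZE", "213"}, {"RETR"}],
]

flows = {
    # Hypothetical download session, not taken from the dataset.
    "download": to_flow([("USER", "331"), ("PASS", "230"), ("TYPE", "200"),
                         ("PASV", "227"), ("RETR", "150")]),
    # The 140.114.97.25 flow from the dataset slide.
    "rejected": to_flow([("USER", "331"), ("PASS", "230"), ("SYST", "215"),
                         ("CWD", "550")]),
}

unmatched = [name for name, flow in flows.items() if not is_matched(flow, RULES)]
print(unmatched)  # ['rejected']
```

Unmatched flows are only candidates for inspection; a flow can miss every frequent pattern and still be benign.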

Page 10: Data mining   cyber security

Abnormal Flows

• Commands in unmatched flows

USER, 331, PASS, 230, PORT, 200, 500, QUIT, 221, 220, PWD, 257, SYST, 215, CWD, 550, PASV, 227, TYPE, SIZE, 213, RETR, 150, 226, MDTM, 250, LIST, 421, ABOR, 533, Udd20dfd1U, U15030ab9U, U54668fafU, Udb6ef1c3U, U7694531dU, PORTQUIT, U07c4edf9U, U8855979dU, Uab12679fU, Uc2ca1083U, U5b79257aU, U5f561953U, Ud4a28da8U

• Signatures of FTP servers in unmatched flows

wu2616121custom1wu2616120proftpdrc2general172125general8msftp4msftpsunos41sunos56othergeneral5vxworks54WarFTPd167proftpdpre

7%

Page 11: Data mining   cyber security

Sequence Size and Support

Page 12: Data mining   cyber security

References

• Almulhem, A., & Traore, I. (2007). Mining and detecting connection-chains in network traffic. IFIP International Federation for Information Processing, 238, 47–57. http://doi.org/10.1007/978-0-387-73655-6_4

• Bronson, B. J. (2004). Protecting Your Network from ARP Spoofing-Based Attacks, 1–5.

• Scigocki, M., & Zander, S. (2013). Improving Machine Learning Network Traffic Classification with Payload-based Features, (November), 1–7.

• Zander, S., Nguyen, T., & Armitage, G. (2005). Automated Traffic Classification and Application Identification using Machine Learning. Proceedings of the IEEE.