Using GSP data mining algorithm to detect malicious flows in Lawrence Berkeley National Laboratory FTP

Amir Razmjou

Pattern-based Techniques and Today’s Cybersecurity Challenges

• Protocol specifications evolve more rapidly.
• Vendor-specific, closed-standard protocols.
• Verifying network traffic against protocol specifications does not always help, because some attacks use traffic that is legitimate under the specification:
  – XML XXE attacks
  – FTP bounce attacks

• Unknown attacks.
• Abnormality has to be judged against user interactions, which change over time.

Sequential Pattern Mining

• Similar to frequent itemset mining, but takes the order of the items into account.
• Sequential pattern mining is useful in many applications:
  – Customer shopping sequences
  – Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
• Useful for extracting knowledge from semi-structured data (e.g., XML).

What Is a Sequence Database and Sequential Pattern Mining?

• A sequence database consists of sequences of ordered elements or events, where each element is an unordered set of items.

SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>

TID   itemsets
10    a, b, d
20    a, c, d
30    a, d, e
40    b, e, f
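
To make the notion of a sequential pattern concrete, here is a minimal Python sketch that counts how many sequences of the SID table contain a given pattern. The names `SEQUENCE_DB`, `contains`, and `support` are illustrative and do not come from the slides; the sketch assumes the usual definition, in which a pattern is contained in a sequence when each pattern element is a subset of a distinct sequence element, in order.

```python
# Sequence database from the slide: each sequence is an ordered list of itemsets.
SEQUENCE_DB = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    20: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}


def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern element must be a subset of some sequence element,
    and the matched sequence elements must appear in the same order.
    Matching greedily at the earliest possible position is sufficient.
    """
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later itemset
    return True


def support(db, pattern):
    """Number of sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in db.values())


print(support(SEQUENCE_DB, [{"a", "b"}, {"c"}]))  # <(ab)c> -> 2
print(support(SEQUENCE_DB, [{"a"}, {"c"}]))       # <ac>    -> 4
```

With a minimum support of 2, for example, both patterns would count as frequent in this database.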

Sequential Shopping Cart

[Table: shopping-cart items per transaction (rows labeled Transaction 1, 2, 3 and 5) for Sequence 1 to Sequence 4. Items appearing in the table: biscuits, snack, baking needs, frozen foods, salads, cake, fruit, chickens, beef, pet food, lamb, vegetables, electrical, brushware.]

Sample FTP Flow

Welcome to Microsoft FTP Server 3.4
USER anonymous
331 Guest login ok, send your complete e-mail address as password.
PASS <password>
230 Guest login ok, access restrictions apply.
TYPE I
200 Type set to I.
CWD xfig
250 CWD command successful.
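
A control-channel flow like the sample can be reduced to (command, reply code) pairs. The Python sketch below shows one possible way to do this; it is not the preparation pipeline used in the presentation. The helper name `pair_commands_with_replies` is hypothetical, and the sketch assumes that commands are written as three or four upper-case letters and that replies start with a three-digit code.

```python
import re

# Assumption: a command line starts with 3-4 upper-case letters,
# a reply line starts with a three-digit code.
COMMAND = re.compile(r"^([A-Z]{3,4})(?: |$)")
REPLY = re.compile(r"^(\d{3})(?:[ -]|$)")

SAMPLE_FLOW = [
    "Welcome to Microsoft FTP Server 3.4",
    "USER anonymous",
    "331 Guest login ok, send your complete e-mail address as password.",
    "PASS <password>",
    "230 Guest login ok, access restrictions apply.",
    "TYPE I",
    "200 Type set to I.",
    "CWD xfig",
    "250 CWD command successful.",
]


def pair_commands_with_replies(lines):
    """Pair each command with the code of the first reply that follows it."""
    pairs = []
    pending = None  # last command still waiting for a reply
    for line in lines:
        command = COMMAND.match(line)
        reply = REPLY.match(line)
        if command:
            pending = command.group(1)
        elif reply and pending is not None:
            pairs.append((pending, reply.group(1)))
            pending = None
        # Anything else (e.g. the banner text) is ignored.
    return pairs


print(pair_commands_with_replies(SAMPLE_FLOW))
# [('USER', '331'), ('PASS', '230'), ('TYPE', '200'), ('CWD', '250')]
```

Arguments such as the user name or directory are dropped here, which matches the COMMAND and CODE columns of the resulting dataset.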

Data Preparation

Resulting Dataset

Source               Destination      APP Signature  COMMAND  CODE
4.251.189.14:33257   131.243.1.10:21  custom1        USER     331
4.251.189.14:33257   131.243.1.10:21  custom1        PASS     230
4.251.189.14:33257   131.243.1.10:21  custom1        REST     350
4.251.189.14:33257   131.243.1.10:21  custom1        TYPE     200
4.251.189.14:33257   131.243.1.10:21  custom1        CWD      250
4.251.189.14:33257   131.243.1.10:21  custom1        TYPE     200
140.114.97.25:33983  131.243.1.10:21  custom1        USER     331
140.114.97.25:33983  131.243.1.10:21  custom1        PASS     230
140.114.97.25:33983  131.243.1.10:21  custom1        SYST     215
140.114.97.25:33983  131.243.1.10:21  custom1        CWD      550
53.55.176.50:10011   131.243.1.10:21  custom1        USER     331
53.55.176.50:10011   131.243.1.10:21  custom1        PASS     230
53.55.176.50:10011   131.243.1.10:21  custom1        FEAT     500
53.55.176.50:10011   131.243.1.10:21  custom1        SYST     215
53.55.176.50:10011   131.243.1.10:21  custom1        PWD      257
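
Before sequences can be mined, rows like these have to be grouped into one sequence per flow. The Python sketch below does this under the assumption that a flow is identified by its (Source, Destination) pair and that each row becomes one itemset {COMMAND, CODE}; the name `rows_to_sequences` is illustrative, and only two of the flows above are included to keep the example short.

```python
# Rows copied from the dataset above:
# (source, destination, signature, command, code)
ROWS = [
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "USER", "331"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "PASS", "230"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "SYST", "215"),
    ("140.114.97.25:33983", "131.243.1.10:21", "custom1", "CWD", "550"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "USER", "331"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "PASS", "230"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "FEAT", "500"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "SYST", "215"),
    ("53.55.176.50:10011", "131.243.1.10:21", "custom1", "PWD", "257"),
]


def rows_to_sequences(rows):
    """Group rows by flow, keeping row order inside each flow.

    Assumption: (source, destination) identifies a flow, and every row
    contributes one itemset made of its command and its reply code.
    """
    sequences = {}
    for source, destination, _signature, command, code in rows:
        sequences.setdefault((source, destination), []).append({command, code})
    return sequences


for flow_id, sequence in rows_to_sequences(ROWS).items():
    print(flow_id[0], len(sequence))
# 140.114.97.25:33983 4
# 53.55.176.50:10011 5
```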

Result Sequence Rules

[1] <{USER}{PASS,230}{TYPE,200}{PASV,227}{RETR,150}> 6391
[2] <{USER}{PASS,230}{TYPE,200}{SIZE,213}{RETR,150}> 4853
[3] <{USER,331}{PASS}{TYPE,200}{PASV,227}{RETR,150}> 6391
[4] <{USER,331}{PASS}{TYPE,200}{SIZE,213}{RETR,150}> 4853
[5] <{USER,331}{PASS,230}{CWD,250}{TYPE,200}{150}> 4872
[6] <{USER,331}{PASS,230}{TYPE}{PASV,227}{RETR,150}> 6391
[7] <{USER,331}{PASS,230}{TYPE}{SIZE,213}{RETR,150}> 4853
[8] <{USER,331}{PASS,230}{TYPE,200}{PASV}{RETR,150}> 6392
[9] <{USER,331}{PASS,230}{TYPE,200}{PASV,227}{RETR}> 7927
[10] <{USER,331}{PASS,230}{TYPE,200}{PASV,227}{150}> 8342
[11] <{USER,331}{PASS,230}{TYPE,200}{SIZE}{RETR,150}> 5062
[12] <{USER,331}{PASS,230}{TYPE,200}{SIZE,213}{RETR}> 4893
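
The deck refers to "unmatched flows", which suggests that flows matching none of the mined rules are the ones inspected as abnormal. The Python sketch below illustrates that idea under this assumption, using rules [9] and [5] from the list above. The names `matches` and `unmatched_flows` are illustrative, and the `download` flow is a made-up example, not data from the slides.

```python
# Two of the mined rules, written as ordered lists of itemsets.
RULES = [
    [{"USER", "331"}, {"PASS", "230"}, {"TYPE", "200"}, {"PASV", "227"}, {"RETR"}],  # rule [9]
    [{"USER", "331"}, {"PASS", "230"}, {"CWD", "250"}, {"TYPE", "200"}, {"150"}],    # rule [5]
]

FLOWS = {
    # Hypothetical flow for illustration only.
    "download": [{"USER", "331"}, {"PASS", "230"}, {"TYPE", "200"},
                 {"PASV", "227"}, {"RETR", "150"}],
    # Second flow of the resulting dataset.
    "140.114.97.25:33983": [{"USER", "331"}, {"PASS", "230"},
                            {"SYST", "215"}, {"CWD", "550"}],
}


def matches(flow, rule):
    """True if `rule` occurs in `flow` as an ordered subsequence.

    The shared iterator is consumed as elements are matched, so each
    rule element has to be found strictly after the previous one.
    """
    remaining = iter(flow)
    return all(any(element <= step for step in remaining) for element in rule)


def unmatched_flows(flows, rules):
    """Identifiers of flows that match none of the rules."""
    return [flow_id for flow_id, flow in flows.items()
            if not any(matches(flow, rule) for rule in rules)]


print(unmatched_flows(FLOWS, RULES))  # ['140.114.97.25:33983']
```

A flow that fails to match is only a candidate for inspection; whether it is actually malicious still has to be checked separately.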

Abnormal Flows

• Commands in unmatched flows:

USER, 331, PASS, 230, PORT, 200, 500, QUIT, 221, 220, PWD, 257, SYST, 215, CWD, 550, PASV, 227, TYPE, SIZE, 213, RETR, 150, 226, MDTM, 250, LIST, 421, ABOR, 533, Udd20dfd1U, U15030ab9U, U54668fafU, Udb6ef1c3U, U7694531dU, PORTQUIT, U07c4edf9U, U8855979dU, Uab12679fU, Uc2ca1083U, U5b79257aU, U5f561953U, Ud4a28da8U

• Signatures of FTP servers in unmatched flows:

wu2616121custom1wu2616120proftpdrc2general172125general8msftp4msftpsunos41sunos56othergeneral5vxworks54WarFTPd167proftpdpre

7%

Sequence Size and Support
