32
Manel Abdellatif introducing Parallel programming for accelerating Malware detection

introducing Parallel programming for accelerating Malware

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: introducing Parallel programming for accelerating Malware

Manel Abdellatif

introducing

Parallel programming for accelerating Malware detection

1

Page 2: introducing Parallel programming for accelerating Malware

Motivations

Introduction : Parallelism & security

• Ever-growing number of threats • The market boost of embedded systems• The increasing variety of operating systems• Malware detection is a highly common and computationally-intensive

problem in intrusion detection systems

1Real time detection

2High throughtput and performance

3Need to execute many detection algorithms

Page 3: introducing Parallel programming for accelerating Malware

What about small-scale systems?

o How to get benefit from parallel architectures to monitor small-scale systems?

• Great use of small-scale systems : mobile phones, gaming consoles, SoC etc.

• Memory and computation performance constraints• Ever growth of attacks on small-scale systems • Improvement in parallel computation performance

Security threats Parallel performance computation

VS

Introduction : Parallelism & security

Page 4: introducing Parallel programming for accelerating Malware

Introduction : Parallelism & security

Previous Work• Development of parallel architecture for malware

detection based on pattern matching technique• Achieving better computing performance • Use of Cuda and desktop GPU

Current Work• Migration to mobile platform • Use of OpenCL• Building of behavioral malware dataset based on

syscalls patterns • Development of memory optimization techniques

Page 5: introducing Parallel programming for accelerating Malware

Benefits of mobile GPGPU

Parallel Processing architecture

To ensure high security level of mobile devices, accelerating malware detection can be provided by GPU parallel processing

Offering a complementary processing unit to the CPU

Adapted to SIMD architecture

Fast memory types access (shared memory , constant memory)

More and more evolving

GPUs driven by high-end applications: prepared to pay a premium for high performance

5

Parallel Processing architecture

Page 6: introducing Parallel programming for accelerating Malware

Evolution of GPUs for embedded systems

Parallel Processing architecture

6

Parallel Processing architecture

Page 7: introducing Parallel programming for accelerating Malware

Popular mobile GPUs

Parallel Processing architecture

7

Parallel Processing architecture

Adreno 330

-Inbuilt in Snapdragon™ 800 Series Processors. -speed can push to 3.6 gig pixels/s-Used in HTC one, Xperia z ultra

Power VR SGX544mp3

-Inbuild in Exynos 5 Octa processor-used in galaxy s4 @ 533 MHz clock speed

Mali T604

-First time used in Exynos 5 -The 1st Midgard architecture gpu for arm-Used in famous series of Google tablet nexus 10.

Page 8: introducing Parallel programming for accelerating Malware

GPU architecture

Parallel Processing architecture

8

Parallel Processing architecture

Page 9: introducing Parallel programming for accelerating Malware

Architecture

9

Page 10: introducing Parallel programming for accelerating Malware

Architecture Challenges

10

(1) Which type of dataset can we use in

order to have good detection accuracy?

Limitation of Mobile GPU memory VS important memory

requirement of DFA(3) The need of applying

memory compacting techniques

(2) How can we increase GPU processing performance?

Page 11: introducing Parallel programming for accelerating Malware

Challenge 1

11

Which type of dataset can we use in

order to have good

detection accuracy?

Page 12: introducing Parallel programming for accelerating Malware

Thread

1

Thread 2.1

Thread 3.1

Thread 3.2

Thread 2.2

Thread 3.4

Thread 1

Thread 2.1

Thread 3.1

Thread 3.2

Thread 4.1

Thread 2.2

Thread 3.4

Thread 4.2

Extraction of malicious behavior based on syscalls sequences with the thread-grained extraction technique

Malware detection

13

• Key concept: Malwares which have the same malicious code embedded on benign applications, will have common malicious behaviour

• Tracking the tree architecture of applications threads

• Malicious behaviour is likely to appear at the same thread level on applications having the same malware

Application 1 Application 2

Thread tree architecture of tow applications belongingTo the same malware family

Page 13: introducing Parallel programming for accelerating Malware

Thread 1

Thread 2.1

Thread 3.1

Thread 3.2

Thread 2.2

Thread 3.4

Thread 1

Thread 2.1

Thread 3.1

Thread 3.2

Thread 4.1

Thread 2.2

Thread 3.4

Thread 4.2

Training phase

Malware detection

14

Recording Phase

• Having different applications that belongs to the same malware family

• Execution of every application • Tracking thread tree structure of the

application • Recording syscalls sequences for

every thread created by the application

Extraction phase • Extraction of common syscalls

subsequences belonging to threads from the same family and having the same level

• Storage of common subsequences Application 1

Application 2

Page 14: introducing Parallel programming for accelerating Malware

Training phase

Malware detection

Filtering Phase

• Get syscall patterns from benign applications B

• For every Csi in our malware dataset M

• Counting the number of common subsequence Csi appearing in B and in M

• Calculate malicious probability of Csi • Storing Csi into Malware common subsequences if Csi haw high

malicious probability • We choose to work with Csi having malicious probability = 1 (Csi

appearing in M and not in B)

The result of traning phase= Malware behavioural dataset build of syscalls patterns

Page 15: introducing Parallel programming for accelerating Malware

Architecture

16

(1) Which type of dataset can we use ?

(2) How can we increase GPU processing performance?

Page 16: introducing Parallel programming for accelerating Malware

Challenge 2

17

How can we increase the

parallell processing

performance of pattern

matching on the GPU?

Page 17: introducing Parallel programming for accelerating Malware

Input data

•Signatures•Syscalls•Bytecode …

Pattern matchi

ng

Parallel Pattern Matching Algorithm

• Match of data streams by malware scanner against a large set of known signatures, using a pattern matching algorithm.

• Pattern matching algorithms analyze the data stream and compare it against a database of signatures to detect known malware.

• Fairly complex signature patterns composed of different-size strings, range constraints, and sometimes recursive forms.

Example • Aho-corasick • Wu-manber• Knuth-Morris-Pratt

Patterns Model

Malicious application

Benign application

18

Parallel Pattern Matching Algorithm

Page 18: introducing Parallel programming for accelerating Malware

Aho-Corasick

Parallel Pattern Matching Algorithm

• AC algorithm is based on a DFA

structure built from reference

patterns.

• The construction of automaton

is done in pre-processing

phase.

• The matching process is done

in processing phase.

• The automaton structure can

be essentially described by tow

tables: transition table and

failure state table.

19

Parallel Pattern Matching AlgorithmParallel Pattern Matching Algorithm

Page 19: introducing Parallel programming for accelerating Malware

Direct implementation of parallel pattern matching

Parallel Pattern Matching Algorithm

• Idea Input stream segmentation For every segment we associate a

thread Problem of boundary detection

• Possible solution Every thread check the pattern

presence on the edges. Each thread must scan for a

minimum length which is s almost equal to the segment length plus the longest pattern length of an AC state machine

20

Parallel Pattern Matching AlgorithmParallel Pattern Matching Algorithm

Page 20: introducing Parallel programming for accelerating Malware

Parallel Failureless Aho-Corasick

Lin, C. H., Liu, C. H., Chien, L. S., & Chang, S. C. (2013).Accelerating pattern matching using a novel parallel algorithm on gpus. Computers, IEEE Transactions on, 62(10), 1906-1916.

Parallel Pattern Matching Algorithm

• Gaols Increase pattern matching

computation throughput via parallelization.

resolve the throughput bottleneck caused by the overlapped computation.

• Idea Byte allocation per thread Failure transitions elimination The thread stops his work if no

valid transition is found.

21

Parallel Pattern Matching AlgorithmParallel Pattern Matching Algorithm

Page 21: introducing Parallel programming for accelerating Malware

• No boundary detection problems

• Shorter worst-case and average life times of threads than the straightforward implementation

• Most of threads have a high probability to end their work early

• Access and required memory are reduced

Parallel Failureless Aho-Corasick

Parallel Pattern Matching Algorithm

22

Parallel Pattern Matching AlgorithmParallel Pattern Matching Algorithm

Page 22: introducing Parallel programming for accelerating Malware

• Reducing the global memory transactions of the system

• Making benefit from memory architecture of GPU by

using constant memory and local memory

• Minimize transfers: Intermediate data can be allocated, operated on, and deallocated without ever copying them to host memory

• Group transfers: One large transfer much better than many small ones

Parallel Failureless Aho-Corasick

Parallel Pattern Matching Algorithm

Increase of the algorithm performance on GPU

23

Parallel Pattern Matching AlgorithmParallel Pattern Matching Algorithm

Page 23: introducing Parallel programming for accelerating Malware

Challenge 3

24

• Malwares grows continuously

• The number of signatures is increasing proportionally

Scaling problems for mobile anti-malwares due to:

• Limitation of Mobile GPU memory VS Important memory requirement for DFA structureThe need of applying

memory compacting techniques

Page 24: introducing Parallel programming for accelerating Malware

Eliminating failure transition

Eliminating Final states table by performing state reordering

Applying P3FSM technique

Memory optimization technique

25

1

2

3

Page 25: introducing Parallel programming for accelerating Malware

P3FSM: Portable Predictive Pattern Matching Finite State Machine

Memory optimization technique

26

Page 26: introducing Parallel programming for accelerating Malware

Parallel Pattern Matching Algorithm

27

Hardware

Mobile Phone• HTC one

GPU• Adreno 320

CPU• Qualcomm Snapdragon 600, quad-core CPU @

1.7GHz

Benchmark• 600 Malicious syscalls patterns

Parallel Pattern Matching AlgorithmExperimentations

Page 27: introducing Parallel programming for accelerating Malware

Thread per block resizing

28

Experimentations

Best throughput with 64x64 threads = 72Mb/s

16

32

64

12

4

01

02

03

04

05

06

07

08

0

Local work-group resizing

Work-group size

Th

rou

gh

pu

t (M

b/s

)

Page 28: introducing Parallel programming for accelerating Malware

Effective use of the different GPU memory types

29

Experimentations

On

ly g

lob

al m

em

ory

con

stant m

em

ory

+ sh

are

d m

em

ory

54

56

58

60

62

64

66

68

70

72

74

Different mobile GPU memory use

Th

rou

gh

pu

t Around 15% gain of

performance with the use of constant and shared memories

Applying shared memory to improve the latency of global memory accesses

Page 29: introducing Parallel programming for accelerating Malware

Acceleration

30

Experimentations

An acceleration of around 2.3x is obtained with the parallel processing on the mobile GPU over serial processing

The framework throughput is dominated by data transfers between the host /device which consist of 60% of the total processing time

Page 30: introducing Parallel programming for accelerating Malware

Memory requirement

31

Experimentations

Number of patterns

PFAC (KB) P3FSM (KB)

100 446 37

200 703 83

300 1086 197

600 2843 462

Storing DFA structure on the GPU is memory consuming especially that mobile GPU memory is small.

Difference in memory requirement between PFAC DFA and P3FSM. P3FSM that compacts the DFA structure by many times comparing to

standard PFAC DFA.

Page 31: introducing Parallel programming for accelerating Malware

32

Implementation of a parallel host-based anti-malware on mobile GPU using behavioral detection techniques

Series of optimizations to deal with the low memory problem of mobile devices and the ever-increasing computing and memory requirements of malware detection

Perspectives: Integrating a GPU monitor which tracks down the GPU

memory usage and allows the automaton adjustment in real-time to fit the reduced GPU memory 

 Use of mobile GPU clusters Working on malware dataset to achieve better detection

accuracy

Conclusion

Page 32: introducing Parallel programming for accelerating Malware

33

Thank you for your attention