AccessMiner: Using System-Centric Models for Malware Protection
Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu and Engin Kirda
ACM CCS 2010, Oct.
1
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
2
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
3
Malware Detection
Signature-based
◦ Static content
◦ Byte strings, instruction sequences
◦ => Evaded by code obfuscation
Behavior-based
◦ Dynamic actions
◦ Sequences of system calls, API functions
◦ A program-centric approach
◦ …good results?
4
Malware Detection Problem
Test case
◦ Small scale: about 10 benign applications
◦ Limited execution: a few minutes, sandbox
◦ Synthetic inputs
◦ Single machine
5
Malware Detection Problem (cont.)
Program-centric model
◦ Narrow view on a program
◦ Diversity of system call information
◦ How do benign programs interact with their environment?
◦ Their models may be specific to a small set of benign applications only
6
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
7
System Call Data Collection
A Microsoft Windows kernel module
◦ Collects, anonymizes, and uploads system call logs
◦ Hooks the System Service Descriptor Table (SSDT)
◦ Mindful of system resources
8
Kernel collector
79 different system calls
◦ Related to files, registry, processes and threads, networking, memory
◦ Same subset as used in Anubis
Each log entry: <timestamp, program, pid, ppid, system call, args, result>
9
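The per-entry tuple above can be read into a small record type; a minimal sketch, where the field layout and the tab-separated on-disk format are assumptions (only the tuple fields come from the slides):

```python
from dataclasses import dataclass

# Hypothetical record mirroring the logged tuple
# <timestamp, program, pid, ppid, system call, args, result>
@dataclass
class SyscallRecord:
    timestamp: float
    program: str
    pid: int
    ppid: int
    syscall: str
    args: tuple
    result: int

def parse_line(line: str) -> SyscallRecord:
    """Parse one tab-separated log line into a record (assumed format)."""
    ts, prog, pid, ppid, call, args, res = line.rstrip("\n").split("\t")
    return SyscallRecord(float(ts), prog, int(pid), int(ppid),
                         call, tuple(args.split(",")), int(res))
```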
System Call Data
Sensitive data are replaced
◦ Non-system paths, user-root registry keys, IP addresses
10
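The anonymization step might look like the following sketch; the regex patterns and placeholder names are assumptions, since the slides only say which classes of data are replaced:

```python
import re

# Assumed patterns: dotted-quad IPs and per-user profile paths
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
USER_PATH_RE = re.compile(r"C:\\Users\\[^\\]+", re.IGNORECASE)

def anonymize(arg: str) -> str:
    """Replace IP addresses and per-user paths with placeholders."""
    arg = IP_RE.sub("<ip>", arg)
    arg = USER_PATH_RE.sub(r"C:\\Users\\<user>", arg)
    return arg
```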
System Call Data Collection
Large and diverse set of system call traces
◦ Ten different machines, different users
◦ Several weeks
◦ 114.5 GB of data
◦ 1.556 billion system calls
◦ 362,600 processes
◦ 242 applications
11
Data set
2~4 days with 2~12 hours each
Production systems, development systems
12
Data Normalization
Raw data (system call logs) => accessed resources and access type
Tracking the access operations
◦ The set of resources open at any given time (OS handles)
◦ Until the resource is released (NtClose)
Execution path and file name:
◦ NtOpenFile, NtCreateSection, NtCreateThread
13
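The normalization described above, tracking handles from open until NtClose, can be sketched as a handle table; the event format and the aggregation into (program, path, access-type) tuples are assumptions, only the call names come from the slides:

```python
def normalize(records):
    """Turn raw (syscall, handle, path, program) events into
    (program, path, access-type) tuples by tracking open handles."""
    handles = {}          # handle -> (program, path) while the resource is open
    accesses = set()
    for syscall, handle, path, program in records:
        if syscall in ("NtOpenFile", "NtCreateFile"):
            handles[handle] = (program, path)
            accesses.add((program, path, "open"))
        elif syscall == "NtReadFile" and handle in handles:
            prog, p = handles[handle]
            accesses.add((prog, p, "read"))
        elif syscall == "NtWriteFile" and handle in handles:
            prog, p = handles[handle]
            accesses.add((prog, p, "write"))
        elif syscall == "NtClose":
            handles.pop(handle, None)   # resource released
    return accesses
```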
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
14
Analysis of System Call Data
How diverse is the collected system call data?
Focus on system call types
◦ Long tradition in the security community
◦ Most models rely upon characteristic patterns
Ignore argument values
15
Creating n-gram Models
Follow a "standard" approach:
1. Extract n-grams for a set of malware programs and a set of benign programs
2. Find all n-grams that appear in malware programs but not in benign programs
3. Hope those n-grams are characteristic for malware programs
16
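Steps 1 and 2 of this approach can be sketched directly; here a trace is simply a list of system-call names:

```python
def ngrams(trace, n=3):
    """All length-n windows over a system-call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def malware_only_ngrams(malware_traces, benign_traces, n=3):
    """n-grams seen in malware traces but never in benign traces (step 2)."""
    mal = set().union(*(ngrams(t, n) for t in malware_traces))
    ben = set().union(*(ngrams(t, n) for t in benign_traces))
    return mal - ben
```

Step 3 is then the hope that these leftover n-grams generalize to unseen programs, which is exactly what the experiments below test.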
Unique n-gram analysis
17
n-gram Models
10,838 malware samples from Anubis
Ten experiments (ten machines)
◦ System call traces from 9 machines and 2/3 of the malware set to train the n-gram models
◦ Perform detection with the remaining system call traces and 1/3 of the malware samples
18
Detection Results
19
Program-Centric Models and Detection
Since system-call sequences invoked by benign applications are diverse
◦ Models have difficulties distinguishing normal and malicious behaviors
A large amount of data is needed
20
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
21
System-Centric Models and Detection
Generalize how benign programs interact with the operating system
Record the files and the registry entries accessed
◦ Read, write, execute
The observed accesses show "convergence"
22
Access Activity Model
A set of labels for operating system resources
A label L is a set of access tokens
◦ {t0, t1, …, tn}
A token t is a pair <a, op>
◦ <firefox, write>, <*, execute>
a => application; op => type of access
23
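The label/token structure can be sketched as plain sets of pairs; the example label contents here are hypothetical:

```python
WILDCARD = "*"   # token <*, op>: any application may perform op

def allowed(label, app, op):
    """Check whether access token <app, op> is permitted by a label,
    where a label is a set of <application, operation> pairs."""
    return (app, op) in label or (WILDCARD, op) in label

# Hypothetical label for a browser profile directory
firefox_profile = {("firefox", "write"), ("firefox", "read"), ("*", "read")}
```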
Initial Access Activity Model (1)
Use system-call traces of all benign processes
Build a virtual file system tree:
Application "a": C:\foo\a.txt (write)
Application "b": C:\foo\bar\b.rar (exec)
24
Model Pre-processing (2)
Remove some elements in the tree
◦ Microsoft Windows services
◦ Desktop indexing programs
◦ Anti-virus software
Identify applications that start processes with different names
◦ C:\Windows\system32 => win_core
25
Model Generalization (3)
Propagated container
◦ All children are private (without *)
◦ e.g., C:\Program Files
Merged
◦ <x, write> => <x, read>
26
System-Centric Model Detection
For any op: find the longest prefix P shared between the path to the resource and the folders in the virtual tree stored by our model
Ten experiments
◦ File system access activity model: about 100 labels
◦ Registry access activity model: about 3,000 labels
◦ Full access activity model
27
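The longest-prefix lookup can be sketched with a folder-to-label map standing in for the virtual tree; the model contents below are hypothetical:

```python
def longest_prefix_label(model, path):
    """Walk up the path components until a labeled folder is found."""
    parts = path.lower().split("\\")
    for end in range(len(parts), 0, -1):
        prefix = "\\".join(parts[:end])
        if prefix in model:
            return model[prefix]
    return None

def check(model, app, op, path):
    """Return True if <app, op> on path is allowed by the matched label."""
    label = longest_prefix_label(model, path)
    if label is None:
        return True                     # unknown resource: no policy applies
    return (app, op) in label or ("*", op) in label

# Hypothetical model entry: system folder readable by all, writable by none
model = {"c:\\windows\\system32": {("win_core", "read"), ("*", "read")}}
```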
Detection Results (Files)
// Looks sobering
Many samples (malware) don't work (!)
◦ 10,838 -> 7,847
Use only write operations
◦ Our own logging component
◦ Software updates
28
Detection Results (Regs)
29
HKEY_USERS\Software\Microsoft
◦ Need a larger training set
OUTLINE
Malware Detection
System Call Data Collection
Program-Centric Models and Detection
System-Centric Models and Detection
Discussion and Conclusion
30
Discussion and Conclusion
Full access activity model
◦ 91% detection / 0% false positives
System-centric approach
Policy violations occurred only for a few, specific classes of programs
Network limitation
MAC policy
◦ SELinux
31