peHash: A novel approach to Fast Malware Clustering
By: Georg Wicherski
Presenting: Rasika Bindoo
Introduction
• Data collection is no longer a problem, thanks to honeypots.
• However, honeypots have the drawback of polluting malware databases.
• Anti-virus products are slow.
• peHash was therefore developed to group instances of the same polymorphic specimen into clusters.
Other Attempts at Hashing
• Spamsum
• mrshash
• n-grams
• Signatures
• Vx-Class
peHash Function Design
The function should have the following design characteristics:
• It should not need to look into the contents of the sections.
• It should have low computational complexity.
• Scaling the result of the bzip2 compression ratio to $[0 \ldots 7] \subset \mathbb{N}$ leads to the best matches.
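The bzip2 scaling step can be sketched with Python's standard `bz2` module. The slide only says that the bzip2 compression ratio is scaled to [0…7]; defining the ratio as compressed size divided by raw size, clamping it to 1.0, rounding to the nearest integer and the name `kolmogorov_bucket` are all assumptions of this sketch, not the author's exact procedure.

```python
import bz2


def kolmogorov_bucket(data: bytes) -> int:
    """Hypothetical helper: approximate the complexity of a section's bytes as
    its bzip2 compression ratio, scaled to an integer in 0..7."""
    if not data:
        return 0  # assumption: empty sections go to the lowest bucket
    ratio = len(bz2.compress(data)) / len(data)
    # Tiny or incompressible inputs can grow under bzip2, so clamp the ratio to 1.0.
    ratio = min(ratio, 1.0)
    return round(ratio * 7)


print(kolmogorov_bucket(b"\x00" * 4096))  # highly redundant data lands in bucket 0
```

Coarse buckets are the point here: two polymorphic instances whose sections compress slightly differently still fall into the same bucket.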
Structural properties
Instances of the same polymorphic malware share the same structural Portable Executable (PE) properties. The following properties are therefore used to distinguish between binaries:
• Image characteristics
• Subsystem
• Stack commit size
• Heap commit size
Structural properties
Structural information used for each section in the Portable Executable:
• Virtual address
• Raw size
• Section characteristics
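To make these inputs concrete, here is a minimal, dependency-free sketch that reads the header fields and per-section fields listed above straight from a PE image with Python's `struct` module. Offsets follow the PE/COFF layout for PE32 and PE32+ optional headers; the class and function names are hypothetical, and the parser checks only the signatures, so it is not hardened against malformed or hostile files.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class SectionInfo:
    virtual_address: int
    raw_size: int          # SizeOfRawData
    characteristics: int


@dataclass
class PEStructure:
    characteristics: int   # COFF file header Characteristics ("image characteristics")
    subsystem: int
    stack_commit: int
    heap_commit: int
    sections: list = field(default_factory=list)


def parse_pe_structure(data: bytes) -> PEStructure:
    """Extract the structural fields used by peHash without reading section contents."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, n_sections, _, _, _, opt_size, characteristics = struct.unpack_from("<HHIIIHH", data, coff)
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    if magic == 0x10B:    # PE32: stack/heap sizes are 32-bit
        (stack_commit,) = struct.unpack_from("<I", data, opt + 76)
        (heap_commit,) = struct.unpack_from("<I", data, opt + 84)
    elif magic == 0x20B:  # PE32+: stack/heap sizes are 64-bit
        (stack_commit,) = struct.unpack_from("<Q", data, opt + 80)
        (heap_commit,) = struct.unpack_from("<Q", data, opt + 96)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    pe = PEStructure(characteristics, subsystem, stack_commit, heap_commit)
    table = opt + opt_size  # the section table follows the optional header
    for i in range(n_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        _name, _vsize, va, raw_size = struct.unpack_from("<8sIII", data, entry)
        (sec_chars,) = struct.unpack_from("<I", data, entry + 36)
        pe.sections.append(SectionInfo(va, raw_size, sec_chars))
    return pe
```

In practice a library such as `pefile` exposes the same values (for example `FILE_HEADER.Characteristics`, `OPTIONAL_HEADER.SizeOfStackCommit`, and `VirtualAddress` / `SizeOfRawData` / `Characteristics` on each section) with far more validation.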
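Generation of hash values
$\mathit{hash}[0] := \mathit{characteristics}[0 \ldots 7] \oplus \mathit{characteristics}[8 \ldots 15]$
$\mathit{hash}[1] := \mathit{subsystem}[0 \ldots 7] \oplus \mathit{subsystem}[8 \ldots 15]$
$\mathit{hash}[2] := \mathit{stackcommit}[0 \ldots 7] \oplus \mathit{stackcommit}[8 \ldots 15] \oplus \mathit{stackcommit}[24 \ldots 31]$
$\mathit{hash}[3] := \mathit{heapcommit}[0 \ldots 7] \oplus \mathit{heapcommit}[8 \ldots 15] \oplus \mathit{heapcommit}[24 \ldots 31]$
$\oplus$ denotes the XOR operation.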
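The four header bytes can be sketched as plain bit slicing and XOR. The bit ranges are copied from the slide as written; treating bit 0 as the least significant bit is an assumption, as are the helper names `bits` and `header_hash`.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value; bit 0 is assumed to be the least significant bit."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def header_hash(characteristics: int, subsystem: int, stack_commit: int, heap_commit: int) -> bytes:
    """Fold the four PE header fields into 4 bytes using the XOR rules on the slide."""
    h0 = bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15)
    h1 = bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15)
    h2 = bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15) ^ bits(stack_commit, 24, 31)
    h3 = bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15) ^ bits(heap_commit, 24, 31)
    return bytes([h0, h1, h2, h3])


print(header_hash(0x0102, 2, 0x1000, 0x1000).hex())  # 03021010
```

XOR-folding keeps each field to a single byte, so the header part of the buffer always has the same length.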
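Generation of hash values: sub-hash for each section ($\oplus$ = XOR)
$\mathit{shash}[0] := \mathit{virtaddress}[9 \ldots 31]$
$\mathit{shash}[2] := \mathit{rawsize}[8 \ldots 31]$
$\mathit{shash}[4] := \mathit{characteristics}[16 \ldots 23] \oplus \mathit{characteristics}[24 \ldots 31]$
$\mathit{shash}[5] := \mathit{kolmogorov} \in [0 \ldots 7] \subset \mathbb{N}$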
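A per-section sub-hash can be sketched the same way. The slide gives the bit ranges but not the exact byte packing, so storing the 23 virtual-address bits and 24 raw-size bits in 3 bytes each, followed by one characteristics byte and one Kolmogorov byte, is an assumption; `section_hash` is a hypothetical name, and bit 0 is again taken as the least significant bit.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value; bit 0 is assumed to be the least significant bit."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def section_hash(virtual_address: int, raw_size: int, characteristics: int, kolmogorov: int) -> bytes:
    """Build an 8-byte sub-hash for one section (byte layout is an assumption)."""
    if not 0 <= kolmogorov <= 7:
        raise ValueError("kolmogorov bucket must lie in 0..7")
    va = bits(virtual_address, 9, 31)  # 23 bits, stored big-endian in 3 bytes
    rs = bits(raw_size, 8, 31)         # 24 bits, stored big-endian in 3 bytes
    ch = bits(characteristics, 16, 23) ^ bits(characteristics, 24, 31)
    return va.to_bytes(3, "big") + rs.to_bytes(3, "big") + bytes([ch, kolmogorov])


print(section_hash(0x1000, 0x600, 0x60000020, 3).hex())  # 0000080000066003
```

Because the low bits are dropped, two sections whose raw sizes differ only in the lowest 8 bits (or whose virtual addresses differ only in the lowest 9 bits) produce the same sub-hash.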
Advantages of this hash function
• Complexity is O(1).
• The final hash value is the SHA-1 of the hash buffer, which makes collisions difficult to create.
• Constant-length hashes are generated despite the variable number of sections in the executables.
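The final step can be sketched as follows: concatenate the header bytes and every section sub-hash into one buffer and take its SHA-1 with Python's `hashlib`. The 160-bit digest (40 hex characters) has the same length however many sections went into the buffer. The buffer order (header first, then sections in table order) and the name `pehash_digest` are assumptions of this sketch.

```python
import hashlib


def pehash_digest(header: bytes, section_hashes: list) -> str:
    """Hash the concatenated header and section sub-hashes into a fixed-length value."""
    buffer = header + b"".join(section_hashes)
    return hashlib.sha1(buffer).hexdigest()


few = pehash_digest(bytes(4), [bytes(8)])
many = pehash_digest(bytes(4), [bytes(8)] * 50)
print(len(few), len(many))  # 40 40
```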
Entry Points and Imports
• The entry point value can easily be changed for each instance of a polymorphic specimen.
• Most packers specify misleading Import Address Tables.
• Import information can also be changed without any noteworthy effort.
• Hence, neither entry point information nor imports are included in the hash function.
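Evaluation: Cluster Size
Number of clusters in each size range, per dataset:
Cluster size | mwcollect Alliance | Arbor Networks
1 | 7109 | 16543
2-9 | 3165 | 4104
10-99 | 549 | 611
100-499 | 70 | 71
500-999 | 19 | 4
1000-4999 | 18 | 8
5000+ | 7 | 2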
• peHash helps cluster polymorphic malware and also helps detect broken copies of already known threats.
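A table like the one above can be produced by grouping samples by their peHash and bucketing the resulting cluster sizes. This sketch assumes the hashes are already computed and stored as a hypothetical `{sample_id: pehash}` mapping; the size ranges are taken from the table.

```python
from collections import Counter

# Size ranges from the evaluation table; None marks an open upper bound.
BUCKETS = [(1, 1, "1"), (2, 9, "2-9"), (10, 99, "10-99"), (100, 499, "100-499"),
           (500, 999, "500-999"), (1000, 4999, "1000-4999"), (5000, None, "5000+")]


def cluster_size_histogram(sample_hashes: dict) -> dict:
    """Given {sample_id: pehash}, count how many clusters fall into each size range."""
    cluster_sizes = Counter(sample_hashes.values())  # pehash -> number of samples
    histogram = {label: 0 for _, _, label in BUCKETS}
    for size in cluster_sizes.values():
        for lo, hi, label in BUCKETS:
            if size >= lo and (hi is None or size <= hi):
                histogram[label] += 1
                break
    return histogram


samples = {"a.exe": "h1", "b.exe": "h1", "c.exe": "h2"}
print(cluster_size_histogram(samples))
# {'1': 1, '2-9': 1, '10-99': 0, '100-499': 0, '500-999': 0, '1000-4999': 0, '5000+': 0}
```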
Evaluation
File | MD5 | Size (bytes)
diantz.exe | 48734e9b45dca36e8a… | 85504
makecab.exe | 2740dc2fbefaddb891f… | 85504
find.exe | 09b4e22c86f7e9f1e5… | 9216
print.exe | 76b96ed5304319f208… | 9216
subst.exe | 77847ef3cec784b137… | 9216
bootvrfy.exe | c2ab77d9dc66447dc1… | 5120
comrereg.exe | 908f0eda6a49625f98… | 5120
dcomcnfg.exe | 1178cd20b90936837d… | 5120
• Files in a broken cluster share the same size.
• They can be told apart only by looking at the actual code or imports, which peHash does not examine.
Performance
• Further analysis needs to be carried out for only one sample per peHash cluster.
• Performance does not depend on binary size or section count.
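Selecting one sample per cluster for further analysis can be sketched as a single pass over a hypothetical `{sample_id: pehash}` mapping. Picking the lexicographically smallest sample id is an arbitrary choice made here only so the result is deterministic.

```python
def one_sample_per_cluster(sample_hashes: dict) -> dict:
    """Map each peHash to a single representative sample id (smallest id wins)."""
    representatives = {}
    for sample_id, pehash in sorted(sample_hashes.items()):
        representatives.setdefault(pehash, sample_id)  # keep the first id seen per hash
    return representatives


print(one_sample_per_cluster({"b.exe": "h1", "a.exe": "h1", "c.exe": "h2"}))
# {'h1': 'a.exe', 'h2': 'c.exe'}
```

The number of samples that need expensive analysis then equals the number of clusters rather than the number of files collected.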
Conclusion
• peHash provides a performant solution to the problem of seemingly new malware samples.
• peHash can correctly cluster large sample sets using basic information from Portable Executables.
• peHash cannot be used to cluster variants of malware families whose code structure has to be analyzed.
Thank You