peHash: A novel approach to Fast Malware Clustering
By: Georg Wicherski
Presenting: Rasika Bindoo




Introduction

• Data collection is no longer a problem, thanks to honeypots.
• However, honeypots have the drawback of polluting malware databases with seemingly new samples.
• Anti-virus products are slow.
• peHash was therefore developed for clustering, i.e. grouping instances of the same polymorphic specimen.


Other Attempts at Hashing

• Spamsum
• mrshash
• n-grams
• Signatures
• Vx-Class


peHash Function Design

The function should have the following design characteristics:
• It should not need to look into the contents of the sections.
• It should have low computational complexity.

Scaling the bzip2 compression ratio to $[0 \ldots 7] \subset \mathbb{N}$ leads to the best matches.
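A minimal Python sketch of the compression-based value, assuming the "compression ratio" is the bzip2-compressed length divided by the raw section length, and that scaling onto 0…7 means splitting [0, 1] into eight equal buckets. The name `kolmogorov_bucket` and the exact scaling are assumptions for illustration, not taken from the presentation; only the standard-library `bz2.compress` is used.

```python
import bz2
import os


def kolmogorov_bucket(section_data: bytes) -> int:
    """Map a section's bzip2 compression ratio onto the integers 0..7.

    Hypothetical helper: the ratio definition and bucket scaling are assumptions.
    """
    if not section_data:
        return 0  # assumption: an empty section counts as minimally complex
    ratio = len(bz2.compress(section_data)) / len(section_data)
    ratio = min(ratio, 1.0)  # incompressible data can grow slightly under bzip2
    # Split [0, 1] into eight equal-width buckets; a ratio of exactly 1.0 lands in bucket 7.
    return min(7, int(ratio * 8))


if __name__ == "__main__":
    print(kolmogorov_bucket(b"\x00" * 4096))    # highly redundant data: low bucket
    print(kolmogorov_bucket(os.urandom(4096)))  # random data: high bucket
```

Coarse buckets mean that sections with similar, but not identical, compressibility map to the same value.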


Structural Properties

Instances of a polymorphic malware specimen share the same structural Portable Executable (PE) properties. The following properties are therefore used to distinguish between binaries:
• Image characteristics
• Subsystem
• Stack commit size
• Heap commit size


Structural Properties (continued)

For each section in the Portable Executable, the following structural information is used:
• Virtual address
• Raw size
• Section characteristics


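Generation of Hash Values

$$
\begin{aligned}
\mathit{hash}[0] &:= \mathit{characteristics}[0 \ldots 7] \oplus \mathit{characteristics}[8 \ldots 15] \\
\mathit{hash}[1] &:= \mathit{subsystem}[0 \ldots 7] \oplus \mathit{subsystem}[8 \ldots 15] \\
\mathit{hash}[2] &:= \mathit{stackcommit}[0 \ldots 7] \oplus \mathit{stackcommit}[8 \ldots 15] \oplus \mathit{stackcommit}[24 \ldots 31] \\
\mathit{hash}[3] &:= \mathit{heapcommit}[0 \ldots 7] \oplus \mathit{heapcommit}[8 \ldots 15] \oplus \mathit{heapcommit}[24 \ldots 31]
\end{aligned}
$$

where $\oplus$ denotes the XOR operation.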
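The XOR folding of the four header fields can be sketched in Python as below. This sketch assumes bit 0 is the least significant bit of each field and takes the field values as plain integers; reading them from a real PE file header and optional header is left out. The helper names `bits` and `header_hash` are hypothetical, and the bit ranges follow the slide as transcribed.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value.

    Assumption: bit 0 is the least significant bit.
    """
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def header_hash(characteristics: int, subsystem: int,
                stack_commit: int, heap_commit: int) -> bytes:
    """Fold the four header fields into hash[0..3], one byte each, by XOR."""
    h0 = bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15)
    h1 = bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15)
    h2 = bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15) ^ bits(stack_commit, 24, 31)
    h3 = bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15) ^ bits(heap_commit, 24, 31)
    return bytes([h0, h1, h2, h3])


if __name__ == "__main__":
    # Illustrative field values; subsystem 2 is the Windows GUI subsystem.
    print(header_hash(0x010F, 2, 0x1000, 0x1000).hex())  # prints 0e021010
```

XOR folding is lossy: different field values can produce the same byte, so the four bytes summarize the header rather than encode it exactly.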


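Generation of Hash Values: Sub-hash (per section)

$$
\begin{aligned}
\mathit{shash}[0] &:= \mathit{virtaddress}[9 \ldots 31] \\
\mathit{shash}[2] &:= \mathit{rawsize}[8 \ldots 31] \\
\mathit{shash}[4] &:= \mathit{characteristics}[16 \ldots 23] \oplus \mathit{characteristics}[24 \ldots 31] \\
\mathit{shash}[5] &:= \mathit{kolmogorov} \in [0 \ldots 7] \subset \mathbb{N}
\end{aligned}
$$

where $\oplus$ denotes XOR and $\mathit{kolmogorov}$ is the section's bzip2 compression ratio scaled to $[0 \ldots 7]$.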
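A sketch of the per-section sub-hash, assuming bit 0 is the least significant bit. The slide does not say how the fields are laid out in the buffer, so this sketch uses a hypothetical 8-byte packing: the 23-bit virtual-address slice and the 24-bit raw-size slice as 3 big-endian bytes each, then one byte for the folded section characteristics and one for the Kolmogorov bucket. `kolmogorov_bucket` assumes the ratio is bzip2-compressed size over raw size, split into eight equal buckets. All function names are illustrative.

```python
import bz2


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value; assumption: bit 0 is the LSB."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def kolmogorov_bucket(section_data: bytes) -> int:
    """Assumed estimate: bzip2 compressed/raw ratio mapped onto 0..7."""
    if not section_data:
        return 0
    ratio = min(len(bz2.compress(section_data)) / len(section_data), 1.0)
    return min(7, int(ratio * 8))


def section_subhash(virtual_address: int, raw_size: int,
                    characteristics: int, section_data: bytes) -> bytes:
    """Build one section's sub-hash using a hypothetical 8-byte layout."""
    va_part = bits(virtual_address, 9, 31)   # 23 bits, fits in 3 bytes
    size_part = bits(raw_size, 8, 31)        # 24 bits, fits in 3 bytes
    char_part = bits(characteristics, 16, 23) ^ bits(characteristics, 24, 31)
    return (va_part.to_bytes(3, "big")
            + size_part.to_bytes(3, "big")
            + bytes([char_part, kolmogorov_bucket(section_data)]))


if __name__ == "__main__":
    # Illustrative code section: 0x400 zero bytes at virtual address 0x1000,
    # with flags 0x60000020 (code, executable, readable).
    print(section_subhash(0x1000, 0x400, 0x60000020, b"\x00" * 0x400).hex())
    # prints 0000080000046000
```

Dropping the low bits of the virtual address and raw size means small shifts in section placement or padding do not change the sub-hash.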


Advantages of This Hash Function

• Complexity is O(1).
• The final hash value is the SHA-1 of the hash buffer, which makes it difficult to create collisions.
• Hashes have a constant length regardless of the number of sections in the executable.
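A minimal sketch of the final step, assuming the hash buffer is simply the four header bytes hash[0..3] followed by each section's sub-hash in section order. The concatenation order and the function name are assumptions; the SHA-1 call is Python's standard `hashlib`. It illustrates why the output length is constant: SHA-1 always yields a 160-bit digest, however long the buffer is.

```python
import hashlib


def pehash_digest(header: bytes, section_subhashes: list[bytes]) -> str:
    """SHA-1 over an assumed buffer layout: header bytes, then per-section sub-hashes."""
    buffer = header + b"".join(section_subhashes)
    return hashlib.sha1(buffer).hexdigest()


if __name__ == "__main__":
    header = bytes.fromhex("0e021010")                  # illustrative hash[0..3]
    one_section = [bytes(8)]                            # illustrative 8-byte sub-hash
    five_sections = [bytes([i]) * 8 for i in range(5)]  # five illustrative sub-hashes
    print(len(pehash_digest(header, one_section)))      # 40 hex characters
    print(len(pehash_digest(header, five_sections)))    # still 40 hex characters
```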


Entry Points and Imports

• The entry point value can easily be changed for each instance of a polymorphic specimen.
• Most packers specify misleading Import Address Tables.
• Import information can also be changed without any noteworthy effort.
• Therefore, neither the entry point nor the imports are included in the hash function.


Evaluation

Number of clusters per cluster size, for the mwcollect Alliance and Arbor Networks sample sets:

Cluster size    mwcollect Alliance    Arbor Networks
1               7109                  16543
2-9             3165                  4104
10-99           549                   611
100-499         70                    71
500-999         19                    4
1000-4999       18                    8
5000+           7                     2

• peHash clusters polymorphic malware and also helps detect broken copies of already known threats.
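Clustering with peHash amounts to grouping samples that share a hash value. The sketch below assumes the peHash values are already computed and keyed by a sample identifier such as the MD5; it groups them and counts clusters per size range, using the same ranges as the table above. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Cluster-size ranges from the evaluation table; None marks an open upper bound.
SIZE_RANGES = [(1, 1, "1"), (2, 9, "2-9"), (10, 99, "10-99"), (100, 499, "100-499"),
               (500, 999, "500-999"), (1000, 4999, "1000-4999"), (5000, None, "5000+")]


def cluster_by_pehash(samples: dict[str, str]) -> dict[str, list[str]]:
    """Group sample IDs (e.g. MD5s) by their precomputed peHash value."""
    clusters = defaultdict(list)
    for sample_id, pehash_value in samples.items():
        clusters[pehash_value].append(sample_id)
    return dict(clusters)


def size_histogram(clusters: dict[str, list[str]]) -> dict[str, int]:
    """Count how many clusters fall into each size range."""
    counts = Counter()
    for members in clusters.values():
        n = len(members)
        for lo, hi, label in SIZE_RANGES:
            if n >= lo and (hi is None or n <= hi):
                counts[label] += 1
                break
    return {label: counts[label] for _, _, label in SIZE_RANGES}


if __name__ == "__main__":
    samples = {"md5-a": "h1", "md5-b": "h1", "md5-c": "h2"}  # hypothetical data
    clusters = cluster_by_pehash(samples)
    print(clusters)  # {'h1': ['md5-a', 'md5-b'], 'h2': ['md5-c']}
    print(size_histogram(clusters))
```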


Evaluation (continued)

File            MD5 (truncated)         Size (bytes)
diantz.exe      48734e9b45dca36e8a…     85504
makecab.exe     2740dc2fbefaddb891f…    85504
find.exe        09b4e22c86f7e9f1e5…     9216
print.exe       76b96ed5304319f208…     9216
subst.exe       77847ef3cec784b137…     9216
bootvrfy.exe    c2ab77d9dc66447dc1…     5120
comrereg.exe    908f0eda6a49625f98…     5120
dcomcnfg.exe    1178cd20b90936837d…     5120

• The files in each broken cluster share the same size.

• They can only be told apart by looking at the actual code or imports, which peHash does not do.


Performance

• Analysis needs to be carried out for only one sample per peHash cluster.

• Performance does not depend on binary size or section count.
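Because only one sample per peHash cluster needs to be analyzed, selecting the samples to analyze can be sketched as keeping the first sample seen for each hash value. The "first seen" rule and the function name are assumptions; any member of a cluster could serve as its representative.

```python
def pick_representatives(samples: dict[str, str]) -> dict[str, str]:
    """Map each peHash value to one sample ID to analyze.

    Assumption: the first sample seen for a hash value is the representative.
    """
    chosen: dict[str, str] = {}
    for sample_id, pehash_value in samples.items():
        chosen.setdefault(pehash_value, sample_id)
    return chosen


if __name__ == "__main__":
    samples = {"md5-a": "h1", "md5-b": "h1", "md5-c": "h1", "md5-d": "h2"}  # hypothetical
    reps = pick_representatives(samples)
    print(reps)                                            # {'h1': 'md5-a', 'h2': 'md5-d'}
    print(f"{len(samples) - len(reps)} analyses skipped")  # 2 analyses skipped
```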


Conclusion

• peHash provides a performant solution to the problem of seemingly new malware samples.
• peHash can cluster large sample sets correctly using only basic information from Portable Executables.
• peHash cannot be used to cluster variants of malware families whose code structure has to be analyzed.


Thank You