peHash: A Novel Approach to Fast Malware Clustering


By: Georg Wicherski
Presenting: Rasika Bindoo

Introduction
• Data collection is no longer a problem thanks to honeypots.
• Honeypots have the drawback of polluting malware databases.
• Anti-virus scanners are slow.
• peHash was therefore developed to cluster (group) instances of the same polymorphic specimen.

Other Attempts at Hashing
• Spamsum, mrshash
• n-grams
• Signatures
• Vx-Class

peHash Function Design
The function should have the following design characteristics:
• It should not need to look into the contents of the sections.
• Low computational complexity.
• Scaling the bzip2 compression ratio to [0…7] ⊂ ℕ leads to the best matches.
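The scaling step can be illustrated with a minimal sketch. The slides do not give the exact formula, so the ratio definition (compressed size divided by original size), the clipping at 1.0 and the function name `scaled_compression_ratio` are assumptions made here for illustration only.

```python
import bz2


def scaled_compression_ratio(data: bytes) -> int:
    """Map the bzip2 compression ratio of `data` to an integer in 0..7.

    Assumption: ratio = compressed size / original size, clipped to 1.0.
    The slides only state that the ratio is scaled to [0..7].
    """
    if not data:
        return 0
    ratio = len(bz2.compress(data)) / len(data)
    ratio = min(ratio, 1.0)  # incompressible data can grow slightly
    return min(7, int(ratio * 8))
```

Under these assumptions, highly repetitive data lands in the lowest bucket, while packed or encrypted data, which barely compresses, lands in the highest one.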

Structural properties
Instances of polymorphic malware share the same structural Portable Executable properties. The following properties are therefore used to distinguish between binaries:
• Image characteristics
• Subsystem
• Stack commit size
• Heap commit size

Structural properties
Structural information used for each section in the Portable Executable:
• Virtual address
• Raw size
• Section characteristics

Generation of hash values
hash[0] := characteristics[0…7] ⊕ characteristics[8…15]
hash[1] := subsystem[0…7] ⊕ subsystem[8…15]
hash[2] := stackcommit[0…7] ⊕ stackcommit[8…15] ⊕ stackcommit[24…31]
hash[3] := heapcommit[0…7] ⊕ heapcommit[8…15] ⊕ heapcommit[24…31]

'⊕' denotes the XOR operation; x[a…b] denotes bits a through b of x.
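The XOR folding of the image-level fields can be sketched as follows. This is an illustration only, assuming that bit 0 is the least significant bit and using exactly the bit ranges given on the slide; the helper names `xor_bytes` and `image_hash_bytes` are hypothetical.

```python
def xor_bytes(value: int, *bit_offsets: int) -> int:
    """XOR together the 8-bit slices of `value` that start at each bit offset.

    Assumption: bit 0 is the least significant bit.
    """
    result = 0
    for offset in bit_offsets:
        result ^= (value >> offset) & 0xFF
    return result


def image_hash_bytes(characteristics: int, subsystem: int,
                     stack_commit: int, heap_commit: int) -> bytes:
    """Build hash[0..3] from the image-level header fields."""
    return bytes([
        xor_bytes(characteristics, 0, 8),   # bits 0..7 xor 8..15
        xor_bytes(subsystem, 0, 8),         # bits 0..7 xor 8..15
        xor_bytes(stack_commit, 0, 8, 24),  # bits 0..7 xor 8..15 xor 24..31
        xor_bytes(heap_commit, 0, 8, 24),   # bits 0..7 xor 8..15 xor 24..31
    ])


# Example field values are made up for illustration.
print(image_hash_bytes(0x010F, 0x0002, 0x1000, 0x1000).hex())  # 0e021010
```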

Generation of hash values
Sub-hash (computed for each section):
shash[0] := virtaddress[9…31]
shash[2] := rawsize[8…31]
shash[4] := characteristics[16…23] ⊕ characteristics[24…31]
shash[5] := kolmogorov ∈ [0…7] ⊂ ℕ
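A per-section sub-hash along these lines might look like the sketch below. The bit ranges follow the slide, but the byte layout of the result (3 bytes for the address bits, 3 bytes for the size bits, then one byte each for the folded characteristics and the scaled compression value) is an assumption made here, since the slide does not spell it out. The name `section_subhash` is hypothetical.

```python
def section_subhash(virtual_address: int, raw_size: int,
                    characteristics: int, complexity: int) -> bytes:
    """Build a fixed-length sub-hash for one PE section.

    `complexity` is the compression ratio already scaled to 0..7.
    The 8-byte output layout is an assumption for illustration.
    """
    if not 0 <= complexity <= 7:
        raise ValueError("complexity must be in 0..7")
    va_bits = (virtual_address >> 9) & 0x7FFFFF  # bits 9..31
    size_bits = (raw_size >> 8) & 0xFFFFFF       # bits 8..31
    char_byte = ((characteristics >> 16) & 0xFF) ^ ((characteristics >> 24) & 0xFF)
    return (va_bits.to_bytes(3, "big")
            + size_bits.to_bytes(3, "big")
            + bytes([char_byte, complexity]))
```

Because the low bits of the virtual address and raw size are dropped, two sections whose sizes differ only slightly (for example 0x200 and 0x2FF) produce the same sub-hash in this sketch.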

Advantages of this hash function
• Complexity is O(1).
• The SHA1 of the hash buffer is calculated to obtain the final hash value, which makes collisions difficult to create.
• Hashes have a constant length regardless of the number of sections in the executable.
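The constant-length property follows from the final SHA1 step, as this small sketch shows. It assumes the hash buffer is simply the image-level bytes followed by the section sub-hashes in section order; that concatenation order and the name `final_pehash` are assumptions, not taken from the slides.

```python
import hashlib
from typing import Sequence


def final_pehash(image_bytes: bytes, section_subhashes: Sequence[bytes]) -> str:
    """Hash the concatenated buffer with SHA1 and return the hex digest.

    Assumption: buffer = image-level bytes + sub-hashes in section order.
    """
    buffer = image_bytes + b"".join(section_subhashes)
    return hashlib.sha1(buffer).hexdigest()


# The digest is 40 hex characters whether there is one section or ten.
one = final_pehash(b"\x0e\x02\x10\x10", [b"\x00" * 8])
ten = final_pehash(b"\x0e\x02\x10\x10", [b"\x00" * 8] * 10)
print(len(one), len(ten))  # 40 40
```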

Entry Points and Imports
• The entry point value can easily be changed for each instance of a polymorphic specimen.
• Most packers specify misleading Import Address Tables.
• Import information can likewise be changed with little effort.
• Thus neither the entry point nor the imports are included in the hash function.

Evaluation
Number of clusters by cluster size, per data set:

Cluster size    mwcollect Alliance    Arbor Networks
1               7109                  16543
2-9             3165                  4104
10-99           549                   611
100-499         70                    71
500-999         19                    4
1000-4999       18                    8
5000+           7                     2

• peHash helps in clustering of polymorphic malware and also helps in detecting broken copies of already known threats.
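Grouping samples by hash and counting clusters per size range, as in the evaluation table, can be sketched like this. The input format (pairs of sample name and hash string) and the function names are assumptions for illustration; the size ranges are the ones used in the table.

```python
from collections import Counter, defaultdict

# Size ranges from the evaluation table; None means "no upper bound".
BUCKETS = [(1, 1), (2, 9), (10, 99), (100, 499),
           (500, 999), (1000, 4999), (5000, None)]


def cluster_by_hash(samples):
    """Group sample names by their hash value."""
    clusters = defaultdict(list)
    for name, pehash in samples:
        clusters[pehash].append(name)
    return dict(clusters)


def bucket_label(size: int) -> str:
    """Return the table row label for a cluster of the given size."""
    for low, high in BUCKETS:
        if high is None and size >= low:
            return f"{low}+"
        if high is not None and low <= size <= high:
            return str(low) if low == high else f"{low}-{high}"
    raise ValueError("cluster size must be at least 1")


def size_histogram(clusters):
    """Count how many clusters fall into each size range."""
    return dict(Counter(bucket_label(len(m)) for m in clusters.values()))
```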

Evaluation

File            MD5                    Size (bytes)
diantz.exe      48734e9b45dca36e8a…    85504
makecab.exe     2740dc2fbefaddb891f…   85504
find.exe        09b4e22c86f7e9f1e5…    9216
print.exe       76b96ed5304319f208…    9216
subst.exe       77847ef3cec784b137…    9216
bootvrfy.exe    c2ab77d9dc66447dc1…    5120
comrereg.exe    908f0eda6a49625f98…    5120
dcomcnfg.exe    1178cd20b90936837d…    5120

• Files in a broken cluster share the same size.

• They can be told apart only by looking at the actual code or imports, which peHash does not do.

Performance
• Analysis needs to be carried out for only one sample per peHash cluster.
• Performance does not depend on binary size or section count.
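Analyzing one sample per cluster amounts to de-duplicating an incoming queue by hash. The sketch below assumes samples arrive as pairs of sample name and hash string; the function name `select_for_analysis` is hypothetical.

```python
def select_for_analysis(samples):
    """Return the first sample name seen for each distinct hash value."""
    seen = set()
    selected = []
    for name, pehash in samples:
        if pehash not in seen:
            seen.add(pehash)
            selected.append(name)
    return selected


queue = [("a.exe", "h1"), ("b.exe", "h1"), ("c.exe", "h2")]
print(select_for_analysis(queue))  # ['a.exe', 'c.exe']
```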

Conclusion
• peHash provides a fast solution to the problem of seemingly new malware samples.
• peHash can cluster large sample sets correctly using only basic information from Portable Executables.
• peHash cannot be used to cluster variants of malware families for which the code structure has to be analyzed.

Thank You
