Getting Fuzzy With It Jesse Kornblum

Getting Fuzzy With It - Jesse Kornblum, jessekornblum.com/presentations/htcia12.pdf


Getting Fuzzy With It

Jesse Kornblum


Outline
• Introduction
• Identical
• Similarity
• Generic Data
• Fuzzy Hashing
• Similar Text
• Similar Pictures
• Similar Programs
• Questions


Introduction
• U.S. Air Force Office of Special Investigations
• U.S. Department of Justice
• Applied Computer Forensics Research
• Tools I've written
  – Memory forensics
  – md5deep/hashdeep
  – fuzzy hashing (ssdeep)
  – Foremost


Motivation


Identical
• A == B
• Difficult for humans
• Easy for computers
• Requires storing the original A and B
  – Big files
  – Could be illegal or private content


Identical
• Cryptographic hashing shortcut
• SHA-3 and friends
• If SHA-3(A) == SHA-3(B) then A == B*
  – * to within a high degree of certainty
  – Chance of random collision is very very very very small
• Hash signatures are small
• Impossible to recover input from signature
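The hashing shortcut can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA3-256 from the standard `hashlib` module as the "SHA-3" in question; the helper names `sha3_digest` and `probably_identical` are made up for this example.

```python
import hashlib

def sha3_digest(data: bytes) -> str:
    """Return the SHA3-256 digest of the input as a hex string."""
    return hashlib.sha3_256(data).hexdigest()

def probably_identical(a: bytes, b: bytes) -> bool:
    """Compare two inputs by digest instead of byte by byte."""
    return sha3_digest(a) == sha3_digest(b)

a = b"Deep into the darkness peering"
b = b"Deep into the darkness peering"
print(probably_identical(a, b))  # True
print(len(sha3_digest(a)))       # 64 hex characters, whatever the input size
```

Only the short digests need to be stored or shared, which avoids keeping the original (possibly large or sensitive) files around.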


Identical Data
• Cryptographic hashes are spoiled by even a single byte difference in the input
• Very similar things have wildly different cryptographic hashes
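To see how badly a one-byte change spoils a cryptographic hash, the sketch below counts how many of the 256 digest bits differ between two nearly identical inputs. It assumes SHA3-256 from `hashlib`; the function name `bit_difference` is hypothetical.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the digest bits that differ between the SHA3-256 of a and b."""
    da = int.from_bytes(hashlib.sha3_256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha3_256(b).digest(), "big")
    return bin(da ^ db).count("1")

original = b"Merely this, and nothing more"
altered = b"Merely this, and nothing mor3"  # last byte changed

print(bit_difference(original, original))  # 0
print(bit_difference(original, altered))   # typically close to half of the 256 bits
```

Because roughly half the bits flip no matter how small the change, the digest says nothing about how similar the two inputs were.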

Image courtesy of Flickr user krystalchu and used under Creative Commons license, http://www.flickr.com/photos/krystalclear/3189597813/

Similarity
• What does it mean for two things to be similar?


Similarity
• Depends on:
  – The kind of things being compared
  – How they’re being compared


Example


Example
• Both live in Washington DC
• Both like a good hamburger
• Both like dogs
• Conclusion: Similar

• President Obama is 7 cm taller
• Jesse does not have gray hair
• Work in different career fields
• Conclusion: Not similar


Generic Data
• Thing being compared:
  – Ones and zeros of the data
• How it's being compared:
  – Files with identical runs of ones and zeros


Disclaimer
• I didn’t invent this math
• Originally Dr. Andrew Tridgell
  – Samba
  – rsync was part of his thesis
  – Modified slightly for spamsum
    • Spam detector in his “junk code” folder


Fuzzy Hashing
• Combination of a rolling hash and traditional hash
• Rolling hash looks only at last few bytes
  F o u r s c o r e -> 83,742,221
  F o u r s c o r e -> 5
  F o u r s c o r e -> 90,281
• When processing a file, compute block size using file size
• If rolling hash equals random magic value, it’s a trigger point


Fuzzy Hashing
• Compute traditional hash while processing file
• On each trigger point, record value
• Reset traditional hash and continue
• Example
  – Excerpt from "The Raven" by Edgar Allan Poe
  – Triggers on ood and ore
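The rolling hash and trigger-point steps can be sketched as follows. This is a simplified illustration, not the ssdeep or spamsum implementation: the rolling hash here is just the sum of the last `WINDOW` bytes, the traditional hash is a 32-bit FNV-1a, the trigger test (`rolling % block_size == block_size - 1`) and the fixed `block_size` of 16 are assumptions, and real tools derive the block size from the file size. All names are hypothetical.

```python
WINDOW = 7  # assumed window size: the rolling hash sees only the last 7 bytes
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def fnv1a(data: bytes) -> int:
    """Stand-in 'traditional' hash (32-bit FNV-1a) computed over one piece."""
    h = 0x811C9DC5
    for byte in data:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h

def pieces(data: bytes, block_size: int = 16):
    """Return (end_offset, piece_hash) for each piece of the input."""
    result = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                      # newest byte enters the window
        if i >= WINDOW:
            rolling -= data[i - WINDOW]      # oldest byte leaves the window
        if rolling % block_size == block_size - 1:   # trigger point
            result.append((i + 1, fnv1a(data[start:i + 1])))
            start = i + 1                    # reset the traditional hash
    if start < len(data):                    # whatever follows the last trigger
        result.append((len(data), fnv1a(data[start:])))
    return result

def signature(data: bytes, block_size: int = 16) -> str:
    """One character per piece, taken from the piece hash."""
    return "".join(ALPHABET[h % 64] for _, h in pieces(data, block_size))

before = b"long I stood there, wondering, fearing Doubting, dreaming dreams"
after = b"long I stood there, wondering, I AM THE LIZARD KING!!!1! fearing Doubting, dreaming dreams"
print(signature(before))
print(signature(after))
```

Because a trigger depends only on the last few bytes, pieces that lie entirely before an inserted run of text keep the same boundaries and hashes, so the two signatures share those characters.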


Fuzzy Hashing
Deep into the darkness peering, long I stood there, wondering, fearing
Doubting, dreaming dreams no mortals ever dared to dream before;
But the silence was unbroken, and the stillness gave no token,
And the only word there spoken was the whispered word, Lenore?,
This I whispered, and an echo murmured back the word, "Lenore!"
Merely this, and nothing more


Fuzzy Hashing
• The excerpt splits into five pieces, each ending at a trigger point (ood or ore)
• Traditional hash of each piece shown on the right

Deep into the darkness peering, long I stood -> 28163
there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before -> 491522
; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word, Lenore -> 57
?, This I whispered, and an echo murmured back the word, "Lenore -> 145410213
!" Merely this, and nothing more -> 738210

Signature = 32730 (the last digit of each piece's hash)


Fuzzy Hashing
• Text inserted into the second piece changes only that piece's hash

Deep into the darkness peering, long I stood -> 28163
there, wondering, I AM THE LIZARD KING!!!1! fearing Doubting, dreaming dreams no mortals ever dared to dream before -> 381739
; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word, Lenore -> 57
?, This I whispered, and an echo murmured back the word, "Lenore -> 145410213
!" Merely this, and nothing more -> 738210

• Original Signature = 32730
• New Signature = 39730


Fuzzy Hashing
• Comparing Signatures
• Edit Distance
  – Number of changes to turn one into the other
  – 32730
  – 39730
  – Edit distance = 1
• If edit distance is small relative to length, fuzzy hash match
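The comparison step can be sketched with a plain Levenshtein edit distance. This is an illustration only: ssdeep's actual scoring differs in its details, and the `similarity` ratio below is an assumed way of expressing "small relative to length".

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Fraction of the longer signature left untouched by the edits."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return (longest - edit_distance(a, b)) / longest

print(edit_distance("32730", "39730"))  # 1
print(similarity("32730", "39730"))     # 0.8
```

Where to draw the line between a match and a miss is a tuning decision; the slides only say the distance should be small relative to the signature length.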


Fuzzy Hashing
• Data format agnostic
  – Doesn't consider the type of data being processed
• Gets confused easily
  – Identifies surrounding format rather than content
  – Many small changes throughout input


sdhash
• Another algorithm for matching similar files
• Developed by Dr. Vassil Roussev, University of New Orleans
• Like ssdeep, ignores data format
• Variable sized hashes
  – About 3% of input size
• Handles data reordering
• Matches more files than ssdeep
  – This is not necessarily “better”
  – But usually is better
• Code, paper, roadmap:
  – http://sdhash.org/


Similar Text
• What makes two text documents similar?


Similar Text
• Documents which contain identical words
• Documents which contain identical phrases
  – N-grams
• Reduce the data
• Stop words
  – Language dependent
  – A, the, I, of, and, for
• Porter Stemming
  – Reduce words to a root
  – Example: dog dogs dogged dogging
    • dog


Similar Text
• Example:
• Input 1: the dog run was small
  – dog run wa small
• Input 2: running with dogged pursuit
  – run with dog pursuit
• These are documents which contain identical words
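The reduction in this example can be approximated in a few lines. Note the assumption: `crude_stem` is a toy suffix stripper written for this illustration, not the Porter stemming algorithm, and it only happens to agree with Porter on these inputs. The stop-word list is the one from the slides.

```python
STOP_WORDS = {"a", "the", "i", "of", "and", "for"}

def crude_stem(word: str) -> str:
    """Toy stemmer: strip one common suffix, then undouble a final consonant."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            word = word[: -len(suffix)]
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]  # "runn" -> "run", "dogg" -> "dog"
            break
    return word

def normalize(text: str) -> list:
    """Lowercase, drop stop words, and stem what is left."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return [crude_stem(w) for w in words]

doc1 = normalize("the dog run was small")
doc2 = normalize("running with dogged pursuit")
print(doc1)                            # ['dog', 'run', 'wa', 'small']
print(doc2)                            # ['run', 'with', 'dog', 'pursuit']
print(sorted(set(doc1) & set(doc2)))   # ['dog', 'run']
```

After reduction the two inputs share "dog" and "run", which is what lets a word-based comparison call them similar.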


Similar Text
• Synonyms
• Reduce words to a base synonym
• Example:
  – darkness, dimness, dusk, gloom, nightfall, shade, shadows
• "Deep into the darkness peering"
  – Deep into the $DARK peer
• "The gloom that breathes upon me with these airs"
  – The $DARK that breath upon me with these air
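Synonym reduction amounts to a lookup table from each word to its base token. The sketch below covers only that step (the slide output also shows stemming, which is left out here); the table holds just the slide's example group and the names are hypothetical.

```python
# One synonym group from the slides, keyed by its base token.
SYNONYM_GROUPS = {
    "$DARK": {"darkness", "dimness", "dusk", "gloom", "nightfall", "shade", "shadows"},
}

# Invert the groups so every synonym maps straight to its base token.
BASE_TOKEN = {word: base for base, words in SYNONYM_GROUPS.items() for word in words}

def fold_synonyms(text: str) -> str:
    """Replace each known synonym with its base token, leaving other words alone."""
    return " ".join(BASE_TOKEN.get(word.lower(), word) for word in text.split())

print(fold_synonyms("Deep into the darkness peering"))
# Deep into the $DARK peering
print(fold_synonyms("The gloom that breathes upon me with these airs"))
# The $DARK that breathes upon me with these airs
```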


sdtext
• Tool to identify similar text
  – Professor Clay Shields at Georgetown University
  – Formerly GUCS
  – http://sdtext.com/
• Identifies documents with identical words
• Which words?
• Select words which are unique, but not too unique
  – Avoid stop words
  – Avoid words which only appear in one document
• Bit-vector of words
• Compare with cosine similarity
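The last two bullets can be illustrated with a small sketch. This is not sdtext's code; it assumes the word selection has already happened and simply builds one bit per vocabulary word, then compares the vectors with cosine similarity. All names are made up for the example.

```python
import math

def bit_vector(words, vocabulary):
    """One bit per vocabulary word: 1 if the document contains it, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

doc1 = ["dog", "run", "wa", "small"]
doc2 = ["run", "with", "dog", "pursuit"]
vocabulary = sorted(set(doc1) | set(doc2))

u = bit_vector(doc1, vocabulary)
v = bit_vector(doc2, vocabulary)
print(cosine_similarity(u, v))  # 0.5
```

With bit vectors the dot product is just the number of shared words, so two four-word documents sharing two words score 2 / (2 * 2) = 0.5.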


Similar Pictures


WARNING: EXPLICIT IMAGERY

LAW ENFORCEMENT SENSITIVE - DO NOT DUPLICATE


Similar Pictures


Images: kitty1.jpg, kitty1.png

$ md5deep -b kitty1*
1dcfeb21e006836bb4a999e07ad54571  kitty1.jpg
12de826dce2567c85af22750644bc9ed  kitty1.png
$ ssdeep -b -a -d kitty1*
kitty1.png matches kitty1.jpg (0)


Similar Pictures
• Just like generic data and text
• Simplify the data
• Compare the simplified versions
• Example from Dr. Neal Krawetz
  – http://bit.ly/kuTKXR
• Reduce picture size
• Reduce number of colors
• Compute average pixel value
• Record whether each pixel is above or below average
• Create hash based on bits from the last step
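The last three steps can be sketched without an imaging library. The example assumes the first two steps (shrinking the picture and reducing it to grayscale) have already produced a small grid of integer pixel values; the 4x4 grid, the pixel values, and the function name are invented for illustration.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, 1 if above the average."""
    flat = [p for row in pixels for p in row]
    total = sum(flat)
    count = len(flat)
    bits = 0
    for p in flat:
        # p > total / count, written with integers to avoid rounding issues
        bits = (bits << 1) | (1 if p * count > total else 0)
    return bits

tiny = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 190, 35, 215],
    [20, 205, 28, 225],
]
# The same picture, uniformly brightened.
brighter = [[p + 10 for p in row] for row in tiny]

print(hex(average_hash(tiny)))                       # 0x5555
print(average_hash(tiny) == average_hash(brighter))  # True
```

Brightening every pixel by the same amount moves the average by that amount too, so each above/below decision, and therefore the hash, stays the same.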


Similar Pictures


Image hashes: 268407468280, 268407468280
• Hamming distance is zero
• Images are similar!


Perceptual Hashing


Image hashes: 268407468280, 259820151032
• Small Hamming distance, but not zero
• Number of matches to any image depends on match threshold
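Hamming distance between two integer hashes is the number of bit positions where they differ. A minimal sketch, treating the slide values as decimal integers; the threshold of 5 in `is_match` is an arbitrary assumption, not a value from the talk.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(a ^ b).count("1")

def is_match(a: int, b: int, threshold: int = 5) -> bool:
    """Call two images similar if their hashes differ in few enough bits."""
    return hamming_distance(a, b) <= threshold

print(hamming_distance(268407468280, 268407468280))  # 0
print(hamming_distance(268407468280, 259820151032))  # 4
print(is_match(268407468280, 259820151032))          # True
```

Raising the threshold finds more near-duplicates but also more false matches, which is why the number of hits for any image depends on where the threshold is set.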


Perceptual Hashing Tools
• Several exist and are free to law enforcement
  – Plug in to major forensics programs
  – Have signatures you can share
• Free and open source library
  – pHash, http://phash.org/
• This technique works for audio and video too


Similar Programs
• What makes computer programs similar?


Similar Programs
• Programs which
  – Do the same thing
  – Process the same kind of files
  – Have the same look and feel
  – Written by the same person
  – Connect to the same Command and Control server


Features
• What features can we extract from a program?


Features
• Signed code?
• Which APIs are called
• How often APIs are called
• Order in which APIs are called
• Entropy
• DLLs used
• Percentage of code coverage
• Magic strings
• N-grams of instructions
• Control-flow graph
• IP addresses accessed
• …
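As one concrete feature from this list, byte entropy is easy to compute. The sketch below uses Shannon entropy over byte frequencies, measured in bits per byte; the function name is hypothetical, and which entropy measure a given tool uses is an assumption here.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0.0 to 8.0 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(shannon_entropy(b"\x00" * 1024))    # 0.0
print(shannon_entropy(bytes(range(256)))) # 8.0
```

Values near 8 suggest compressed or encrypted content, such as a packed executable, while plain code and text score lower, so two programs with very different entropy are unlikely to be close variants.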

Image courtesy of Flickr user doctor_keats and used under Creative Commons license.

Similar Programs
• Programs aren't static
• They can be different every time
  – Depend on user interaction
• No single best way to compare them


Outline
• Introduction
• Identical
• Similarity
• Generic Data
• Fuzzy Hashing
• Similar Text
• Similar Pictures
• Similar Programs
• Questions


Questions?

Jesse Kornblum [email protected]
