
2006 Symantec Corporation, All Rights Reserved

Anonymizing Filesystem Metadata for Analysis

Chris Xin

Symantec

Challenges of Filesystem Analysis

Real-time live-system monitoring is difficult.
– performance degradation
– security & privacy concerns
– stability risk

Traces
– difficult to reconstruct I/O dependencies
– system states
– security & privacy concerns

Benchmarks
– “There are lies, damn lies and then there are benchmarks.”

Filesystem images
– snapshot, backups
– security & privacy concerns

Agenda

Challenges of filesystem analysis

Keeping filesystem images
– metasave

Metadata anonymization
– secure metasave

Measurement
– space efficiency
– time efficiency
– resource consumption

Summary

Filesystem Images

Storing the whole system would be expensive.
– large storage space
– long time

Keeping metadata is a wise idea.
– A good resource for understanding some characteristics of a file system
– Cumulative images can be obtained to track the change trend of a file system
    – file size, age, type information
    – filesystem aging analysis
– Address some privacy concerns by eliminating user data

Some file systems already provide such a utility.
– Ext2: e2image
– Linux NTFS: ntfsclone --metadata
– VxFS: metasave

Metasave Utility

The utility saves or restores the metadata of VxFS
– Available in version 1 and later versions.
– Metadata is kept in a way that the original geometry of a file system is preserved and all the inode information is intact.
– No user data is retained.
– Metadata can be saved on top of a snapshot, a backup, or a live system as an image file.
– The image file can be deflated and metadata can be restored back to a file or a device.

What do we do with images?
– troubleshooting
– debugging
– file system analysis

Efficient Anonymization

But … your clients may say no …
– Sensitive information is still in the file and directory names
– Concerns of performance degradation

Solution: Anonymize clients’ information in metadata
– Names of files and directories
– Client information in file system intent logs

Requirements
– Must be difficult to recover original information
– Keep the geometry of the file system: retain the length of the file/directory names
– Time efficient
– Space efficient
– Minimum performance degradation

Secure Metasave

Enhanced metasave with encryption options
– Evolved from metasave, a VxFS utility for saving/restoring metadata of a file system
– Online image saving
– Use cryptographic message digest algorithm to obfuscate client information
    – The algorithm can be chosen by a client’s requirement
    – Default: SHA-1

Message Digest

Secure one-way hash function: e=H(M)
– M: original message
– H: hash function
– e: digested message

Key properties
– Given M, easy to compute e=H(M)
– Given e, hard to compute M such that e=H(M)
– Given M, hard to find M' (different from M) such that H(M)=H(M') (minimum collision)
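The relation e=H(M) can be tried out with Python's standard hashlib module. This is only an illustrative sketch: the helper name `digest` and the sample filenames are assumptions, not part of metasave, which the slides say is built on the OpenSSL library.

```python
import hashlib


def digest(message: bytes, algorithm: str = "sha1") -> bytes:
    """Return e = H(M) for the named hash function H."""
    return hashlib.new(algorithm, message).digest()


# Two hypothetical filenames that differ by one character.
e1 = digest(b"payroll_2006.xls")
e2 = digest(b"payroll_2007.xls")

print(len(e1), e1.hex())  # a SHA-1 digest is 20 bytes long
print(len(e2), e2.hex())
print(e1 != e2)           # different messages give different digests here
```

Computing e from M is cheap, while nothing in the digest lets a reader work back to the original name short of guessing candidate names and hashing them.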

Implementation

OpenSSL library

Obfuscate a file/directory name
– Do it by individual pathname components
    – /a/bc/bcd → /x/rd/wyz
– Retain name length
    – Digest works on a fixed length of characters at a time.
        – 20 characters for SHA-1
    – If len(name) > len(digest), process it in segments.
    – If len(name) < len(digest) or len(final segment) < len(digest), digest the name string and remove some characters to preserve its original length.
    – Digest can contain characters that are illegal in file/directory names; map them to legal characters.
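The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the metasave code: the function names, the alphabet in `LEGAL_CHARS`, and the modulo-based character mapping are all hypothetical, and the seeded random number generator mentioned in the slides is left out.

```python
import hashlib
import string

DIGEST_LEN = 20  # SHA-1 yields a 20-byte digest
# Assumed set of legal output characters; the slides do not specify the mapping.
LEGAL_CHARS = string.ascii_letters + string.digits


def obfuscate_component(name: str) -> str:
    """Obfuscate one pathname component, keeping its length in bytes."""
    raw = name.encode("utf-8")
    pieces = []
    # Names longer than one digest are processed in 20-byte segments.
    for start in range(0, len(raw), DIGEST_LEN):
        segment = raw[start:start + DIGEST_LEN]
        digest = hashlib.sha1(segment).digest()
        # Map every digest byte onto a legal character.
        mapped = "".join(LEGAL_CHARS[b % len(LEGAL_CHARS)] for b in digest)
        # Chop so a short (or final) segment keeps its original length.
        pieces.append(mapped[:len(segment)])
    return "".join(pieces)


def obfuscate_path(path: str) -> str:
    """Obfuscate a path one component at a time, keeping the separators."""
    return "/".join(obfuscate_component(part) for part in path.split("/"))


print(obfuscate_path("/a/bc/bcd"))
```

Because every component is replaced by a string of the same length, directory entry sizes stay the same, which is what the geometry requirement asks for.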

File/Directory Name Manipulation

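Parse a name string
Message digest
Chop it to its original length
Random number generator with a changeable seed
Character mapping

[Figure: a 67-character original name string (offsets 0-67) is split into segments at offsets 20, 40 and 60; the segment digests span offsets 0-79 and are chopped to the original length, giving the obfuscated filename (offsets 0-67).]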

Obfuscation Options

Full-name obfuscation

Retain file extension if any

Obfuscate extensions as well and make them consistent

obfuscation option       original name: foo1.c    original name: foo2.c
full-name                abcde                    uwxyz
retain file extension    jkis.c                   swdx.c
consistent extension     jkis.x                   swdx.x
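The three options in the table can be sketched as follows. The option strings, the `scramble` helper, and the choice to obfuscate an extension independently of its stem (so equal extensions stay equal) are assumptions made for illustration; the slides do not describe how metasave implements them.

```python
import hashlib
import string

LEGAL_CHARS = string.ascii_letters + string.digits  # assumed legal alphabet


def scramble(text: str) -> str:
    """Length-preserving SHA-1 obfuscation, 20 bytes at a time."""
    raw = text.encode("utf-8")
    out = []
    for i in range(0, len(raw), 20):
        seg = raw[i:i + 20]
        digest = hashlib.sha1(seg).digest()
        mapped = "".join(LEGAL_CHARS[b % len(LEGAL_CHARS)] for b in digest)
        out.append(mapped[:len(seg)])
    return "".join(out)


def obfuscate_name(name: str, option: str) -> str:
    """Apply one of three hypothetical option names to a file name."""
    stem, dot, ext = name.rpartition(".")
    # No extension (or a leading-dot name): obfuscate the whole string.
    if option == "full-name" or not (dot and stem):
        return scramble(name)
    if option == "retain-extension":
        return scramble(stem) + "." + ext
    if option == "consistent-extension":
        # The extension is digested on its own, so every ".c" maps alike.
        return scramble(stem) + "." + scramble(ext)
    raise ValueError(f"unknown option: {option}")


for opt in ("full-name", "retain-extension", "consistent-extension"):
    print(opt, obfuscate_name("foo1.c", opt), obfuscate_name("foo2.c", opt))
```

With the consistent-extension option an analyst can still group files by type without learning what the original types were.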

Further Handling

Multiple extensions and prefixes for name-only obfuscation option
– Look at the last extension only
    – foo.c.bak → abced.bak
– Retain an extension of 4 characters or fewer; obfuscate anything longer

Do not obfuscate the name of special administrative files or directories
– lost+found

Rebuild directory indexes and block checksums after name obfuscation

Symlinks
– Point to the same place within the file system
– “..” is kept intact

Intent logs
– Offers an option to not include intent logs in an image file.
– If the intent log is retained, file and directory names are obfuscated.
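A sketch of how these special cases might fit together is shown below. Everything here is an assumption for illustration: the names `KEEP_AS_IS` and `MAX_KEPT_EXT`, the inclusion of "." next to "..", and the reuse of the same per-component function for symlink targets so that a link and its target obfuscate to matching names.

```python
import hashlib
import string

LEGAL_CHARS = string.ascii_letters + string.digits  # assumed legal alphabet
KEEP_AS_IS = {".", "..", "lost+found"}  # "." is an added assumption
MAX_KEPT_EXT = 4  # retain extensions of 4 characters or fewer


def scramble(text: str) -> str:
    """Length-preserving SHA-1 obfuscation, 20 bytes at a time."""
    raw = text.encode("utf-8")
    out = []
    for i in range(0, len(raw), 20):
        seg = raw[i:i + 20]
        digest = hashlib.sha1(seg).digest()
        mapped = "".join(LEGAL_CHARS[b % len(LEGAL_CHARS)] for b in digest)
        out.append(mapped[:len(seg)])
    return "".join(out)


def obfuscate_component(name: str) -> str:
    """Filename-only obfuscation of one component, with special cases."""
    if name == "" or name in KEEP_AS_IS:
        return name
    # Only the last extension is considered: "foo.c.bak" splits at ".bak".
    stem, dot, ext = name.rpartition(".")
    if dot and stem and len(ext) <= MAX_KEPT_EXT:
        return scramble(stem) + "." + ext
    return scramble(name)


def obfuscate_path(path: str) -> str:
    """Usable for symlink targets: each component is mapped independently."""
    return "/".join(obfuscate_component(c) for c in path.split("/"))


print(obfuscate_path("../lost+found/foo.c.bak"))
```

Since a component always maps to the same string wherever it appears, an obfuscated symlink target still resolves to the obfuscated name of the file it originally pointed to.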

Collision Probability

What’s a collision?
– Two files/directories with different names, say A and B, end up with the same name after obfuscation.

Do we have to worry about it?
– Not really
– Collision only matters within individual directories.
– Chance of collision is tiny
    – With SHA-1, 1 in 10^24 possibility for a filesystem with a trillion file/directory names, and 1 in 10^18 for a quadrillion names.
    – The character mapping and name length chopping increase the chance of collisions slightly.
– An optional name conflict check follows obfuscation for a file system with large directories.
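The orders of magnitude quoted above are consistent with the standard birthday-bound approximation p ≈ n(n-1) / (2 · 2^160) for a 160-bit digest. The sketch below evaluates it and also shows one possible shape for a per-directory conflict check; both function names are hypothetical, and the bound ignores the extra collisions introduced by character mapping and chopping.

```python
from typing import Dict, List


def collision_probability(n_names: int, digest_bits: int = 160) -> float:
    """Birthday-bound estimate that any two of n names share a digest."""
    return n_names * (n_names - 1) / (2 * 2 ** digest_bits)


def find_conflicts(obfuscated: Dict[str, str]) -> Dict[str, List[str]]:
    """Given original -> obfuscated names for ONE directory, list clashes."""
    by_new: Dict[str, List[str]] = {}
    for original, new in obfuscated.items():
        by_new.setdefault(new, []).append(original)
    return {new: names for new, names in by_new.items() if len(names) > 1}


for n in (10 ** 12, 10 ** 15):
    print(f"{n:.0e} names: p is about {collision_probability(n):.1e}")

# A made-up directory in which two names were chopped to the same string.
print(find_conflicts({"a.txt": "Qz.txt", "b.txt": "Qz.txt", "c.txt": "Lm.txt"}))
```

For full 160-bit digests the estimate comes out on the order of 10^-25 for a trillion names and 10^-19 for a quadrillion, which is why the check is only worth running where chopped names in large directories make clashes more plausible.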

Measurement

Three categories
– Space consumption
– Time consumption
    – encryption overhead
– Resource consumption

Six filesystems measured
– four customer filesystems
– two filesystems on our production server (fs #2 and #6)

Experiment environment
– Live production system
    – Sun Fire E6900
    – 16 SPARC CPUs, 32GB memory, shared disks
– Test machine
    – Sun Fire V240
    – 2 SPARC CPUs, 2GB memory, single-user disks

Space Efficiency

The image of metadata usually takes about 1-5% of the filesystem size.

[Figure: storage efficiency. Image size as a percentage of filesystem size (y-axis, 0 to 12) for filesystems 1-6.
% of total cap.: 0.08, 0.06, 0.04, 0.05, 6.88, 0.60
% of used cap.: 0.12, 0.08, 0.73, 0.56, 11.73, 0.63]

Time Efficiency

How long does it take to get an anonymized file system image?
– use “filename-only” option
– on the live production system
    – about 30 minutes to get an encrypted metadata image from fs #6.
    – 5-8 secs for fs #2.
– on the test machine:

[Figure: time efficiency. Time in seconds (y-axis, 0 to 300) for filesystems 1-6: 1.9, 1.7, 0.267, 6.4, 108.33, 273]

A closer look

The factors in play
– # of inodes
– total filesystem size
– filesystem capacity

fs   total(GB)   used(GB)   time (sec) test   msv size/total fs cap.   msv size/used fs cap.
1    27.8        18.3       1.9               0.08%                    0.12%
2    49.5        39.4       1.7               0.06%                    0.08%
3    9.0         0.6        0.267             0.04%                    0.73%
4    150.0       12.4       6.4               0.05%                    0.56%
5    3.9         2.3        108.33            6.88%                    11.73%
6    195.4       186.9      273.0             0.60%                    0.63%

Remaining values for the “# files” and “time (sec) production” columns (row assignment not recoverable): 39, --, 4, --, --, --, 1836, 742, 3,721, 59,584, 956,180, 2,259,443

Encryption Overhead

Space efficiency is the same.

Time efficiency
– Little overhead introduced on a live production system
    – I/O bounded
    – shared disk
– Noticeable computational overhead on the test machine.

Encryption Overhead on the Test Machine

[Figure: normalized time (y-axis, 0 to 1.6) for file systems 1-6, with four series: no-encryption, full-obfuscation, filename-only, consistent-extension]

Encryption Overhead on the Production System

[Figure: normalized time (y-axis, 0 to 1.6) for file systems 1-6, with four series: no-encryption, full-obfuscation, filename-only, consistent-extension]

Resource Consumption

Not much performance degradation during image saving

– 20 MB memory and 1% of CPU were utilized during the image dumping on a live production system.

Summary

A method of anonymizing filesystem metadata.
– Obfuscate clients’ information to relieve privacy concerns
– Cost 1-5% storage of the original file system size.
– Fairly quick process and little performance degradation.

We encourage saving file metadata images with anonymization.

– Provide a good resource for file system analysis– Benefit both development and research

The anonymization scheme can be used in other file system utilities, such as trace collecting.


Acknowledgements

Thanks to Oleg Kiselev, John Colgrove, Craig Harmer, Chuck Silvers and George Mathew for discussions.

Thanks to Marianne Lent and Paul Massiglia for suggestions.

Thanks to Ken Zachmann for helping with experiments.

Questions