The Open Source Forensics Stack - Simson Garfinkelsimson.net/ref/2010/2010-07-27 ECC 2 -...

Preview:

Citation preview

The Open Source Forensics Stacklibewf, afflib, SleuthKit for digital Forensics Educators

Simson L. GarfinkelAssociate Professor, Naval Postgraduate School

1

Discuss disk image format issues.

Introduce open source stack: libewf, afflib, tsk,

Introduce Digital Forensics XML fiwalk fiwalk.py

Promote Tools that are available now! frag_find bulk_extractor

Goals of this talk

2

<fileobject>

Bit-level layer — dictates how data is stored

Schema Layer — structure of stored data

API Layer — interface to analysis programs.

Forensic Modalities;Proprietary vs. Open Source

3

Disk Images Disk image files (MB to GB in size)

Packet Capture Files libpcap files

Memory Images raw files; debug files Hibernation Files

File Signatures Lists of hash codes (typically MD5)

4

There are many kinds of forensic data

Formats: RAW (dd) —

—easiest format to work with; fast; very big—Handled by all tools—Many file systems (FAT32, ext2), cannot have files larger than 4GB

Split raw (file.000, file.001, file.002, etc.)

Encase (.E01) — compressed format developed by Expert Witness / Guidance Software—Compressed —Splits files across multiple volumes (file.E01, file.E02, etc.)—Doesn't work with some tools (carvers, etc.)—Supports "passwords" but not encryption.

AFF — compressed open source format—Can store image as a single file (>2GB) or as multiple files (.afd format)—Supports encryption and digital signatures.—Extensible

5

Disk image files: many issues.

On the digitalcorpora.org website, we distribute in multiple formats.

Index of /corp/drives/nps/nps-2009-domexusers Name Last modified Size Description

Parent Directory - narrative.txt 11-Aug-2009 13:38 366 nps-2009-domexusers.E01 02-May-2010 19:57 4.0G nps-2009-domexusers.E02 02-May-2010 20:04 72M nps-2009-domexusers.aff 20-Jan-2009 13:16 3.9G nps-2009-domexusers.raw 02-May-2010 18:23 40G nps-2009-domexusers.redacted.E01 02-May-2010 17:52 2.0G nps-2009-domexusers.redacted.E02 02-May-2010 18:07 2.0G nps-2009-domexusers.redacted.E03 02-May-2010 18:08 2.6M nps-2009-domexusers.redacted.aff 11-Aug-2009 11:39 3.9G nps-2009-domexusers.redacted.raw 11-Aug-2009 15:49 40G nps-2009-domexusers.xml 11-May-2009 12:21 36M realistic.aff 20-Jan-2009 13:16 3.9G

6

Forensics Software: Commercial vs. Open Source?

Commercial (e.g. EnCase, FTK, etc.) Widely used in government & industry (+) Educational pricing usually available (+) Must be licensed for classroom (-) Complex user interface can detract from instruction (-) Typically runs just on Windows (-) Requires hardware license management (dongle) (-)

Open Source (e.g. The SleuthKit — TSK) Runs on Windows, Mac and Linux (+) No dongle (+) Good platform for further research (+) Less functionality than proprietary programs (-) Poor user interfaces (-)

7

libewf: a tool for bridging commercial and open source

Libewf is an open source C library that decodes .E01 files.

libewf must be compiled and installed before building SleuthKit.

libewf also includes command line tools: ewfacquire

—simple disk imaging tool —Will also convert RAW to E01

ewfinfo—Prints information about E01 files

ewfverify—Verifies the CRCs and MD5 of an E01 file

8

AFFLIB v1-3

9

AFF was designed for large-scale drive imaging and archivingIn 1998 I started the "Drives Project." Looking for data on used computer equipment.

Between 1998-2005 I purchased 250 drives: Serial number info captured with atacontrol Drives imaged with dd Images stored in raw format, eventually compressed with gzip Good enough for my 2005 PhD Thesis.

In 2005 I started "Phase 2" of the project. Goal: Increase corpora size to 2500 drives. Development of new forensic techniques for LE & IC

Question: How to store the disk images?

10

There were not many choices in 2005 for disk images.

EnCase Format Proprietary; no open source implementation. (libewf released in 2006) 2GB size limit created a management nightmare. (FILE.E01, FILE.E02, FILE.E03…) No provision for encryption or digital signatures.

—Encryption — needed for privacy, security, & IRB approval—Digital Signatures — to enable capture by "trusted hardware."

Other proprietary formats: IXimager and ILook Investigator ProDiscover Image File Format SafeBack Vogon International's SDi32

PyFlag "Seekable gzip" Open source, but not implemented anywhere except PyFlag. No obvious way to store metadata

11

We decided to create AFF — the Advanced Forensic FormatFormat Goals (AFF) Open Format — All bits clearly defined and documented. One image file per physical disk Extensible — Store all of the metadata, imaging conditions, chain-of-custody, etc. Encryption

—Password-based private key—Certificate-based public key

Implementation Goals (AFFLIB) Multi-platform: Windows, MacOS, Linux, FreeBSD, etc. Open Implementation — No licensing fees. Easy to instrument — enable research in computer forensics

12

AFFLIB v1 has three distinct layers.

13

Bit-level layer — dictates how data is stored

Schema Layer — structure of stored data

API Layer — interface to analysis programs.

API Layer: designed for easy integration into existing programs

Simple interface:AFFILE *af = af_open()

af_seek(af,pos,SEEK_SET);

af_read(af,buf,sizeof(buf));

af_close(af)

14

API Layer — interface to analysis programs.

AFF stores all data as name/value pairs

NAME — A human-readable name for what's in the segment. 63 UTF-8 characters Some names are specified, but you can use whatever you wish

VALUE — A machine-interpreted value 32-bit unsigned long 0-232 8-bit values.

15

Schema Layer — structure of stored data

AFF stores all data as name/value pairs

The "schema" is standardized names for forensic data.

—sectorsize — Number of bytes per sector — 0x00000200 (512)—imagesize — Number of bytes in the logical image — 0x1000000000 (64GiB)—device_sn — Serial number of the device — "WCAM9J939319"—device_firmware — Drive capabilities

Forensic data is stored in "pages" Page size is determined when image is created Default page: 16MiB Pages can be encrypted with: NULL, RAW, ZLIB, LZMA, etc. Each page has a name: "page0", "page1", "page2" …

16

Schema Layer — structure of stored data

The bit-level layer dictates how data is stored.

AFFLIB can store name/value pairs in different ways. AFF file

—Series of named segments, each with a HEAD; LENGTH; DATA; FOOT—Easy to recover in the event of corruption, off-track writes, etc.

AFD file—Multiple AFF files in a single directory

AFF XML Amazon S3 VMDK (via QEMU disk layer)

AFFLIB also supports "legacy" formats: RAW, SPLIT RAW, EnCase E01 (libewf)

17

Bit-level layer — dictates how data is stored

page0 page1 page2 page3 SN

A simple example: creating a 64K blank disk image

#include <afflib/afflib.h>#include <string.h>#include <fcntl.h>#include <stdlib.h>

int main(int argc,char **argv){ u_char buf[65536]; memset(buf,0,sizeof(buf));

AFFILE *af = af_open("file.aff",O_RDWR|O_CREAT,0777); af_write(af,buf,sizeof(buf)); af_close(af); return(0);}g++ -odemo -I/usr/local/include demo.cpp -lafflib

Creates:$ ./demo$ ls -l file.aff-rwxr-xr-x 1 simsong staff 820 May 31 08:20 file.aff*$

18

The "afinfo" command shows the segments.$ afinfo -a file.afffile.aff is a AFF file data Segment arg length data======= ========= ======== ====badflag 0 512 BAD SECTOR.......U.8....}...Wjbadsectors 2 8 = 0 (64-bit value)afflib_version 0 7 "3.5.8"creator 0 5 a.outaff_file_type 0 3 AFFpagesize 16777216 0 page0 51 4 ....imagesize 2 8 = 65536 (64-bit value)

Total segments: 8 (8 real) Page segments: 1 Hash segments: 0 Signature segments: 0 Null segments: 0 Empty segments: 0

Total data bytes in segments: 547Total space in file dedicated to segment names: 73Total overhead for 8 segments: 192 bytes (8*(16+8))Overhead for AFF file header: 8 bytes$

19

AFFLIB v3 added encryption & digital signatures

Encryption: each segment can be encrypted with a 256-bit AES key. AFFLIB automatically encrypts & decrypts each segment on read if possible.

Key can be specified as: passphrase that decrypts an afkey_aes256 segment. X.509 certificate that decrypts a afkey_evp0 segment.

Passphrase can be specified two ways:export AFFLIB_PASSPHRASE='mypassphrase'afinfo file://:mypassphrase@/filename.aff

Calling code is unchanged! AFFILE *af = af_open("file.aff",O_RDWR|O_CREAT,0777); af_write(af,buf,sizeof(buf)); af_close(af);

20

AFFLIB encryption example.$ export AFFLIB_PASSPHRASE='password'$ ./demo$ afinfo file.afffile.aff is a AFF filefile.aff: has encrypted segments

file.aff data Segment arg length data======= ========= ======== ====badflag 0 512 BAD SECTOR..2w..a.....A. ;...badsectors 2 8 = 0 (64-bit value)afflib_version 0 7 "3.5.8"creator 0 5 a.outaff_file_type 0 3 AFFpagesize 16777216 0 page0 51 4 ....imagesize 2 8 = 65536 (64-bit value)Bold indicates segments that were decrypted.

Total segments: 9 (9 real) Page segments: 1 Hash segments: 0 Signature segments: 0 Null segments: 0

21

Without the passphrase, decryption is not possible.$ unset AFFLIB_PASSPHRASE$ afinfo -a file.afffile.aff is a AFF filefile.aff: has encrypted segments

Segment arg length data======= ========= ======== ====badflag 0 512 BAD SECTOR..2w..a.....A. ;...+badsectors 2 8 = 0 (64-bit value)afflib_version 0 7 "3.5.8"creator 0 5 a.outaff_file_type 0 3 AFFaffkey_aes256 0 52 ........_.....4>.Nf..q..N..d.pagesize/aes256 16777216 0 page0/aes256 51 20 ....dswS.K...NL+....imagesize/aes256 2 24 +Y6..3f.......n.........

Total segments: 9 (9 real) Encrypted segments: 3 Page segments: 0 Hash segments: 0 Signature segments: 0 Null segments: 0 Empty segments: 0

Total data bytes in segments: 631Total space in file dedicated to segment names: 107Total overhead for 9 segments: 216 bytes (9*(16+8))

22

AFFLIBv3 also adds digital signatures and parity pages.

Signatures are as signed SHA256 values. Each segment's SHA256 is calculated. SHA256 values are signed using OpenSSL's EVP_Sign functions.

Signatures can be stored: In individual signature segments. In a new Bill Of Materials (BOM) segment.

Multiple signatures can provide for chain-of-custody. afsign can also create a "parity page" for RAID-like reconstruction.

23

page0page1page2page3SN

afsign

page0page1page2page3SNbom1

afsign

page0page1page2page3SNbom1bom2

AFFLIBv3 status

AFFLIBv3 is in use today for research and education. Integrated with SleuthKit.

AFFLIB tools - A set of utilities for manipulating disk images. afcat — outputs an AFF file to stdout as a raw file afcopy & afconvert — segment-by-segment copying and verification (optional encryption) afinfo — prints details about the segments afrecover & affix — recovery of data within a corrupted AFF file afsign — signature tool afverify — verifies signatures afcompare — compares two disk images afcrypto — encrypt or decrypt a disk image in place afdiskprint — generates an XML-based "diskprint" for fast image comparison. affuse — allows AFF images to be "mounted" as raw files on Linux. afsegemnt — view or modify an individual segment

24

AFFLIBv3: strengths and weaknesses

Strengths: Single archive for storing all of the data and metadata. Strong data integrity Compact archiving format (16MB segment size, optional LZMA)

Weaknesses: Performance.

—16MB page size is problematic for some disk images due to MFT fragmentation.—Caching is only solution at the present:export AFFLIB_CACHE_PAGES=24 # Dedicates 24*16=384MB to cacheexport AFFLIB_CACHE_PAGES=64 # Dedicates 64*16=1GB to cache

Only one disk image per file—Problem for lots of small devices

No way to package extracted files as a "logical" evidence file.—e.g. FILE.L01

Limited commercial support.

25

AFF4

26

AFF4 is a collaborative effort between: Michael Cohen (Australian Federal Police; PyFlag) Simson Garfinkel (NPS; AFF) Bradly Schatz (Director of Schatz Forensic)

Why AFF4? Overcome AFF3 performance limitations. Need to store more kinds of structured information inside the evidence file. Unified data model and naming scheme.

Changes from AFF3: AFF container is now a ZIP64 file. 16MB pages are replaced with two-level Chunk/Bevy model libaff4 library in C; most tools written in Python.

27

AFF4 is designed to overcome AFF3's limitations

AFF4 concepts

Information model Abstract metadata – exists independent of the file's physical representation

Data model Concrete - How the information is represented on disk.

28

Information Model is based on RDF

Information is represented as statements about subjects. Statements have a subject, predicate and value:

aff4://1234 is_a “hard disk”aff4://1234 aff4:size 1E7

Values can be encoded using specialized “data_types." Meanings are precise. (They are not just a freely interpreted string.)

aff4://1234 aff4:acquired "2010-02-11T13:00:25+00:00"^^xsd:dateTime

A group of statements is called a Graph

29

The Data Model is the physical manifestation of the abstract information model.Graphs are serialized using RDF serializations (e.g. Turtle, XMLRDF etc).

Basic types of AFF4 objects: Volumes – store segments within them. Segments are atomic (indivisible) blobs of data. Streams – Data objects which can be opened for reading or writing (e.g. segments,

images, maps) Graphs – Collections of RDF statements – can be written to volumes.

All AFF4 objects are universally referenced to through a unique URL.

Like AFF3, AFF4 objects can be stored in multiple containers. AFF4 calls these "Volumes." A Volume can be a ZIP64 file, a database, or a collection of files in a directory.

30

AFF4 ZIP Volumes are AFF4's default volume format.

Uses ZIP64 standard for large file support Can be opened by any tool that supports ZIP files… … but data segments require special interpretation.

ZIP format is robust. There is a growing number of tools to recover corrupt ZIP files. Clear distinction between data content and data integrity.

ZIP format is malleable Can join / split volumes at any time Archive members have a universally unique name – it does not matter where they are

stored.

We do not use ZIP64 encryption and signing. We implemented our own.

31

AFF4 Image Stream is used for storing seekable, contiguous, compressed data.Data model is similar to EnCase E01: Data is split into chunks (32KiB by default) Chunks are compressed and written into bevies back to back

—2048 chunks per bevy by default

Information model Bevy indexes are stored in the aff4:index predicate Size is stored in aff4:size predicate Typically the information model will be stored in a graph within the volume.

32

chunk chunk chunk chunk

2048 chunks make a "bevy"(64MiB user data)

32KiB

Map streams are a collection of linear byte ranges from other streams.Every byte in the map stream is taken from an offset of some other stream.Conceptually maps are an array of points: Map offset, Target offset, Target name Offsets not in the array are interpolated

Maps are stored in the aff4:map predicate Can be encoded using a number of encoders for efficiency (e.g. inline, binary, text)

Map streams can be used for: Re-assembling RAID and LVM devices. Identifying files within a disk image — useful for zero-copy carving. Hash-based imaging — don't place objects in archive that are already in the corpus. TCP/IP stream reassembly — Create a map stream from TCP payloads.

33

libaff4 — our implementation of the AFF4 format.

Designed to test ideas and evolve the format by using it. Flexible – can combine all types of AFF4 objects together Python bindings automatically generated from C source code.

—Easy to keep in sync with C library—C library is very fast; Python bindings make development easy.

Multithreaded Easy to use

Status: API still in flux Information on the ForensicsWiki at:

—http://www.forensicswiki.org/wiki/AFF4—http://www.forensicswiki.org/wiki/LibAFF4

Download LibAFF4 from:—hg clone https://aff4.googlecode.com/hg/ aff4

34

File Recovery withThe Sleuth Kit

The Sleuth Kit (TSK) is a tool for working with disk images.

Command-line tools for working with disk images.Open source computer forensics toolkitOriginally “The Coroner's Toolkit,” developed by Dan Farmer & Wietse VenemaRewritten and maintained by Brian Carrier: Carrier created a modular internal design. Added image layer, disk tools, FAT recover, 64-bit

support, live analysis, UFS2 & EXT3 Journal support. Coordinating community development

http://www.sleuthkit.org/

http://www.sleuthkit.org/

36

TSK is the open source forensic standard.

Shortcomings: No support for encrypted file systems. Poor support for compressed files.

Image Formats raw, split-raw, AFF, EWF, etc.

Partitioning SchemesDOS MBR, GPT, Apple, BSD, Solaris

File SystemsFAT 12/16/32; NTFS; ext2/3; UFS 1/2; ISO9660

PlatformsLinux, OSX, Windows, *BSD, Cygwin, Solaris

37

Sleuthkit works directly with disk images.

Common uses: View files & directories in a forensically sound manner View deleted files Document location of information.

Without forensic tools, viewing data can change it! "last viewed" and "last modified" times can be changed. Entries can be put into the registry. Temp files can be created.

38

SleuthKit works with both data and metadata.

Data is the content of files.

Metadata tells how to work with the disk and the data. Partition table List of available sectors Directory information

Note: "Metadata" like EXIF and Word "properties" are considered data here.

Root Dir

Data

readme.txt

config.sys

motd

home

File Allocation Table

master boot record

cylinder 0, head 0, sector 1

4 16-byte entries

readme.txt

config.sys

motdHome Dir

FAT

FAT

FAT

FAT

file1.txt

file2.txt

...

file1.txt

file2.txt

FAT

FAT

39

DiskImage

TSKLibrary

CLITools

Autopsy

TSK is a modular system

Most TSK commands are run from the command line.

You can also write your own programs that call the library directly.

The Autopsy Forensic Explorer runs the commands and shows you the results in a web browser.

40

TSK's "f" programs work with file systems.

fls File List

fsstat File System Status

ffind File Find

41

TSK tools handle many disk image formats:

List the file systems with "-f list":$ fls -i listSupported image format types:! raw (Single raw file (dd))! aff (Advanced Forensic Format)! afd (AFF Multiple File)! afm (AFF with external metadata)! ewf (Expert Witness format (encase))! split (Split raw files)$

To have support for AFF & EWF, you need to separately install them first!

42

TSK routines handle many file systems:

List the file systems with "-f list":$ fls -f listSupported file system types:! ntfs (NTFS)! fat (FAT (Auto Detection))! ext (ExtX (Auto Detection))! iso9660 (ISO9660 CD)! hfs (HFS+)! ufs (UFS (Auto Detection))! raw (Raw Data)! swap (Swap Space)! fat12 (FAT12)! fat16 (FAT16)! fat32 (FAT32)! ext2 (Ext2)! ext3 (Ext3)! ufs1 (UFS1)! ufs2 (UFS2)$

43

spice1 is an image from 32 MB SD card.

You have three files:$ ls -l spice1*-rw-r--r-- 1 simsong staff 263759 Nov 3 21:05 spice1.E01-rw-r--r-- 1 simsong staff 215837 Nov 3 22:11 spice1.aff-rw-r--r-- 1 simsong staff 32079872 Nov 3 20:54 spice1.raw$

Let's look at spice1.raw:

44

List files in the disk image withfls - File List

$ fls spice1.rawr/r 3:! SPICE (Volume Label Entry)d/d * 5:! New Folderd/d 6:! junkd/d * 8:! New Folderd/d 9:! gunsd/d * 11:! New Folderd/d * 12:! _rugsr/r * 13:! _ecret.gifr/r 14:! secret.gifr/r * 15:! _ecret2.gifr/r 16:! secret2.gifr/r * 17:! _giastw.jpgr/r 18:! ogiastw.jpgv/v 994691:!$MBRv/v 994692:!$FAT1v/v 994693:!$FAT2d/d 994694:!$OrphanFiles$

45

What do these numbers mean?

r/r 14: secret.gif

r/r Regular file

14metadata block #

("inode")

secret.gif file name

46

Use icat and the inode # to get the file contents.

r/r 14: secret.gif

$ icat spice1.raw 14 > 14.jpg

$ open 14.jpg mac

$ gnome-open 14.jpg gnome

> start 14.jpg Windows

47

Use icat and the inode # to get the file contents.

r/r 14: secret.gif

$ icat spice1.raw 14 > 14.jpg

$ open 14.jpg mac

$ gnome-open 14.jpg gnome

> start 14.jpg Windows

47

fsstat shows technical details about the file system.

$ fsstat spice1.rawFILE SYSTEM INFORMATION----------------------------------------File System Type: FAT16

OEM Name: MSDOS5.0Volume ID: 0x64fb06c6Volume Label (Boot Sector): NO NAME Volume Label (Root Directory): SPICE File System Type Label: FAT16

Sectors before file system: 64

File System Layout (in sectors)Total Range: 0 - 62655* Reserved: 0 - 1** Boot Sector: 0* FAT 0: 2 - 244* FAT 1: 245 - 487* Data Area: 488 - 62655** Root Directory: 488 - 519** Cluster Area: 520 - 62655

METADATA INFORMATION----------------------------------------Range: 2 - 994694Root Directory: 2

CONTENT INFORMATION----------------------------------------Sector Size: 512Cluster Size: 512Total Cluster Range: 2 - 62137

FAT CONTENTS (in sectors)----------------------------------------520-520 (1) -> EOF521-521 (1) -> EOF523-526 (4) -> EOF527-539 (13) -> EOF540-610 (71) -> EOF611-687 (77) -> EOF688-795 (108) -> EOF796-864 (69) -> EOF$

48

img_stat shows information about the disk image.

$ img_stat spice1.rawIMAGE FILE INFORMATION--------------------------------------------Image Type: raw

Size in bytes: 32079872$

$ img_stat spice1.E01 IMAGE FILE INFORMATION--------------------------------------------Image Type:! ! ewf

Size of data in bytes:! 32079872MD5 hash of data:! aebfd76cdd9b3eb0f6c1658efc226886$

49

TSK command-line programs divided up by layer.

j- journal layer

f- file name layer

i- metadata (inode) layer

blk- content (data) layer

mm- volumes/partitions

img_- Disk images

50

TSK command-line programs divided by function.

-stat print status

-ls list something

-find find something

-cat output contents

-calc compute something

51

Here are the commands we've used so far.

"f" tools work with file systems: fsstat — File system stat fls — list files and their inodes ffind — translates an inode number back to a file.

"i" tools work with file system metadata (inodes & MFT) ifind — Finds the metadata given a data unit (-d), a file name (-n),

or the parent's metadata address (-p) icat — Outputs the contents of an inode.

"img" tools work with disk images: img_stat — Prints statistics about the image img_cat — Copies the raw sectors to stdout.

52

Digital Forensics XML

Today most forensic tools report metadata in human-readable form. Location of partitions. Location of a file. File owner, MAC times, etc. Microsoft Office permissions.

This leads to problems: Each tool processing a disk image must re-interpret the file system. One tool cannot be easily validated against another.

DFXML allows tools to interoperate.

54

Digital Forensics XML (DFXML) is a tool for describing file systems and file metadata.

XML allows us to separate file extraction from forensic analysis.

You can start using this framework today.You can easily expand it.

The basic idea: use XML as an intermediate format.

55

<XML> Output

1 32

Currently DFXML has four kinds of XML tags.

Per-Image tags <fiwalk> — outer tag<fiwalk_version>0.4</fiwalk_version><Start_time>Mon Oct 13 19:12:09 2008</Start_time><Imagefile>dosfs.dmg</Imagefile><volume offset=”26112”>

Per <volume> tags:<volume offset=”26112”> <Partition_Offset>26112</Partition_Offset> <block_size>512</block_size> <ftype>4</ftype> <ftype_str>fat16</ftype_str> <block_count>60749</block_count>

Per <fileobject> tags:<fileobject> <filename>DCIM/100CANON/IMG_0001.JPG</filename> <filesize>855935</filesize> <byte_runs> <run file_offset='0' fs_offset='55808' img_offset='81920' len='855935'/> </byte_runs></fileobject>

Plug-in created tags….

56

fiwalk is a tool that produces DFXML files.

fiwalk is a C++ program built on top of SleuthKit

$ fiwalk [options] -X file.xml imagefile

Features: Finds all partitions & automatically processes each. Handles file systems on raw device (partition-less). Creates a single output file with forensic data from all plug-ins.

Single program has multiple output formats: XML (for automated processing) ARFF (for data mining with Weka) "walk" format (easy debugging) SleuthKit Body File (for legacy timeline tools) [CSV (for spreadsheets) ? ]

57

XML ARFF Body

<XML> Output

1 32

fiwalk has a plugable metadata extraction system.

Configuration file specifies Metadata extractors: Currently the extractor is chosen by the file extension.

*.jpg dgi ../plugins/jpeg_extract*.pdf dgi java -classpath plugins.jar Libextract_plugin*.doc dgi java -classpath ../plugins/plugins.jar word_extract

Plugins are run in a different process for safety. We have designed a native JVM interface which uses IPC and 1 process.

Metadata extractors produce name:value pairs on STDOUTManufacturer: SONYModel: CYBERSHOTOrientation: top - left

Extracted metadata is automatically incorporated into output.<Manufacturer>SONY</Manufacturer><Model>CYBERSHOT</Model>

58

<XML> Output

1 32

fiwalk's biggest challenge: UTF-8 filenames

Many filesystems allow invalid XML characters in filenames. Control Characters Invalid Unicode characters (FF) and sequences (EF 32) "<" and ">"

SleuthKit returns UTF-8 NTFS and HFS require valid Unicode in filenames Corrupted disks might not have valid Unicode.

Solution: Escaping for both XML and Unicode XML escaped — &lt; &gt; etc. Control characters are currently turned into "^" by Sleuthkit. DEL characters are quoted to \xFF Each character is tested for UTF-8; invalid characters escaped (e.g. \xEF \x32) "\" is escaped to \x5C

59

fiwalk.py: a Python module for automated forensics.

Key Features: Automatically runs fiwalk with correct options if given a disk image Reads XML file if present (faster than regenerating) Creates fileobject objects.

Multiple interfaces: SAX callback interface

fiwalk_using_sax(imagefile, xmlfile, flags, callback)

—Very fast and minimal memory footprint

SAX procedural interfaceobjs = fileobjects_using_sax(imagefile, xmlfile, flags)

—Reasonably fast; returns a list of all file objects with XML in dictionary

DOM procedural interface(doc,objs) = fileobjects_using_dom(imagefile, xmlfile, flags)

—Allows modification of XML thatʼs returned.

60

<XML> Output

1 32

The SAX and DOM interfaces both return fileobjects!

The Python fileobject class is an easy-to-use abstract class for working with file system data.

Objects belong to one of two subclasses:fileobject_sax(fileobject)! — for the SAX interfacefileobject_dom(fileobject)! – for the DOM interface

Both classes support the same interface:—fi.partition()—fi.filename(), fi.ext()—fi.filesize()—fi.ctime(), fi.atime(), fi.crtime(), fi.mtime()—fi.sha1(), fi.md5()—fi.byteruns(), fi.fragments()—fi.content()

61

<XML> Output

1 32

Example: calculate average file size on a disk

Using SAX array interface:import fiwalk

objs = fileobjects_using_sax(imagefile, xmlfile, flags)print "average file size: ",sum([fi.filesize() for fi in objs]) / len(objs)

For the Python-impaired:

import fiwalk

objs = fileobjects_using_sax(imagefile, xmlfile, flags)sum_of_sizes = 0for fi in objs: sum_of_sizes += fi.filesize()print "average file size: ",sum_of_sizes / len(objs)

62

<XML> Output

1 32

Example: Find and print all the files 15 bytes in length.

Using SAX callback interface:import fiwalk

objs = fileobjects_using_sax(imagefile, xmlfile, flags)for fi in filter(lambda x:x.filesize()==15, objs): print fi

For the Python-impaired:

import fiwalk

objs = fileobjects_using_sax(imagefile, xmlfile, flags)for fi in objs: if fi.filesize()==15: print fi

63

<XML> Output

1 32

The fileobject class allows direct access to file data.

byteruns() is an array of “runs.”<byte_runs type=’resident’>

<run file_offset='0' len='65536' fs_offset='871588864' img_offset='871621120'/>

<run file_offset='65536' len='25920' fs_offset='871748608' img_offset='871780864'/>

</byte_runs>

Becomes:[byterun[offset=0; bytes=65536], byterun[offset=65536; bytes=25920]]

Each byterun object has:run.img_offset! - Disk Image offsetrun.fs_offset! - File system offsetrun.bytes! ! - number of bytes

run.start_sector() ! — Starting Sector #run.sector_count()! — # of sectorsrun.content()! - content of file

64

<XML> Output

1 32

The fileobject class allows direct access to file data.

byteruns() returns that array of “runs” for both the DOM and SAX-based file objects.

>>> print fi.byteruns()[byterun[offset=0; bytes=65536], byterun[offset=65536; bytes=25920]]

Accessor Methods: fi.contents_for_run(run) " — Returns the bytes from the linked disk image fi.contents() " " — Returns all of the contents fi.file_present(imagefile=None) " — Validates MD5/SHA1 to see if image has file fi.tempfile(calMD5,calcSHA1)" — Creates a tempfile, optionally calculating hash

65

<XML> Output

1 32

We have several small applications with this framework.

iblkfind.py given a disk block in an image, say which files map there.

icarvingtruth.py Reports location of carvable files given an earlier XML "map" of the disk image.

idifference.py Forensic Disk Differencing

iverify.py Reads an image file and XML file; reports which files are actually resident.

imicrosoft_redact.py "breaks" a Windows boot disk so that it can be distributed.

66

iblkfind.py shows how simple it is to build an application.#!/usr/bin/python

import sys,fiwalk

if __name__=="__main__":

from optparse import OptionParser parser = OptionParser() parser.usage = '%prog [options] image.iso s1 [s2 s3 s3 ...]' (options,args) = parser.parse_args()

if len(args)<1: parser.print_help() sys.exit(1)

sectors = set([int(n) for n in args[1:]])

def process(fi): for s in sectors: if fi.has_sector(s): print "%d\t%s" % (s,fi.filename()) fiwalk.fiwalk_using_sax(imagefile=open(args[0]),callback=process)

67

frag_find performs hash-based file carving

Input: 1 or more Master Files

A disk image

Output: Digital Forensics XML of where the files are.

<fileobject> <byte_run> … </byte_run></fileobject>

<fileobject> <byte_run> … </byte_run></fileobject>

Uses: Exfiltration of sensitive documents; Digital Loss Detection; etc.

68

1 2 3 4 5 6

B1 B2 B3

1 2 3 4 5 6

B1

B2

B3

bulk_extractor

/Users/simsong/domex/src/bulk_extractor/src/r0: total used in directory 140432 available 37283372 drwxr-xr-x 12 simsong simsong 408 May 11 22:19 . drwxr-xr-x 67 simsong simsong 2278 May 13 21:19 .. -rw-r--r-- 1 simsong simsong 48 May 11 22:22 _thread0.stat -rw-r--r-- 1 simsong simsong 48 May 11 22:22 ccn.txt -rw-r--r-- 1 simsong simsong 102 May 11 22:18 config.cfg -rw-r--r-- 1 simsong simsong 3184451 May 11 22:22 domain.txt -rw-r--r-- 1 simsong simsong 4197524 May 11 22:22 email.txt -rw-r--r-- 1 simsong simsong 63 May 11 22:22 report.txt -rw-r--r-- 1 simsong simsong 839 May 11 22:22 rfc822.txt -rw-r--r-- 1 simsong simsong 0 May 11 22:18 tcp.txt -rw-r--r-- 1 simsong simsong 1086578 May 11 22:22 url.txt -rw-r--r-- 1 simsong simsong 135305057 May 11 22:22 wordlist.txt

bulk_extractor finds and reports interesting strings

Currently bulk_extractor finds: email addresses, email Subject: lines, dates URLs Credit Card Numbers Other interesting information

bulk_extractor: Ignores file systems & file formats Is multi_threaded; can process bulk data very fast summarizes what it finds in easy to read reports.

Input formats: raw (dd) disk images Database files EnCase E01 files AFF files

70

sample output on nps-2009-ubnist1/ubnist1.gen3.raw

email.txt:68728516 sy@f.Lr70878552 cristy@mystic.es.dupont.com262257627 PH@Q.Jp317697633 micke@imendio.com317697671 alexl@redhat.com317697704 shaunm@gnome.org324391670 ari@debian.org324391814 tugrul@galatali.com328471989 hp@redhat.com335671479 mike@markley.org625382493 QU@X.tn738975536 astarikovskiy@suse.de739006965 tiwai@suse.de739006998 perex@perex.cz739052440 perex@perex.cz

email_histogram.txt:n=27640 ubuntu-users@lists.ubuntu.comn=17133 ubuntu-motu@lists.ubuntu.comn=12883 ubuntu-devel-discuss@lists.ubuntu.comn=4032 language-packs@ubuntu.comn=1966 ubuntu-desktop@lists.ubuntu.comn=1484 debian-x@lists.debian.orgn=1006 pkg-perl-maintainers@lists.alioth.debian.orgn=878 debian-qt-kde@lists.debian.org

71

bulk_extractor is a flexible tool

Possible uses: Rapid characterization of newly ingested media Identification of PII Determine who used a computer Social network analysis

Availability: Source code compiles on MacOS, Linux & Windows Pre-compiled Windows application available.

72

AFF history and Roadmap AFFLIB AFF4

Digital Forensics XML fiwalk fiwalk.py

Promote Tools that are available to download NOW! frag_find bulk_extractor

In summary:This talk presented open source tools that you can use.

73

<fileobject>

Bit-level layer — dictates how data is stored

Schema Layer — structure of stored data

API Layer — interface to analysis programs.

Questions?

Recommended