
October 2012

Hadoop Forensics

Kevvie Fowler, GCFA Gold, CISSP, MCTS, MCDBA, MCSD, MCSE

Presented at SecTor

About me

Kevvie Fowler

Day job: Lead the TELUS Intelligent Analysis practice

Night job: Founder and principal consultant, Ringzero

Frequent speaker at major security conferences (Black Hat USA, SecTor, AppSec Asia, Microsoft, etc.)

Industry contributions:


Data, data and more data

The world uses and stores a lot of data

Increased regulatory requirements

Modernizing of industries

Disk space is cheap and people “store and forget”

Industry has experienced 40 to 50% growth in storage over the last 3 years

Several new data management technologies have been released over the past few years to help us manage the problem

Technology to the rescue

Data management landscape (SQL, NoSQL, NewSQL, etc.)


Source: blogs.the451group.com

The crime scene of today

The digital crime scene has grown dramatically

What we historically examined over a year will be dwarfed by a single big data investigation

Hadoop | the popular big data solution

Business is looking at Hadoop to solve the big data problem

But wait, aren’t there security issues with Hadoop?

Secure or not, investigations will lead there and will need to occur

Hadoop overview

What is this Hadoop?

An open-source framework for large-scale data processing

Moves the processing to the data

Leverages MapReduce to take a query, divide it, and run it in parallel over multiple commodity nodes

Hadoop is essentially an open-source MapReduce implementation

Supports structured and unstructured data

Hadoop architecture

[Architecture diagram] The Name Node runs the Hadoop daemons and a web server, and maintains the cluster metadata: the File System Image (FSImage files), edit logs, and Name Node logs. Each Data Node (three shown) runs a web server, a Task Tracker, and logging, with local storage holding replicated file blocks (e.g., File 1 Block 1 and File 2 Blocks 1–2 spread across nodes). A JobTracker and client interfaces coordinate jobs across the cluster.

Forensics 101

The standard forensic approach | acquire it all, ask questions later

– Future science can be applied against it

– Repeatable analysis

Many Hadoop clusters consist of huge data sets that can’t practically be acquired in their entirety

This session will cover Hadoop-specific data reduction techniques

Environment used: Cloudera Hadoop, Hadoop version 2.0, Hadoop HDFS

Hadoop artifacts


Standard OS fare

Connections, Users, Logs (SSH, etc.)

Cluster properties

Configuration, size, servers, key settings

Recent activity

Jobs (active, historical)

Current state

Metadata

– FSImage

– Edit logs

Files

– Only the necessary ones!

Tools

– Native Hadoop tools (command line, WebUI)

Hadoop artifacts | cluster properties

Cluster properties | via command line

hdfs dfsadmin -report

Results:

– Size of Hadoop cluster

– Number/addresses of data nodes
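As a sketch of how that report could be triaged at scale, the following parses sample `hdfs dfsadmin -report` output for the live node count and data node addresses. The field labels (`Datanodes available:`, `Name:`) match common report formats, but the layout varies between Hadoop releases, so verify against your cluster's actual output.

```python
# Sketch: pull cluster size and data-node addresses out of
# "hdfs dfsadmin -report" output. Field labels are assumptions --
# verify against the cluster's own report before relying on this.
import re

def parse_dfsadmin_report(text):
    """Return (live node count, [data node addresses]) from report text."""
    m = re.search(r"Datanodes available:\s*(\d+)", text)
    live = int(m.group(1)) if m else None
    nodes = re.findall(r"^Name:\s*(\S+)", text, re.MULTILINE)
    return live, nodes

# Illustrative sample, not output from a real cluster
sample = """\
Configured Capacity: 211243687936 (196.74 GB)
Datanodes available: 2 (2 total, 0 dead)

Name: 10.0.0.11:50010
Decommission Status : Normal

Name: 10.0.0.12:50010
Decommission Status : Normal
"""
live, nodes = parse_dfsadmin_report(sample)
print(live, nodes)   # 2 ['10.0.0.11:50010', '10.0.0.12:50010']
```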

Hadoop artifacts | cluster properties

Cluster properties | via WebUI <address>:50070/dfshealth.jsp


Hadoop artifacts | cluster properties

Cluster properties | via WebUI <address>:50070/dfshealth.jsp (continued)


Hadoop artifacts | cluster properties

Cluster properties | Modes

Standalone | single node & everything in one Java process

Pseudo-Distributed | single node & daemons in separate Java processes

Fully-Distributed | daemons run on a cluster of machines

Managed via two types of configuration files:

– Default (*-default.xml)

– Site-specific (*-site.xml)

Hadoop artifacts | cluster properties


File (/hadoop/conf) | Description | Key properties

core-site.xml | General Hadoop properties | fs.trash.interval, fs.trash.checkpoint.interval

hdfs-site.xml | HDFS properties | dfs.replication, dfs.blocksize, dfs.permissions.enabled, hadoop.tmp.dir, dfs.namenode.name.dir, dfs.namenode.checkpoint.dir, dfs.datanode.data.dir

mapred-site.xml | MapReduce properties | mapreduce.task.tmp.dir, mapreduce.jobhistory.done-dir

log4j.properties | Logging properties | hadoop.mapred.JobTracker, hadoop.mapred.TaskTracker, namenode.FSNamesystem.audit
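Since these key properties live in plain XML files, they can be extracted with the standard library during evidence collection; a minimal sketch, assuming the conventional `<configuration>`/`<property>` layout (the sample values are illustrative, not Hadoop defaults):

```python
# Sketch: read property name/value pairs from a Hadoop *-site.xml file.
# Assumes the standard <configuration><property><name>/<value> layout.
import xml.etree.ElementTree as ET

def read_site_xml(xml_text):
    """Return {property name: value} from Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

# Illustrative sample content
sample = """<configuration>
  <property><name>fs.trash.interval</name><value>1440</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""
props = read_site_xml(sample)
print(props["fs.trash.interval"])   # 1440
```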

Hadoop artifacts | recent activity

Recent activity | Jobs

Jobs are executed from a client JAR file

Each job will have one or more sub tasks managed by the task trackers

Two files are created in conjunction with each executed job:

Job Configuration XML file | contains the job configuration as specified when the job is launched

Job Status File | contains the counters, status, start/stop time, task attempt details, etc.

Hadoop artifacts | recent activity

Recent activity | Job history

Lists all active (yet to complete) jobs: JobID, StartTime, Executing user

The JobID can be used to view the job status file, which contains job execution details

mapred job -list all

Hadoop artifacts | recent activity

Recent activity | Job JAR file

Jar file


Hadoop artifacts | recent activity

Recent activity | Job history

Job configuration file (/history)


Hadoop artifacts | recent activity

Recent activity | Job history

Job status file (/history)


Artifact preservation | metadata

Metadata | FSImage file

Contains a listing of all directories/files and associated metadata:

– Name

– Replication level

– Modification time (files & directories)

– Access time

– Block size

– Permissions (directories only)

– Etc.

2 saved FSImage files are kept by default

Artifact preservation | metadata

Metadata | FSImage file

The _nn suffix on the FSImage filename identifies the last transaction (and all prior) that modified the image

Current state of Hadoop | FSImage file

Metadata| FSImage file

Recommended: force a checkpoint prior to gathering the FSImage to refresh the image file

Beware – requires the cluster to be taken off-line!

hdfs secondarynamenode -checkpoint force

The Offline Image Viewer (OIV) can be used to dump the current FSImage file

hdfs oiv -i "<file and path>" -o "<file and path>" -p Delimited -delimiter "|"
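The Delimited processor produces one pipe-separated record per file system object, which loads cleanly into analysis tooling. A sketch follows; the column order here is an assumption for illustration only, since the Delimited processor's fields differ between Hadoop releases, so check the first rows of your own OIV output before mapping fields.

```python
# Sketch: load a pipe-delimited FSImage dump into dict records.
# FIELDS is an assumed, illustrative column order -- confirm it
# against the actual OIV output for your Hadoop version.
import csv, io

FIELDS = ["path", "replication", "mtime", "atime",
          "blocksize", "blocks", "filesize", "perms", "user", "group"]

def load_fsimage_dump(text):
    reader = csv.reader(io.StringIO(text), delimiter="|")
    return [dict(zip(FIELDS, row)) for row in reader]

# Illustrative sample record
sample = ("/user/alice/evidence.log|3|2012-10-01 09:14|2012-10-02 11:03|"
          "67108864|1|1048576|-rw-r--r--|alice|hadoop\n")
records = load_fsimage_dump(sample)
print(records[0]["path"], records[0]["user"])   # /user/alice/evidence.log alice
```

From here the records can be filtered on access time, owner, or path to reduce the data set to "only the necessary" files.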

Current state of Hadoop | FSImage file

Metadata| FSImage file

Results


Current state of Hadoop | Edit log

Metadata| Edit log

All writes to files on disk are first recorded in the edit log

1 million edit operations are retained by default

Current state of Hadoop | Edit log

Metadata | Edit log (continued)

Operations within the Hadoop edit log:

▪ OP_INVALID
▪ OP_ADD
▪ OP_RENAME_OLD
▪ OP_DELETE
▪ OP_MKDIR
▪ OP_SET_REPLICATION
▪ OP_DATANODE_ADD
▪ OP_DATANODE_REMOVE
▪ OP_SET_PERMISSIONS
▪ OP_SET_OWNER
▪ OP_CLOSE
▪ OP_SET_GENSTAMP
▪ OP_SET_NS_QUOTA
▪ OP_CLEAR_NS_QUOTA
▪ OP_TIMES
▪ OP_SET_QUOTA
▪ OP_RENAME
▪ OP_CONCAT_DELETE
▪ OP_SYMLINK
▪ OP_GET_DELEGATION_TOKEN
▪ OP_RENEW_DELEGATION_TOKEN
▪ OP_CANCEL_DELEGATION_TOKEN
▪ OP_UPDATE_MASTER_KEY
▪ OP_REASSIGN_LEASE
▪ OP_END_LOG_SEGMENT
▪ OP_START_LOG_SEGMENT
▪ OP_UPDATE_BLOCKS

Current state of Hadoop | Edit log

Metadata | Edit log (continued)

The Offline Edits Viewer (OEV) with the stats processor dumps a summary of operations within the edit log

OEV can be run while the cluster remains online

hdfs oev -i <file and path> -o <file and path> -p stats

Current state of Hadoop | Edit log

Metadata| Edit log (continued)

The Offline Edits Viewer (OEV) without the stats processor dumps the operations performed since the last FSImage file

hdfs oev -i "<edit filepath>" -o "<out file>" -v
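When OEV emits XML, a quick operation tally gives the same overview as the stats processor but against a file you have already preserved. A sketch, assuming the records carry an `<OPCODE>` element (an assumption based on common OEV XML output; adjust the element name if your version differs):

```python
# Sketch: summarise which operations appear in an OEV XML dump by
# counting <OPCODE> elements. Element name is an assumption --
# inspect your own OEV output first.
import re
from collections import Counter

def count_opcodes(xml_text):
    return Counter(re.findall(r"<OPCODE>([^<]+)</OPCODE>", xml_text))

# Illustrative sample fragment
sample = """<EDITS>
  <RECORD><OPCODE>OP_ADD</OPCODE></RECORD>
  <RECORD><OPCODE>OP_CLOSE</OPCODE></RECORD>
  <RECORD><OPCODE>OP_ADD</OPCODE></RECORD>
</EDITS>"""
print(count_opcodes(sample))   # Counter({'OP_ADD': 2, 'OP_CLOSE': 1})
```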

Current state of Hadoop | Edit log

Edit log (continued)

Example of a file copied from the local system to Hadoop; fields visible in the edit log entry:

– Operation

– Transaction ID

– File created

– Modification time

– Access time

– Allocated block

– File permissions


Current state of Hadoop | File retrieval

hadoop distcp

Uses MapReduce to move large groups of files

– Parallel file transfer

-p[rbugp] preserves – r: replication number, b: block size, u: user, g: group, p: permission

– pt (last modification and last access times) is planned for a future release

Fast, but not the best option:

– Pollutes job and task history

– Does not preserve timestamps

Current state of Hadoop | File retrieval

The fsck command can be used to list the blocks associated with a given file

hdfs fsck /<file> -files -blocks -racks
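The fsck block listing maps each file to its block IDs and hosting nodes, which is exactly what is needed to locate the blocks on a data node's local file system. A parsing sketch follows; the sample line format is illustrative, since fsck output wording shifts between releases.

```python
# Sketch: extract block IDs and hosting nodes from
# "hdfs fsck <file> -files -blocks -racks" output.
# Line format is an assumption -- compare with real fsck output.
import re

def parse_fsck_blocks(text):
    blocks = []
    for line in text.splitlines():
        m = re.match(r"\d+\.\s+(blk_[\w-]+)\s+len=(\d+)\s+repl=(\d+)\s+\[(.*)\]",
                     line.strip())
        if m:
            blocks.append({"block": m.group(1),
                           "length": int(m.group(2)),
                           "replicas": int(m.group(3)),
                           "locations": [h.strip() for h in m.group(4).split(",")]})
    return blocks

# Illustrative sample output
sample = """/user/alice/evidence.log 1048576 bytes, 1 block(s):  OK
0. blk_3874264722_1001 len=1048576 repl=2 [/rack1/10.0.0.11:50010, /rack1/10.0.0.12:50010]
"""
blocks = parse_fsck_blocks(sample)
print(blocks[0]["block"], blocks[0]["locations"])
```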


Current state of Hadoop | File retrieval (continued)

Navigate the local file system of the specified data node to the block location

Blocks (files) can be imaged directly from the local file system


Hadoop Trash

Often enabled by organizations

When enabled, deleted files are not really deleted

Files are moved to the Trash location (/user/<username>/.Trash) where they are scheduled for deletion at a later date/time
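Because the trash retention window comes from fs.trash.interval (minutes, in core-site.xml), an examiner can estimate how long a deleted file stays recoverable. A simplified sketch of that arithmetic (in practice the trash emptier works per checkpoint directory, so actual purging may happen somewhat later):

```python
# Sketch: earliest time a file moved to .Trash becomes eligible for
# permanent deletion, given fs.trash.interval in minutes. Simplified:
# real trash emptying is checkpoint-based and may lag this estimate.
from datetime import datetime, timedelta

def earliest_purge(moved_at, trash_interval_minutes):
    return moved_at + timedelta(minutes=trash_interval_minutes)

moved = datetime(2012, 10, 1, 9, 30)
print(earliest_purge(moved, 1440))   # 2012-10-02 09:30:00
```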


References

www.cloudera.com

hadoop.apache.org

Thank you!

Additional information and questions:

kevvie.fowler@ringzero.ca | www.ringzero.ca
