Hyperion: High Volume Stream Archival
Divya Muthukumaran


Area
- Network monitoring: identify problems due to overloaded and/or crashed servers, network connections, or other devices
- Example: to determine the status of a web server, monitoring software may periodically send an HTTP request to fetch a page
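
The web server example can be sketched as a small probe. This is a minimal illustration using only the Python standard library; the function name `is_up`, the 2xx success rule, and the default timeout are assumptions made for the sketch, not part of the original slides.

```python
import http.client
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if fetching `url` yields an HTTP 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP 4xx/5xx,
        # malformed response, or a string that is not a valid URL.
        return False
```

Monitoring software would call such a probe periodically (for example from a loop with `time.sleep`) and raise an alert when it returns False several times in a row.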


Live Monitoring
- Packets are examined in real time
- Compute and continually update traffic statistics
- Discard the captured packet headers once examined
Why the need to store packet headers?
- Example: network forensics
  - To go back and examine the root cause of a problem
  - E.g., see how an intruder gained entry or how a worm infection happened
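
The live-monitoring loop described here can be sketched as follows. The class name `TrafficStats` and its fields are hypothetical; the point is only that each header updates running aggregates and is then dropped, so nothing remains to answer a retrospective question such as "which packets did this host send an hour ago?".

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrafficStats:
    """Running aggregates only; individual packet headers are not kept."""
    packets: int = 0
    total_bytes: int = 0
    bytes_per_source: Counter = field(default_factory=Counter)

    def update(self, src: str, length: int) -> None:
        # Fold one packet header into the aggregates, then forget it.
        self.packets += 1
        self.total_bytes += length
        self.bytes_per_source[src] += length


stats = TrafficStats()
for src, length in [("10.0.0.1", 1500), ("10.0.0.2", 60), ("10.0.0.1", 40)]:
    stats.update(src, length)
```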


Why is such a system needed?
- Querying and examining live data
- Data archival
  - Capture the data at wire speed, index it, and store it
  - Efficiently support retrieval and processing of archived data
- Specifically designed to handle the needs of high-volume stream archival


Why not traditional databases?
- Some statistics
  - A single gigabit link can generate over 100,000 packets and tens of MB of archival data per second
  - A monitor may record from multiple links
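
A rough back-of-the-envelope calculation shows where figures of this order come from, reading them as per-second rates. The average packet size and the number of bytes archived per packet below are illustrative assumptions, not values from the slides.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # one gigabit link
AVG_PACKET_BYTES = 1_250           # assumed average packet size
RECORD_BYTES = 100                 # assumed bytes archived per packet (headers plus metadata)

# Packets per second on a fully loaded link at the assumed packet size.
packets_per_sec = LINK_BITS_PER_SEC // (AVG_PACKET_BYTES * 8)

# Archive volume produced by one link.
archive_bytes_per_sec = packets_per_sec * RECORD_BYTES
archive_bytes_per_day = archive_bytes_per_sec * 86_400
```

Under these assumptions one link yields 100,000 records and 10 MB of archival data per second, or 864 GB per day; smaller packets or several monitored links push the rate higher.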


Design Principles
- Support queries, not reads
  - Implies the need to maintain indexes
- Writes are sequential and immutable
- Archive locally, summarize globally
  - Trade-off: scalability vs. the need to avoid flooding
  - Scalability favors local archiving and indexing to avoid network writes
  - The need to answer distributed queries favors sharing information across nodes


Hyperion
Three key components:
- Stream file system
  - High-volume archiving and querying
- Multi-level index structure
  - High update rates plus reasonable lookup performance
- Distributed index layer
  - Distributes a summary of local indices to enable distributed querying


Design choices for the Hyperion Storage System
- Storage of multiple high-speed traffic streams without loss
- Support for concurrent read activity without loss of write performance
- Re-use of storage in a buffer-like fashion


Stream File System
- Stores streams as opposed to files
- Characteristics
  - Recycled: when storage is full, new data replaces old data
    - In a general-purpose (GP) file system, new data is lost and old data is retained
  - Immutable
  - Record-oriented: data is written in fixed- or variable-length records
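
The three characteristics can be illustrated with a toy in-memory stream. This is only a sketch of the semantics (recycled, immutable, record-oriented); the class name `RecycledStream` and the byte-capacity model are assumptions and say nothing about the on-disk layout.

```python
from collections import deque


class RecycledStream:
    """Append-only record log with a fixed byte capacity.

    When the log is full, the oldest records are evicted to make room,
    whereas a general-purpose file system would reject the new write.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.records = deque()

    def append(self, record: bytes) -> None:
        if len(record) > self.capacity:
            raise ValueError("record larger than total capacity")
        # Recycle: drop the oldest records until the new one fits.
        while self.used + len(record) > self.capacity:
            self.used -= len(self.records.popleft())
        # Store an immutable copy; records are never modified in place.
        self.records.append(bytes(record))
        self.used += len(record)


stream = RecycledStream(capacity_bytes=10)
for rec in (b"aaaa", b"bbbb", b"cccc"):
    stream.append(rec)
```

After the third append the oldest record has been recycled and only the two newest remain.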


Can we use a GP FS?
- Need to map streams <=> files


LogFile Rotation


Stream FS


Stream FS Organization
- Log-structured FS
  - What problem? Cleaning/garbage collection
  - StreamFS solves the cleaning problem
- Guarantee: storage guarantee for each stream
  - Small segment size (1 or ½ MB)
  - Check if the next segment is surplus; if yes, overwrite it, otherwise skip it
- Advantages?
  - Storage reservation
  - Best-effort use of remaining storage
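
The overwrite-or-skip rule can be sketched as a circular scan over segments. The sketch assumes that a segment counts as surplus when it is free or when its owning stream currently holds more segments than its guarantee, and that the guarantees sum to less than the number of segments so a surplus segment always exists. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class SegmentAllocator:
    def __init__(self, num_segments: int, guarantees: Dict[str, int]) -> None:
        if sum(guarantees.values()) >= num_segments:
            raise ValueError("guarantees must leave at least one spare segment")
        self.owner: List[Optional[str]] = [None] * num_segments
        self.guarantee = dict(guarantees)
        self.held = {stream: 0 for stream in guarantees}
        self.frontier = 0  # next segment the writer will examine

    def _is_surplus(self, seg: int) -> bool:
        owner = self.owner[seg]
        return owner is None or self.held[owner] > self.guarantee[owner]

    def allocate(self, stream: str) -> int:
        """Advance the write frontier to the next surplus segment and claim it."""
        n = len(self.owner)
        for _ in range(n):
            seg = self.frontier
            self.frontier = (self.frontier + 1) % n
            if self._is_surplus(seg):
                old = self.owner[seg]
                if old is not None:
                    self.held[old] -= 1  # the old owner loses this segment
                self.owner[seg] = stream
                self.held[stream] += 1
                return seg
            # Otherwise the segment is inside its owner's guarantee: skip it.
        raise RuntimeError("no surplus segment found")


alloc = SegmentAllocator(4, {"A": 1, "B": 2})
first = [alloc.allocate("A") for _ in range(4)]   # A fills the whole disk
second = [alloc.allocate("B") for _ in range(4)]  # B reclaims A's surplus
```

In this run stream A keeps its one guaranteed segment (segment 3 is skipped), while everything beyond the guarantee is used on a best-effort basis. No separate cleaning pass is needed because the writer simply overwrites surplus segments as it goes.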


Reads
- First get the index
- Use the index to get the data
- Persistent handles
  - Returned from each write operation
  - Passed to the read operation to retrieve data
- What does the handle contain?
  - Disk location and approximate length
  - Allows data to be retrieved directly


Handle issues
- Validate the handle. How?
- Self-certifying record header
  - ID of the stream
  - Permissions of the stream
  - Record length
  - Hash (used for validating the handle)
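
One way such a self-certifying header could work is sketched below. The field widths, the use of keyed BLAKE2b, the secret, and the in-memory `bytearray` standing in for the disk are all assumptions made for illustration; the slides only list the header fields.

```python
import hashlib
import struct
from typing import NamedTuple

FIELDS = struct.Struct("<IHI")     # stream id, permissions, record length
HEADER = struct.Struct("<IHI8s")   # the same fields followed by an 8-byte hash
SECRET = b"hypothetical-fs-secret"  # assumed per-file-system key


class Handle(NamedTuple):
    offset: int  # disk location of the record header
    length: int  # length hint; the header holds the authoritative value


def _digest(stream_id: int, perms: int, length: int) -> bytes:
    fields = FIELDS.pack(stream_id, perms, length)
    return hashlib.blake2b(fields, key=SECRET, digest_size=8).digest()


def write_record(disk: bytearray, stream_id: int, perms: int, payload: bytes) -> Handle:
    offset = len(disk)
    digest = _digest(stream_id, perms, len(payload))
    disk.extend(HEADER.pack(stream_id, perms, len(payload), digest))
    disk.extend(payload)
    return Handle(offset, HEADER.size + len(payload))


def read_record(disk: bytearray, handle: Handle) -> bytes:
    body = handle.offset + HEADER.size
    if handle.offset < 0 or body > len(disk):
        raise ValueError("handle out of range")
    stream_id, perms, length, stored = HEADER.unpack_from(disk, handle.offset)
    # The header certifies itself: the stored hash must match its own fields.
    if stored != _digest(stream_id, perms, length):
        raise ValueError("stale or forged handle")
    if body + length > len(disk):
        raise ValueError("record extends past end of disk")
    return bytes(disk[body:body + length])


disk = bytearray()
h1 = write_record(disk, stream_id=7, perms=0o640, payload=b"pkt-1")
h2 = write_record(disk, stream_id=7, perms=0o640, payload=b"packet-2")
```

A handle that points into the middle of a record, or at a segment that has since been overwritten by other data, fails the hash check instead of returning garbage.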


Stream FS Organization
- Record
  - Variable length
  - On-disk record + header
- Block
  - Fixed length
  - Multiple records of the same stream
- Block map
  - Every nth block (stream ID + in-stream sequence number for each of the preceding n-1 blocks)
  - Used for easy write allocation


Stream FS Organization


Indexing
- Uses signature-based indices
- Signature for each segment
  - Can check whether a record with key k is present in the segment
  - Does not tell you where in the segment the record is


Multi-level Indices


Multi-level Indices
- Uses a Bloom filter
  - Hash(key) -> b bits
  - Of the b bits, k bits are set to 1
  - H(key1) OR H(key2) OR … OR H(keyn) = Hs (signature)
- How to check for the presence of a record?
  - Compute the hash of its key kr, H(kr)
  - If a bit is set in H(kr) but not set in Hs, then the value is not present
  - False positives are possible
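
The signature construction and the membership test can be sketched in a few lines. The values of b and k, the use of SHA-256 to pick bit positions, and the function names are assumptions chosen for the example.

```python
import hashlib
from typing import Iterable

B_BITS = 1024  # signature width b (assumed)
K_BITS = 4     # bits set per key k (assumed)


def key_hash(key: bytes) -> int:
    """Map a key to a b-bit integer with at most k bits set."""
    h = 0
    for i in range(K_BITS):
        digest = hashlib.sha256(bytes([i]) + key).digest()
        position = int.from_bytes(digest[:8], "big") % B_BITS
        h |= 1 << position
    return h


def signature(keys: Iterable[bytes]) -> int:
    """Bitwise OR of the hashes of all keys in a segment."""
    sig = 0
    for key in keys:
        sig |= key_hash(key)
    return sig


def may_contain(sig: int, key: bytes) -> bool:
    """False means definitely absent; True means present or a false positive."""
    h = key_hash(key)
    return (h & sig) == h


segment_sig = signature([b"10.0.0.1", b"10.0.0.2"])
```

A key that was inserted always passes the test, so there are no false negatives. A key that was never inserted usually fails it, but can pass if other keys happen to have set all of its bits, which is why a positive answer still requires searching the segment.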


Distributed Index
- How to handle distributed queries without flooding?
  - Maintain a distributed index
  - Integrated view of all nodes
  - A coarse-grained summary of the data at each node is needed
- Can use the top-level index in Hyperion
- One index node per time interval
  - All nodes send their top-level indices to this node
  - Temporally distributed index
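
The temporal distribution can be sketched as a mapping from time intervals to index nodes. The slides say only that there is one index node per time interval; the round-robin assignment, the interval length, and the function names below are assumptions.

```python
from typing import List, Sequence


def index_node_for(timestamp: float, interval_secs: int, nodes: Sequence[str]) -> str:
    """Pick the node that collects every node's top-level index for this interval."""
    interval_no = int(timestamp // interval_secs)
    return nodes[interval_no % len(nodes)]  # assumed round-robin assignment


def index_nodes_for_range(start: float, end: float, interval_secs: int,
                          nodes: Sequence[str]) -> List[str]:
    """List the index nodes a query over [start, end] has to contact."""
    first = int(start // interval_secs)
    last = int(end // interval_secs)
    # After len(nodes) consecutive intervals every node has appeared once.
    last = min(last, first + len(nodes) - 1)
    contacted: List[str] = []
    for interval_no in range(first, last + 1):
        node = nodes[interval_no % len(nodes)]
        if node not in contacted:
            contacted.append(node)
    return contacted


monitors = ["m1", "m2", "m3"]
```

A query for a short time window then goes to a single index node, which uses the collected summaries to decide which monitors actually need to be asked, instead of flooding the query to all of them.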
