Global Intrusion Detection Using Distributed Hash Table

Jason Skicewicz, Laurence Berland, Yan Chen

Northwestern University 6/2004

Current Architecture

Intrusion Detection Systems
• Vulnerable to attack
• Many false responses
• Limited network view
• Varying degrees of intelligence

Centralized Data Aggregation
• Generally done manually
• Post-mortem global view
• Not real time!

Sensor Fusion Centers

Sensor fusion centers (SFCs) aggregate information from sensors throughout the network
• More global view
• Larger information pool
• Still vulnerable to attack
• Overload potential if multiple simultaneous attacks occur

Can’t we leverage all the participants?

Distributed Fusion Centers

Different fusion centers for different anomalies

Attackers must attack all fusion centers, or know more about fusion center assignments

Still needs to be manually set up and routed to

What if things were redundant and self-organizing?

What is a DHT?

A DHT, or distributed hash table, is a peer-to-peer system in which the location of a resource or file is found by hashing its key

DHTs include Chord, CAN, Pastry, and Tapestry

A DHT attempts to spread the keyspace across as many nodes as possible

Different DHTs use different topologies
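The "hash the key to find the node" idea can be sketched in a few lines. This is a minimal illustration that assumes a flat ring of node IDs in the style of consistent hashing; the `Ring` class, the node names, and the example key are hypothetical and do not correspond to any particular DHT listed above.

```python
import bisect
import hashlib

def key_id(key: str, bits: int = 32) -> int:
    """Hash a key (or node name) to an integer position in the keyspace."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    """Toy keyspace: each node owns the keys up to and including its own ID."""

    def __init__(self, node_names):
        self.points = sorted((key_id(name), name) for name in node_names)
        self.ids = [point for point, _ in self.points]

    def lookup(self, key: str) -> str:
        # First node at or after the key's position, wrapping around the ring.
        index = bisect.bisect_left(self.ids, key_id(key)) % len(self.ids)
        return self.points[index][1]

nodes = [f"node-{i}" for i in range(8)]   # hypothetical node names
ring = Ring(nodes)
owner = ring.lookup("worm:port-445")      # hypothetical key
```

Any participant that knows the node IDs computes the same owner for the same key, which is what lets reports about one anomaly converge on one node without central coordination.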

CAN

CAN is based on a multi-reality n-dimensional toroid for routing (Ratnasamy et al.)

CAN

Each reality is a complete toroid and provides full redundancy

Network covers the entire address space and dynamically splits the space

Routes across the CAN, so you don’t need to connect directly to the Fusion Center
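Routing across a toroid can be sketched as greedy forwarding: each node hands the message to whichever neighbor is closest to the target coordinates, with distances wrapping around each axis. The sketch below assumes a single reality and a fixed grid of equal zones (`DIMS` and `SIDE` are illustrative constants); a real CAN splits zones dynamically as nodes join, which is not modeled here.

```python
import hashlib

DIMS = 2   # assumption: two-dimensional torus
SIDE = 4   # assumption: SIDE**DIMS equal zones, one node per zone

def key_to_zone(key: str) -> tuple:
    """Hash a key to the coordinates of the zone responsible for it."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return tuple(digest[i] % SIDE for i in range(DIMS))

def torus_gap(a: int, b: int) -> int:
    """Distance along one axis, allowing wrap-around."""
    d = abs(a - b)
    return min(d, SIDE - d)

def distance(p, q) -> int:
    return sum(torus_gap(a, b) for a, b in zip(p, q))

def neighbors(p):
    """Zones adjacent to p: one step in either direction along each axis."""
    for axis in range(DIMS):
        for step in (-1, 1):
            q = list(p)
            q[axis] = (q[axis] + step) % SIDE
            yield tuple(q)

def route(start, target):
    """Greedy routing: always forward to the neighbor closest to the target."""
    path = [start]
    while path[-1] != target:
        path.append(min(neighbors(path[-1]), key=lambda n: distance(n, target)))
    return path

path = route((0, 0), (3, 2))
```

The sender only needs to know its own neighbors; the report still reaches the zone that owns the key, so no direct connection to the fusion center is required.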

GIDS over DHT

Fusion centers are organized on a distributed hash table
• Peer-to-peer
• Self-organized
• Decentralized
• Resilient

We use Content Addressable Network (CAN)
• Highly redundant
• N-dimensional toroid enhances reachability

DIDS diagram

[Figure: DIDS diagram. Components: Internet, two NIDS, a host IDS, the peer-to-peer CAN, and an infected machine. Labeled steps: worm probe sent; NIDS reports to fusion center; IDS on probed host reports to fusion center; CAN directs to fusion center.]

Reporting Information

Fusion Centers need enough information to make reasonable decisions

ID systems all have different proprietary reporting formats

Fusion Centers would be overloaded with data if full packet dumps were sent

We need a concise, standardized format for reporting anomalies

Symptom Vector

Standardized set of information reported to fusion centers.

Plugins for IDS could be written to handle producing these vectors and actually connect to the CAN

Flexibility for reporting more details

Symptom Vector

<src_addr,dst_addr,proto,src_port,dst_port,payload,event_type,lower_limit,upper_limit>

• Payload: specifies some descriptor of the actual packet payload. This is most useful for worms. Two choices we’ve considered so far are a hash of the contents, or the size in bytes

• Event_type: a code specifying an event type such as a worm probe or a SYN flood

• Based on the event_type, upper_limit and lower_limit are two numerical fields available for the reporting IDS to provide more information
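As an illustration, the nine-field symptom vector could be represented as a small immutable record. The field names follow the vector above; the Python types, the event codes, and the example addresses are assumptions made for this sketch, and the payload field is taken to be a size in bytes (one of the two options considered).

```python
from dataclasses import dataclass, astuple

# Hypothetical event codes; the slides do not define numeric values.
EVENT_WORM_PROBE = 1
EVENT_SYN_FLOOD = 2

@dataclass(frozen=True)
class SymptomVector:
    src_addr: str
    dst_addr: str
    proto: str
    src_port: int
    dst_port: int
    payload: int       # assumption: payload size in bytes
    event_type: int
    lower_limit: int   # meaning depends on event_type
    upper_limit: int   # meaning depends on event_type

report = SymptomVector(
    src_addr="10.0.0.5", dst_addr="192.0.2.17", proto="tcp",
    src_port=40312, dst_port=445, payload=5012,
    event_type=EVENT_WORM_PROBE, lower_limit=0, upper_limit=0,
)
fields = astuple(report)   # flat tuple in the same order as the vector above
```

An IDS plugin would fill in such a record from its own proprietary alert format before handing it to the CAN.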

Payload Reporting

Hash: a semi-unique string produced by performing mathematical transformations on the content
• Identifies the content almost uniquely (collisions are rare)
• Cannot easily be matched based on “similarity”, so it’s hard to spot polymorphic worms

Size: the number of bytes the worm takes up
• Non-unique: two worms could be of the same size, though we’re doing research to see how often that actually occurs
• Much easier to spot polymorphism: simple changes cause no or only small changes in size
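The trade-off between the two payload descriptors can be seen on a toy example. The byte strings below are made up for illustration, and SHA-1 is an arbitrary choice of hash; the slides do not name one.

```python
import hashlib

def payload_hash(payload: bytes) -> str:
    """Hash descriptor: changes completely if any byte changes."""
    return hashlib.sha1(payload).hexdigest()

def payload_size(payload: bytes) -> int:
    """Size descriptor: unchanged by same-length substitutions."""
    return len(payload)

# Made-up payloads: the variant differs from the original in a single byte.
original = b"\x90" * 64 + b"EXPLOIT-BODY"
variant = b"\x90" * 63 + b"\x91" + b"EXPLOIT-BODY"

same_hash = payload_hash(original) == payload_hash(variant)   # False
same_size = payload_size(original) == payload_size(variant)   # True
```

A one-byte mutation is enough to make the hashes unrelated, while the sizes still match, which is why size is the more forgiving descriptor for polymorphic worms.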

Routing Information

DHTs are traditionally used for peer-to-peer file sharing networks
• Locate content based on name, hash, etc.
• Not traditionally used to locate resources

We develop a routing vector in place of traditional DHT addressing methods, and use it to locate the appropriate fusion center(s)

Routing Vector

Based on the anomaly type

Generalized to ensure similar anomalies go to the same fusion center, while disparate anomalies are distributed across the network for better resource allocation

Worm routing vector: <dst_port,payload,event_type,lower_limit,upper_limit>

Routing Vector

Worm routing vector avoids using less relevant fields such as source port or IP addresses

Designed to utilize only information that will be fairly consistent across any given worm

Used to locate fusion center, which receives full symptom vector for detailed analysis
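A minimal sketch of how a worm routing vector might be derived from a symptom vector and mapped to a fusion center. The projection follows the field list given for the worm routing vector; the `fusion_center` mapping (a plain hash modulo a fixed number of centers) is an assumption standing in for CAN addressing, and it hashes the exact payload size, so it does not handle near-equal sizes.

```python
import hashlib

# Fields kept by the worm routing vector, per the slides.
ROUTING_FIELDS = ("dst_port", "payload", "event_type", "lower_limit", "upper_limit")

def routing_vector(symptom: dict) -> tuple:
    """Drop source port and IP addresses; keep the worm-consistent fields."""
    return tuple(symptom[name] for name in ROUTING_FIELDS)

def fusion_center(vector: tuple, num_centers: int) -> int:
    """Hypothetical stand-in for CAN addressing: hash the vector to an index."""
    digest = hashlib.sha1(repr(vector).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_centers

# Two made-up reports of the same worm, seen by different sensors.
probe_a = {"src_addr": "10.0.0.5", "dst_addr": "192.0.2.17", "proto": "tcp",
           "src_port": 40312, "dst_port": 445, "payload": 5012,
           "event_type": 1, "lower_limit": 0, "upper_limit": 0}
probe_b = dict(probe_a, src_addr="10.9.8.7", dst_addr="198.51.100.3", src_port=51877)

center_a = fusion_center(routing_vector(probe_a), num_centers=16)
center_b = fusion_center(routing_vector(probe_b), num_centers=16)
```

Because the two reports differ only in fields the routing vector ignores, they land on the same fusion center, which then receives both full symptom vectors.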

Size and the boundary problem

Assume a CAN with several nodes. Each is allocated a range of sizes, say in blocks of 1000 bytes.

Assume node A has range 4000-5000 and node B has range 5000-6000

If a polymorphic worm has size ranging between 4980 and 5080, the information is split between A and B

Solution? Have information sent across the boundary. Node A sends copies of anything with size >4900 to node B and node B sends anything with size <5100 to A
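The boundary rule can be sketched as a function that returns every node that should see a report of a given size. The 1000-byte blocks and the 100-byte margin come from the example above; treating ranges as half-open (so size 5000 belongs to B) and identifying nodes by block index are assumptions of this sketch.

```python
BLOCK = 1000    # bytes per node range, from the example above
MARGIN = 100    # width of the overlap region on each side of a boundary

def owner(size: int) -> int:
    """Index of the node whose range contains this size (4 for A, 5 for B)."""
    return size // BLOCK

def recipients(size: int) -> list:
    """Owning node first, then any neighbor that should receive a copy."""
    home = owner(size)
    nodes = [home]
    offset = size % BLOCK
    if offset < MARGIN and home > 0:
        nodes.append(home - 1)     # just above a boundary: copy downward
    if offset > BLOCK - MARGIN:
        nodes.append(home + 1)     # just below a boundary: copy upward
    return nodes

# Every size in the polymorphic worm's range should reach both A (4) and B (5).
seen_by_both = all(set(recipients(s)) == {4, 5} for s in range(4980, 5081))
```

With the copies in place, either node holds the complete set of reports for a worm straddling the boundary, at the cost of duplicating traffic near every boundary.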

To DHT or not to DHT

DHT automatically organizes everything for us

DHT ensures anomalies are somewhat spread out across the network

DHT routes in real time, without substantial prior knowledge of the anomaly

DHT is redundant, making an attack against the sensor fusion center tricky at worst and impossible to coordinate at best

Simulating the system

We build a simple array of nodes, and have them generate the symptom and routing vectors as they encounter anomalies

Not yet complete, work in progress

Demonstrates appropriate fusion of information and non-interference of multiple simultaneous anomalies
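A toy version of such a simulation, to make the two properties concrete. This is not the authors' simulator: the reports are synthetic, fusion centers are modeled as dictionary buckets keyed by routing vector, and no CAN routing is involved.

```python
from collections import defaultdict

ROUTING_FIELDS = ("dst_port", "payload", "event_type", "lower_limit", "upper_limit")

def simulate(reports):
    """Group (sensor_id, symptom) reports into one bucket per routing vector."""
    centers = defaultdict(list)
    for sensor_id, symptom in reports:
        key = tuple(symptom[name] for name in ROUTING_FIELDS)
        centers[key].append(sensor_id)
    return centers

def symptom(dst_port, payload, event_type, src_addr):
    # Synthetic report; fields not needed for this sketch are fixed.
    return {"src_addr": src_addr, "dst_addr": "192.0.2.1", "proto": "tcp",
            "src_port": 1024, "dst_port": dst_port, "payload": payload,
            "event_type": event_type, "lower_limit": 0, "upper_limit": 0}

# Two simultaneous synthetic anomalies, interleaved across five sensors.
reports = [
    (0, symptom(445, 5012, 1, "10.0.0.1")),   # worm probe
    (1, symptom(80, 0, 2, "10.0.0.2")),       # SYN flood
    (2, symptom(445, 5012, 1, "10.0.0.3")),   # worm probe
    (3, symptom(80, 0, 2, "10.0.0.4")),       # SYN flood
    (4, symptom(445, 5012, 1, "10.0.0.5")),   # worm probe
]
centers = simulate(reports)
worm_sensors = centers[(445, 5012, 1, 0, 0)]
flood_sensors = centers[(80, 0, 2, 0, 0)]
```

Reports of the same anomaly end up together regardless of which sensor sent them, and the two anomalies occupy separate buckets even though their reports arrive interleaved.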

Further Work

Complete paper (duh)

Add CAN to simulation to actually route

Include real-world packet dumps in the simulation

Test on more complex topologies?
