HACQIT: Hierarchical Adaptive Control of QoS for Intrusion Tolerance

James E. Just

James C. Reynolds

Karl Levitt

13 February 2001


Outline

• Team
• HACQIT idea
• Goals
• Architecture
• Status
• Plans
• Current Capabilities
• Questions and Issues

HACQIT Team

• Teknowledge Corporation – architecture, design, Quorum component modification, monitor/adapter development, integration
  – J. Just
  – J. Reynolds
  – L. Clough
  – R. Maglich, E. Lawson
• UC Davis – attack modeling, sensing and response options
  – K. Levitt
  – R. Pandey, F. Wu
  – J. Rowe
  – M. Tylutki

The HACQIT Idea

• Utilize robust hierarchical control of QoS and other fault tolerance techniques to deliver critical COTS services to critical users while under attack
  – Significantly raise adversary work factors
  – Focus on useful military applications
  – Policy driven
• Leverage current and new technologies
  – QoS/Quorum – DeSiDeRaTa, AQuA, QCS, others
  – IA&S – wrappers, intrusion and integrity sensors, active monitoring & response, randomization, VPNs, attack modeling (Jigsaw concepts), honeypots
  – Fault tolerance – separation, diversity, replication & check-pointing, fail-over
  – Others – out-of-band signaling, etc.

• Incrementally deliver capabilities

Project Goals

• Prototype HACQIT controlled cluster delivering
  – 4 hours of intrusion tolerance under
  – Active Red Team attacks on hosts
  – Providing policy determined critical services from
  – COTS/GOTS applications to
  – Critical users (also policy determined) at
  – 75% capacity
• Extensible model base for Intrusion Tolerance
• Focus
  – COTS HW & SW based for near term utility
  – Architecture based framework for longer term extensibility (hierarchical and fractal)

HACQIT Scope

• Will address
  – QoS control for critical services to critical users
  – Hierarchical, extensible object control model
  – Attacks on host availability and integrity
  – Variety of COTS/GOTS applications
  – Policy specification of above
• Won’t address
  – Network infrastructure (e.g., denial of service attacks on bandwidth, routers or LANs) or physical attacks
  – Integrity of data sources used as inputs, confidentiality
  – Legitimate but false insider manipulation of application
  – Developing new sensors or mechanisms, but will leverage them

HACQIT Reference Architecture

[Figure: HACQIT Reference Architecture. Labels: WAN; LAN; FW; Other Enclaves; HACQIT Protected Enclave; HACQIT Protected Node (Server, GW/Switch, FW, Monitor & Adapter); Noise Generator; Servers p, q, r; Users 1 … N and Users J, M, P. Annotation: Out-of-Band Comms Between HACQIT M/A’s & Cyber Panel??. Key: Critical Service, Non-Critical Service, Critical User, Non-Critical User, Attacker, Sensors, Controller.]

HACQIT Node Architecture

[Figure: HACQIT node. Labels: GW/Switch; FW; Monitor & Adapter; Sensors; Controls; Primary Nodes, Backup Nodes and Decoys/Fishbowls (Servers); VPN To Critical Users; Out-of-Band Control Pathways; Communications with other Controllers.]

Monitor-adapter uses out-of-band signaling for complete separation from network attacks on LAN and WAN

Notional HACQIT Implementation

[Figure: notional implementation. Labels: Internet; Firewall; Corporate LAN; Workstations; Laptop computers; Firewall; Ethernet; Switch; Primary; Backup; Monitor/Adapter; Hub; Out-of-Band Control Network; Legend.]

HACQIT Monitor-Adapter Software Overview

[Figure: software overview. Labels: HACQIT Controllers; HACQIT Monitor/Adapter; three Mediators; Integrity & State Sensors; Intrusion Sensors; Performance Sensors; HACQIT Visualization; IT Event Log, Other Clients; Policies and Specs.; To/From Other M/As.]

Goals of Control (Increasing Difficulty)

• Continue critical service
  – Migrate critical applications
  – Administer system (e.g., add or remove critical user or critical service)
• Gather more information (e.g., refocus sensors, turn on more intrusive sensing, use decoys or fishbowls)
• Stop current attack
  – Note: Control over enclave firewall and critical user protection features needed

• Stop future attacks

Coverage: Illustrative Attack Categories & Characteristics

| Attack Category | Service Attacked | Performance or Integrity | How It Is Launched | Impact of Attack | How It Is Detected | How It Can Be Mitigated |
| Buffer overflow | OS priv. program | Integrity | Long message to service | Service runs attacker program in privileged mode | Parameter check; change to return address; detect wrong priv. program behavior | Kill process after detection |
| Race condition | OS priv. program | Integrity | Exploit interval between check and use | Service runs attacker program in privileged mode | Detect attacker-caused delays in program; detect wrong privilege program behavior | |
| Overflow a system table by inside process | OS and applic. | Performance | Create excessive number of processes | Denial of service to processes, possibly system crash | Anomaly in table; missed deadline | |
| Overflow a system table from outside | OS and applic. | Performance | Send excessive number of packets | Denial of service to processes, possibly system crash | Anomaly in table; missed deadline | Kill process after detection; block packets |
| Modify critical applic. program | Applic. | Integrity | Obtain priv. status and then modify a user file | Critical program or data is modified | Change to critical file; wrong application behavior | Switch to backup and revert to saved file |
| Others TBD | | | | | | |

Paradigm for Responding to Integrity Intrusion

• REPEAT UNTIL ATTACK SYMPTOMS DISAPPEAR
• Detect integrity violation on a critical file
• Switchover to backup server; restore prior version of critical file on primary
• Use Jigsaw model to determine possible causes and sources of attack
• Deploy sensors and responders as determined by model
• If attack persists, block with responders
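The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HACQIT implementation: `symptoms_present`, `candidate_causes` and `block` are hypothetical stand-ins for the integrity sensor, the output of the Jigsaw model, and the responders.

```python
from typing import Callable, List


def respond_to_integrity_intrusion(
    symptoms_present: Callable[[], bool],
    candidate_causes: List[str],
    block: Callable[[str], None],
) -> List[str]:
    """Return the ordered list of actions taken while symptoms persist."""
    actions: List[str] = []
    remaining = list(candidate_causes)  # stand-in for the Jigsaw model's output
    # REPEAT UNTIL ATTACK SYMPTOMS DISAPPEAR (or no candidate cause is left)
    while symptoms_present() and remaining:
        actions.append("switch over to backup; restore critical file on primary")
        cause = remaining.pop(0)
        actions.append(f"deploy sensors and responders for: {cause}")
        if symptoms_present():  # attack persists, so block this cause
            block(cause)
            actions.append(f"block: {cause}")
    return actions
```

Called with a symptom check that clears once the right cause is blocked, the function walks the candidate list in order and stops as soon as the symptoms disappear.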

Paradigm for Responding to Internal DOS Attack

• REPEAT UNTIL ATTACK SYMPTOMS DISAPPEAR

• Detect denial of service violation on primary server

• Switchover to newly created process on server; kill process causing denial of service

• Use Jigsaw model to determine possible causes and sources of attack

• Deploy sensors and responders on server and on firewall as determined by model

• If attack persists block with responders

HACQIT Actions in Responding to Connection Spoofing Attack

• Detect change to .rhosts file on primary
• Switchover to backup
• Restore previous version of primary, which is now the backup
• Use Jigsaw model of attacks to identify possible causes of integrity problem
  – Change is legitimate by “clean” process -- no integrity problem
  – Change is by an unauthorized process
  – Change is by a legitimate rcommand
  – Change is by an unauthorized rcommand

HACQIT response to Connection Spoofing (cont)

• HACQIT checks for erroneous processes -- finds none; so conclude change is legitimate or due to an rcommand

• HACQIT starts monitoring for rcommands

• Attack persists, but now on backup with arrival of rcommand

• HACQIT temporarily blocks rcommand until verification

• HACQIT monitoring detects symptoms of connection spoofing attack -- sequence number guessing, DOS on a host

• If traceback to true source s is possible, connections from s are blocked; otherwise, degraded mode (no rcommands)
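The reasoning on these two slides is a process of elimination over four candidate causes. The sketch below is one hedged reading of it: the `Cause` names and the three observation flags are illustrative assumptions, and `None` means "not yet observed".

```python
from enum import Enum
from typing import Optional, Set


class Cause(Enum):
    CLEAN_PROCESS = "legitimate change by clean process"
    UNAUTHORIZED_PROCESS = "change by unauthorized process"
    LEGITIMATE_RCOMMAND = "change by legitimate rcommand"
    UNAUTHORIZED_RCOMMAND = "change by unauthorized rcommand"


def remaining_causes(
    erroneous_process_found: bool,
    rcommand_seen: Optional[bool] = None,
    spoofing_symptoms: Optional[bool] = None,
) -> Set[Cause]:
    """Narrow the candidate causes of a .rhosts change as evidence arrives."""
    if erroneous_process_found:
        return {Cause.UNAUTHORIZED_PROCESS}
    causes = set(Cause) - {Cause.UNAUTHORIZED_PROCESS}
    if rcommand_seen is True:
        causes.discard(Cause.CLEAN_PROCESS)
    elif rcommand_seen is False:
        causes -= {Cause.LEGITIMATE_RCOMMAND, Cause.UNAUTHORIZED_RCOMMAND}
    if spoofing_symptoms is True:
        causes.discard(Cause.LEGITIMATE_RCOMMAND)
    elif spoofing_symptoms is False:
        causes.discard(Cause.UNAUTHORIZED_RCOMMAND)
    return causes
```

With no erroneous process found, three causes remain; once an rcommand arrives and spoofing symptoms (sequence number guessing, DOS on a host) are seen, only the unauthorized rcommand is left.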

Example w/ Capabilities

[Figure: example attack composed of multiple concepts and capabilities. Labels: Connection Spoof; Address Forging; Execute Commands; Seq # Probe; Packet Spoofing; Synflood; Seq. Number Guess; Prevent Connection Response; RSH Active; Forged Src Address; Spoofed Packet; RSH Connection Spoof; Spoofed Connection; Remote Login; cat + + >> /.rhosts; Remote Execution.]

Coverage: Illustrative Application “Types”

What follows is a first cut. It is not completely clear what minimum set of characterization “axes” gives HACQIT the necessary robustness.

• Human user, client side applications, central storage, e.g., MS Office
• Human user, client-server, e.g., web servers or applications
• Human user, client-server-database (three tier), e.g., shared planning applications, web based applications
• Human user, store and forward applications, e.g., email (sendmail or Exchange)
• Human user, communication and collaboration applications, e.g., CVW or Odyssey or NetMeeting
• Real time, server to server or server to server to … to server, e.g., radar processing or weapon system controls
• Others as necessary

First Round Capabilities Demonstrated

• Manual migration

• Simulation of attacks resulting in:
  – Soft reboots
  – Hard reboots
• Simulation of integrity attack (Tripwire)
• Simulation of performance degrading attack (cpuhog)
  – Detect “runaway” host process
  – Detect QoS degradation

• Second round capabilities described in Part 2
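The two detections demonstrated here can be illustrated with small, self-contained checks. This is a sketch only and does not use Tripwire or cpuhog: file contents are passed in as bytes, and the 90% threshold and three-sample window are assumptions chosen for the example.

```python
import hashlib
from typing import Dict, List


def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record a SHA-256 digest per path (a Tripwire-like baseline)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def integrity_violations(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Paths that changed, disappeared, or appeared since the baseline."""
    changed = [p for p in baseline if current.get(p) != baseline[p]]
    added = [p for p in current if p not in baseline]
    return sorted(changed + added)


def is_runaway(cpu_samples: List[float], threshold: float = 90.0, window: int = 3) -> bool:
    """True if the last `window` CPU samples (percent) all exceed `threshold`."""
    recent = cpu_samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)
```

Requiring several consecutive samples above the threshold is a design choice to avoid reacting to a single short CPU spike.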

What We’ve Done & Status

• Second round of “attack & solution space” exploration
  – Test applications include Dynbench, Apache web server, Notepad (Exchange or sendmail in the works)
  – Refining architecture & design -- HACQIT requirements and component responsibilities being refined
  – DeSiDeRaTa code conquered (or at least subdued)
  – Cross project coordination underway
• Attack and response modeling begun
• New code developed for:
  – Secure Task Manager and heartbeat monitor
  – Sensor manager, e.g., Tripwire and wrappers
  – Response managers, e.g., firewall and auditing
  – Policy driven wrappers
  – HACQIT monitor/adapter

Plans

• Continue coordination and leveraging activities
• Development
  – Continue design, rapid experimentation (risk reduction), and research efforts through February
  – Write specification for first real prototype in March
  – Develop solid prototype for June evaluation
  – Evaluation – Red Team, informal hacker exposure via internet, other
  – Go into next cycle
• Leverage
  – SCC – firewall on NIC (ADF)
  – Draper – Gateway “cleaner”
• Other activities
  – IFIP WG 10 Dependability Benchmarking

Current Demonstration Purpose

• Use of policy driven wrapper technology to intercept suspicious calls and initiate failover

• Use of Quality of Service manager to effect switchover

• Use of diversity among primary and backup to reduce likelihood of renewed attack against backup

• Lab capability to test protective measures against actual attacks and mitigate their effects on simulated users

HACQIT Demonstration Configuration

• HACQIT primary NT running Apache and/or Exchange and/or MS Word

• HACQIT backup is Linux

• Outside the HACQIT cluster firewall are three client workstations: Good and Continuing, two “weak” legitimate user clients, and Bad, the source of malicious attacks
• Bad could be inside or outside the enclave that is protected by the second firewall
  – Bad is inside the enclave firewall for convenience
  – Eventually a user will be outside the enclave firewall
• Secure channel to the client
  – Currently VPN is not used
  – Eventually will use VPNs or IPSec

HACQIT Lab

[Figure: lab network with two hubs. Hosts: PETER, MARSHA, JAN, BOBBY, ALICE (switch). Roles: Continuous User (control); Good User (victim); Bad User (attacker), 192.168.250.253. Other addresses shown: 192.168.250.251, 192.168.250.252, 192.168.150.150, 192.168.150.155, 192.168.200.200, 192.168.200.205.]

HACQIT Software Implementation

[Figure: software components. Labels: FMC; Firewall; TCP DUMP; Tripwire; Host Control (two instances); STM; IM; HCI; Hub; Switch; Program Control; Monitor/Adapter; Primary; Backup. Legend: COTS/GOTS, Modified, Original.]

Current Demonstration Scenario

• Users (Good and Continuing) running NT in enclave
  – Processes on Good and Continuing simulate client demands on Apache
  – Other processes will simulate demands on Exchange or saving Word files on primary
• Automated attack launched from Bad to take over Good and then attack web server on Primary (NT)
  – Exploit vulnerabilities in Apache or Exchange or Word
  – Attack executes a program and/or modifies file
• Primary attack detected & mitigated by Apache wrapper
  – Wrapper communicates with the Monitor/Adapter on our out-of-band machine
• Monitor/Adapter starts Apache (under Linux) on the backup and tells firewall controller to switch IP address from the primary to the backup

• Exchange or Word could start using Wine or VMWare technology
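The failover step in this scenario can be sketched as a small state update. Everything here is illustrative: the `Cluster` type, the host names, and the service address (taken from the 192.0.2.0/24 documentation range) are assumptions, and the log strings only describe what a real Monitor/Adapter would ask the backup and the firewall controller to do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    service_ip: str
    primary: str
    backup: str
    log: List[str] = field(default_factory=list)


def on_wrapper_alert(cluster: Cluster, service: str) -> Cluster:
    """React to a wrapper alert: start the service on the backup, then swap roles."""
    target = cluster.backup
    cluster.log.append(f"start {service} on {target}")
    cluster.log.append(
        f"firewall controller: switch {cluster.service_ip} "
        f"from {cluster.primary} to {target}"
    )
    # The old primary becomes the backup once it has been restored.
    cluster.primary, cluster.backup = target, cluster.primary
    return cluster
```

For example, `on_wrapper_alert(Cluster("192.0.2.10", "nt-primary", "linux-backup"), "apache")` leaves `linux-backup` as the primary and records the two requests in `log`.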

New Issues

• Can we capture or redirect client requests so that users really are not interrupted during a migration?

• What does this mean for real time? How does Desi do failover without the DynBench applications losing something? Do they stop processing until the connection is reestablished?

• How can we save state and migrate an application? Do different types or classes of applications have different requirements? What is HACQIT’s ability to cover these different classes?

• Can we add a new user? We might want to disable all sessions and start over with trusted connections

Incremental Implementation Approach (I)

• Several capability levels are envisioned

• Lower levels are specified

• Level 0 – Insider attack and simple migration
  – No firewall on HACQIT cluster

– Only critical application is Apache web server (no Microsoft Exchange)

– Simulations of user web server activities would be running on both Good and Continuing

– Attack from weak client Good against Apache web server on primary

– Sense compromise via wrapper integrity checker on Apache which then communicates with Monitor/Adapter (M/A)

– M/A migrates Apache web server from NT primary to NT backup

Incremental Implementation Approach (II)

• Level 1 – Outsider attack with simple, cross platform migration and increased sensing
  – Firewall(s) added to HACQIT cluster

  – Simulations of user web server activities added to a legitimate user machine outside the enclave

– Bad attacks weak client, Good, to compromise it and set up attack from Good against Apache web server on NT primary

– Same sensing and communication by wrapper as above

– M/A migrates web server to Linux

– M/A turns on increased auditing on firewall/gateway as another response

Incremental Implementation Approach (III)

• Level 2 – Uninterrupted Critical User during Above Attack and Migration
  – Demonstrate uninterrupted user via change to ARP table
  – Note: we may still lose the response to the last user request for web services
• Level 3 – Block the Attack
  – Identify the IP address of the attacker and block attacker at firewall or router, e.g., add blocking command via OPSEC interface or change a rule to shut out attacker’s access to primary
  – At some point we’d like to know if the attack was from a compromised weak client or an insider – probably not at this capability level
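Level 3's "change a rule to shut out the attacker" can be illustrated with a first-match rule list. This is a toy model, not the OPSEC interface or any real firewall API; the `Rule` type and function names are hypothetical, and the addresses are the ones shown in the lab figure.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    source: ipaddress.IPv4Network   # source addresses the rule matches


def decide(rules: List[Rule], src: str, default: str = "deny") -> str:
    """First matching rule wins; anything not explicitly permitted is denied."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        if addr in rule.source:
            return rule.action
    return default


def block_attacker(rules: List[Rule], attacker_ip: str) -> List[Rule]:
    """Return a new rule list with a host-specific deny rule placed first."""
    deny = Rule("deny", ipaddress.ip_network(f"{attacker_ip}/32"))
    return [deny] + list(rules)
```

Placing the deny rule first matters under first-match semantics: appended after a broader allow rule, it would never be reached.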

Incremental Implementation Approach (IV)

• Level 4 – Multiple Critical Applications
  – Add mail server (Microsoft Exchange or some mail server that is cross platform)
• Level 5 – Critical Application with State
  – Save state of critical application and migrate
• Level 6 – Same Machine Failover
  – Use wrappers to ensure non-compromise of OS and failover critical application to the same machine (i.e., start up clean application on same Primary and kill attacked process)

• Level 7 – Remediation of Compromised Primary

• Level 8 – Other Types of Diversity & Use of Decoys

• Level 9 -- Randomization of Responses

• Note: Levels 4-9 are relatively independent and can be done in parallel or in a different order

Architectural Explorations

• Major focus of HACQIT is to develop Intrusion Tolerant – oops! – I mean Organically Assured and Survivable architecture

• Our levels of capability demonstrations suggest the Subsumption architecture (Brooks, 86)

• Brooks developed the architecture for his famous robot projects but many of our requirements are the same
  – Need certain amount of “stupid” reactive behavior
  – Need guaranteed fast response
• Brooks implemented each layer in his architecture as a deterministic finite state machine with simple I/O
• No world model is depended on
• Communication from higher to lower levels is done through suppression and injection

[Figures: Traditional Architecture; Subsumption Architecture]

HACQIT Mapping to Subsumption Architecture from Levels 0-2 Capabilities

• Pre-Level 0 capability features could be lowest layer like Brooks’ “Avoid” module
  – Unauthorized process on primary boosts CPU utilization above threshold: kill process, move critical service to backup
  – Unauthorized modification of file: move critical service to backup
• Level 0 capabilities would be second layer
  – Wrapper intercepts suspicious call: move critical service to backup
  – Diversity advantage: Backup runs different OS than primary
  – TCPDump is turned on after suspicious call is intercepted (heightened awareness)
• Level 1 would be third layer
  – Migration is effected without interrupting critical users (change ARP table)
• Level 2 would be fourth layer
  – Source address of attack is identified
  – Address blocked by change to firewall policy
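The layering above can be sketched as a stack of simple reactive functions, where each higher layer may pass the lower layers' actions through, replace one (suppression), or add its own (injection). This is a sequential simplification written for illustration; Brooks' layers are concurrent finite state machines, and the observation keys and action strings here are assumptions.

```python
from typing import Any, Callable, Dict, List

Observation = Dict[str, Any]
Layer = Callable[[Observation, List[str]], List[str]]

MOVE = "move critical service to backup"
MOVE_ARP = "move critical service to backup via ARP table change"


def avoid_layer(obs: Observation, actions: List[str]) -> List[str]:
    """Lowest layer: react to CPU hogs and unauthorized file changes."""
    if obs.get("cpu_util", 0.0) > obs.get("cpu_threshold", 90.0):
        return actions + ["kill process", MOVE]
    if obs.get("file_modified", False):
        return actions + [MOVE]
    return actions


def wrapper_layer(obs: Observation, actions: List[str]) -> List[str]:
    """Second layer: a wrapper intercepted a suspicious call."""
    if obs.get("suspicious_call", False):
        extra = [] if MOVE in actions else [MOVE]
        return actions + extra + ["turn on TCPDump"]
    return actions


def seamless_layer(obs: Observation, actions: List[str]) -> List[str]:
    """Third layer: suppress the plain move in favor of an ARP-based one."""
    if obs.get("arp_control", False):
        return [MOVE_ARP if a == MOVE else a for a in actions]
    return actions


def block_layer(obs: Observation, actions: List[str]) -> List[str]:
    """Fourth layer: inject a firewall block once the source is known."""
    ip = obs.get("attacker_ip")
    return actions + [f"block {ip} at firewall"] if ip else actions


LAYERS: List[Layer] = [avoid_layer, wrapper_layer, seamless_layer, block_layer]


def run_layers(obs: Observation, layers: List[Layer] = LAYERS) -> List[str]:
    actions: List[str] = []
    for layer in layers:  # lowest layer first
        actions = layer(obs, actions)
    return actions
```

Removing the upper layers from `LAYERS` leaves the lower ones fully functional, which is the property that makes the incremental capability levels attractive.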

Higher Levels of Capability

• Multiple critical applications
• Failover which saves state
• Failover on the same machine
• Forensics
• All may be too complex, long-lived, or require global information in order to implement
• We’re looking at DICAM as architecture for these functions

DICAM Control

Technology Transfer Exposure Opportunity

• NSWC (Mike Masters) is technology transfer target for Quorum
  – Annual demonstration in September
  – Security and intrusion tolerance are of interest
  – Willing to discuss inclusion of HACQIT in demonstration – leverages Quorum technologies
  – Need to start coordination planning in April

Issues

• Looking for interested potential users for feedback (PACOM, NSWC, other?)

• Need help in getting ACOA server software

• Reuse of research prototypes

Thank you.

Questions?

Backup 1: Towards a Formal Methodology for Responding to Integrity and DOS Intrusions

Jim Just -- Teknowledge

Karl Levitt, Jeff Rowe, Marcus Tylutki, Nicole Carlson, Steven Templeton, Mark Heckman -- UCD


Connection Spoofing Attack

• Multiple stage

• Attacker establishes a TCP connection to a host (server) H, exploiting a trust relationship (through .rhosts) between H and some other host H1.

• Attack involves– denial of service on H1

– Sequence number guessing

– Planting a trojan horse on H

• Many variants are possible

• Detection is assumed to occur when .rhosts file on H is erroneously modified

Scenario Attacks: an example

[Figure, repeated on each step: three hosts, kafka, sarte and spock.]

RSH trust relation: sarte trusts kafka, will execute programs for kafka

(1) Spock launches synflood attack against kafka.

(2) Spock probes sarte for starting sequence number on RSH port.

(3) Spock sends syn packet to TCP/RSH on sarte w/ source forged to be kafka.

(4) Sarte sends syn/ack to kafka.

(5) Kafka drops packet due to DoS.

(6) Spock sends forged ack packet to sarte, w/ guessed sequence number. Data in packet, “cat + + >> /.rhosts”, adds “all hosts” to sarte’s .rhosts file.

(7) The attacker rsh’s into sarte as root and installs a sniffer to collect passwords.

(8) Using one of these he telnets into kafka.

(9) Once on kafka, the attacker exploits a buffer overflow in amd to gain root privileges.

(10) Attacker then copies credit card number file back to spock.


NFS Mount Attack-- Overview

• Certain partitions (directories) of an NFS system running on server H are exported.

• An attacker on host Ha performs information gathering commands remotely on H to identify exported partitions and their owners, e.g. user U.

• Once having this information, attacker creates an account for U on Ha.

• The last step is the erroneous account U mounting an exportable partition.

NFS Mount Attack -- as attack specification

1. rpcinfo -p Target-IP
   Attacker learns that host H uses an NFS daemon. The preconditions specify that attacker A has remote network access to target host H and that host H has IP address Target-IP.

2. showmount -e Target-IP
   Attacker learns that host H exports hard disk partition P via NFS. Preconditions deal with IP addresses and exported services not changing from step 1.

3. showmount -a Target-IP
   Attacker learns that partition P is locally mounted by H; the preconditions of the previous steps are unchanged.

NFS Mount Attack (cont.)

4. finger @Target-IP
   Attacker learns that user U is currently connected to H and that the ID for user U is Userid. Among the preconditions is that host H provides the finger service.

5. create-account(U, Userid)
   The precondition assures that the attacker has an account on some host Ha. After this step, there is an account for U on Ha. Note there are alternatives to this step, such as modifying the password file.

6. mount -t Target-Partition /mnt
   The attacker can now access the directory of U. The preconditions are that A is connected to Ha and that U is the owner of some directory in the exported partition P.

HACQIT response to NFS Mount Attack

• HACQIT detects modification to a critical file
• Switchover to backup server; restore file on old primary, which becomes backup
• Through model of NFS determine possible causes of modification:
  – By a legitimate user
  – By a legitimate user, but spoofed
  – …
• HACQIT increases monitoring for NFS
• Detects a “write” to critical file
• Correlates “write” from a user u with information gathering on the server and for user u

HACQIT response to NFS attack (cont)

• Temporarily, HACQIT operates in degraded mode, disallowing “writes” from unauthenticated users

• Through Jigsaw model of NFS attacks and NFS vulnerability analysis, HACQIT determines “mount” export problem and corrects configuration

Denial of Service Attack

• Compromised client launches a “synflood” attack on server

• Temporarily, HACQIT blocks all packets from client

• HACQIT identifies possible responses to flooding attack
  – Block packets at firewall from client

– Kill half-open connections as they appear

– …

• HACQIT chooses 1st response, as it is quickly deployed

• HACQIT identifies user and processes on client responsible for attack, and disables them
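A synflood shows up as many half-open connections from one source. The sketch below counts them from a simplified event stream and picks sources to block; the `(kind, src_ip, src_port)` event format and the `limit` value are assumptions made for illustration, not a HACQIT sensor interface.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, str, int]  # (kind, src_ip, src_port); kind is "SYN" or "ACK"


def half_open_by_source(events: Iterable[Event]) -> Counter:
    """Count connections per source that sent SYN but never completed with ACK."""
    pending = set()
    for kind, src, port in events:
        if kind == "SYN":
            pending.add((src, port))
        elif kind == "ACK":
            pending.discard((src, port))
    return Counter(src for src, _ in pending)


def sources_to_block(events: Iterable[Event], limit: int = 100) -> List[str]:
    """Sources whose half-open count exceeds `limit`, sorted for stable output."""
    counts = half_open_by_source(events)
    return sorted(src for src, n in counts.items() if n > limit)
```

Blocking at the firewall is the quickly deployed first response; killing half-open connections as they appear would act on the same `pending` set.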

Backup 2: Selected HACQIT July PI Slides

HACQIT Schedule

[Figure: Gantt chart over Year 1, Year 2, Year 3, Year 4. Task breakdown shown:]

1.1 Program Coordination
1.2 System Design
  1.2.1 Architecture
  1.2.2 Design
  1.2.3 Interface Specification
1.3 Integrity Sensor/Maintenance
  1.3.1 Lightweight Integrity Sensors
  1.3.2 Wrappers
1.4 Adaptive Response Development
  1.4.1 Host Level
  1.4.2 Cluster Level
1.5 Component Enhancement
  1.5.1 Migration Subsystem
  1.5.2 Fault Tolerance Subsystem
  1.5.3 Instrumented Connectors
  1.5.4 QCS Extensions
  1.5.5 Sample Application
1.6 Integration
  1.6.1 Component Integration
  1.6.2 System Integration
1.7 Experimentation

Options
2.0 New ITS Technology Integration
3.0 Auto-Generation of Integrity Comp
4.0 Diagnosis and Recovery Extensions

Activity labels on the bars: Specification Extension; Testing; Integrity Measures; CDL Extensions; Location Transparency; Sensing; Control Interfaces; Human Oriented Client Server; NRT Distributed Application.

Milestones

• Year 1
  – Applications
    • Office
    • Email
    • Collaboration
    • Intranet web server
  – Control
    • Specification based performance and integrity
    • Replication and switchover
• Year 2
  – Applications
    • Simple planning application
    • Network-based military planning
    • Real time application
  – Control
    • Detected intrusions
    • Limited restoration
• Options
  – Integration of new ITS technologies
  – Automatic generation of integrity monitors
  – Extensions for diagnosis and recovery

Note that these milestones are more aggressive than the official SOW. Depending on the results of the detailed design effort, some adjustments may be necessary.

Technology Transfer

• Who needs intrusion tolerant server capabilities for critical users and services – user pull
  – Government -- military and civilian
  – Commercial -- large corporations, ISPs and others who offer out-sourced application services
• Development and maintenance organizations for above
  – Government development efforts (e.g., IO COP, GCCS)
  – Government ACTDs (e.g., AIDE, ACOA, CINC 21?)
  – Commercial security product/service providers (including Teknowledge?) -- significant commercialization costs
• Mechanisms
  – Demonstrations
  – Ongoing communications
  – Publications
  – Code availability

Risks and Mitigations

Risks
• Attacks
  – Against monitoring & control components
  – Common mode attacks
  – Active blocking after unknown attack
• Accurate and rapid intrusion sensing & avoidance
• Backups
  – Restoration speed (applications & connections)
  – Corruption (logical isolation from primary)
• Overhead for different types of applications
• Recovery of the primary server in a timely manner (not a major focus of base program)
• Workload to use & maintain

Mitigations
• Diversity (hw, sw, versions, time, etc.)
• Restrictions on services
• Out of band control system
• VPNs among critical users and servers
• Wrappers for low level sensing and control
• Adaptive control
• Design for usability
• Randomization of initialization & response
• Content monitoring (and selective logging) of input/output streams
• Deception (decoys, honey-pots, fishbowls, etc.)

JIGSAW Concept Template Extended

[abstract] concept <concept_name> [extends concept_name]
    requires
        [sensor|capability|config] <CapabilityTypes:LABEL_LIST>+
        with
            <expression list>
    end;
    provides
        <CapabilityTypes:LABEL_LIST>+
        with
            <assignments>*
    end;
    action
        <external actions>*
        [reportable <when|unless> <condition>]
    end;
    response
        <external response class required to stop “attack” now>*
    end;
end.
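The requires/provides parts of the template lend themselves to forward chaining: a concept can fire once every capability it requires is available, and firing makes the capabilities it provides available to others. The sketch below illustrates that idea only. The concept and capability names come from the connection-spoofing example in this deck, but the wiring between them is an assumed reading of that figure, and the `Concept` class is not JIGSAW syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Concept:
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def reachable(concepts: Iterable[Concept], initial: Set[str]) -> Tuple[List[str], Set[str]]:
    """Fire concepts until no new capability appears; return (fired, capabilities)."""
    concepts = list(concepts)
    have = set(initial)
    fired: List[str] = []
    progress = True
    while progress:
        progress = False
        for c in concepts:
            if c.name not in fired and c.requires <= have:
                fired.append(c.name)
                have |= c.provides
                progress = True
    return fired, have


# Assumed wiring of the connection-spoofing example (illustrative only).
CONCEPTS = [
    Concept("Synflood", frozenset(), frozenset({"Prevent Connection Response"})),
    Concept("Seq # Probe", frozenset(), frozenset({"Seq. Number Guess"})),
    Concept("Address Forging", frozenset(), frozenset({"Forged Src Address"})),
    Concept("Packet Spoofing", frozenset({"Forged Src Address"}),
            frozenset({"Spoofed Packet"})),
    Concept("Connection Spoof",
            frozenset({"Spoofed Packet", "Seq. Number Guess",
                       "Prevent Connection Response"}),
            frozenset({"Spoofed Connection"})),
    Concept("RSH Connection Spoof", frozenset({"Spoofed Connection", "RSH Active"}),
            frozenset({"Remote Execution"})),
]
```

Removing "RSH Active" from the initial capabilities corresponds to the degraded mode with no rcommands: the chain still reaches "Spoofed Connection" but never "Remote Execution".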

Fault vs. Intrusion Tolerance

• Fault tolerance: Fault => Error => Failure
  – Goal: system avoids failure despite component faults
  – Process: Error detection => Damage Confinement => Error Recovery => Fault Treatment & Continuation
  – Building Blocks: Byzantine agreements, synchronized clocks, stable storage, fail-stop processors, etc.
• Intrusion tolerance: Attack => Error => Failure
  – Goal: system avoids failure despite attacks (errors include loss of integrity)

Design Considerations

• COTS/GOTS hardware and software

• Distributed hierarchical control paradigm for enclave and wide area protection

• Separation is key requirement

• Anything not explicitly permitted is forbidden

• Intrusion resistance to support intrusion tolerance

• System boundary includes some protection for and control of weak clients

• Keep footprint small -- more active control than redundancy

• Focus on known vulnerable areas, e.g., weak client attacks (a la recent attack against Microsoft)

• Adaptive responses are key research area

General Use Case

• Development & setup: policies, application specs., etc.
• Operations: Assume a backup (hot or cold)
• Detect a performance problem in critical application
  – Switchover to backup, increase auditing and sensing levels, determine if cause is an attack, then expunge attack from the primary and block future occurrences of the attack, return
• Detect an integrity problem in critical application (including data files), operating system, or other critical process that indicates an undetected intrusion
  – Switchover to backup, expunge the attack from the primary, block future occurrences of the attack, return
• Detect an intrusion:
  – If intrusion does not constitute a threat to the critical application, then start a procedure to expunge the attack, if necessary, and block future occurrences, return;
  – If attack threatens the critical application, then switchover to backup, expunge the attack from the primary, block future occurrences of the attack, return;
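The three cases above amount to a small dispatch table from the kind of detection to an ordered response plan. The sketch below encodes one reading of the slide; the event names and the `is_attack` and `threatens_critical_app` parameters are assumptions introduced for illustration.

```python
from typing import List

SWITCHOVER = "switchover to backup"
EXPUNGE = "expunge attack from primary"
BLOCK = "block future occurrences"


def response_plan(event: str, is_attack: bool = True,
                  threatens_critical_app: bool = True) -> List[str]:
    """Map a detection ("performance", "integrity", "intrusion") to ordered steps."""
    if event == "performance":
        plan = [SWITCHOVER, "increase auditing and sensing levels",
                "determine if cause is an attack"]
        return plan + [EXPUNGE, BLOCK] if is_attack else plan
    if event == "integrity":
        return [SWITCHOVER, EXPUNGE, BLOCK]
    if event == "intrusion":
        steps = [EXPUNGE, BLOCK]
        return [SWITCHOVER] + steps if threatens_critical_app else steps
    raise ValueError(f"unknown event type: {event}")
```

Only the intrusion case can skip the switchover, since an intrusion that does not threaten the critical application leaves the primary able to keep serving.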

Backup 3: Simple Capability Demonstrations

Experimental Configuration (in Yellow/Blue)

[Figure: the reference architecture with the experimental configuration highlighted. Labels: WAN; LAN; FW; VPNs; Users; Servers; HACQIT Managed Cluster with Primaries, Backups & Decoys; Monitor & Adapter; Sensors; Communications with other HACQIT Controllers.]

Primary & Backup
• Secure Task Monitor
• Desi sensor/controller
• Dynbench
• Apache web server
• Notepad (Word)
• Tripwire

Monitor/Adapter
• Desi middleware
• Integrity monitor
• HCI (remoted)

Firewall
• Controller
• TCPDump

Desiderata Software Architecture

[Figure: software architecture. Labels: QoS/RM Services; Scenario File; Assessment Metrics File; Command File; Spec File; Meta Spec; Doctrine File; Program Control; Spec Generator; Experiment Generator; QoS Monitor; QoS Collector; DynBench Sub-System; Startup Daemons; HCI; Name Server; LoadSimUI; LoadSim; User; Hardware Monitors; Hardware Broker.]

Desiderata Software Control Flow

[Figure: control flow. Components: QoS Manager; Resource Manager; Hardware Analyzer; System Broker; Hardware Monitors; HCI; Name Server; Hosts; DynBench Benchmark; QoS Spec Files. Flows: Path Information; Host-Program Allocation, Program Actions; Host Information (e.g., host load); QoS Spec; Path Load Information, Violations, Latencies; Host Load Information; Program Commands; Shell Commands; Host Data; Path, Program latencies; Timestamps. Note: all programs in the system exchange info with the Name Server.]

Mapping of Desiderata Middleware to Distributed Hardware

[Figure: Middleware - Testbed Hardware Mapping. Hosts connected through a switch on the UTA Network: Nujersy (SUN Sparc 5); Viper (PC; Win NT; Pentium); Desidrta (SUN Ultra); Texas (SUN Ultra); Virginia (SUN Sparc 5); Stealth (PC; Win NT; Pentium); Mustang (PC; Win NT; Pentium); Radar console.]

Legend
• NS – Name Server
• SB – System Broker
• HB – Host Broker
• QM – QoS Manager
• SD – Startup Daemon
• HM – Host Monitor
• HA – Host Analyser
• RM – Resource Manager
• PC – Program Control
• HCI – Human-Computer Interface

Desiderata’s Real-time Path Paradigm

[Figure: paths between sensors and actuators. Labels: situation assess; engagement; monitor & guide; event.]

Dynbench Suite of Real-time Paths

[Figure: three paths. Situation Assess: Sensor, FM, Filter, EDM, ED. Engagement: AM, Action, Actuator. Monitor & Guide: MGM, MG. Other labels: Radar Display; EG; Scenario file; Doctrine file; Command file; User.]

Dynbench Subsystems (Paths)

• Situation assessment
  – Filter Manager (FM): receives radar tracks from the sensor and divides them among the filter programs

– Filter: correlates point data of track into equations of motion of the track body

– Evaluate and Decide Manager (EDM): distributes workload among ED programs

– Evaluate and Decide (ED): determines if current position of radar track is within critical region
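The situation assessment path can be illustrated end to end with toy stand-ins for each stage. None of this is Dynbench code: round-robin division, a two-point constant-velocity fit, and a circular critical region centered on the origin are all simplifying assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]            # (t, x, y) radar observation
Motion = Tuple[float, float, float, float, float]  # (t0, x0, y0, vx, vy)


def divide(tracks: Sequence[str], workers: int) -> List[List[str]]:
    """FM/EDM stand-in: spread track ids over `workers` programs round-robin."""
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for i, track in enumerate(tracks):
        buckets[i % workers].append(track)
    return buckets


def fit_motion(points: Sequence[Point]) -> Motion:
    """Filter stand-in: constant-velocity fit from the first and last points."""
    (t0, x0, y0), (t1, x1, y1) = points[0], points[-1]
    dt = t1 - t0
    return (t0, x0, y0, (x1 - x0) / dt, (y1 - y0) / dt)


def position(motion: Motion, t: float) -> Tuple[float, float]:
    """Evaluate the equations of motion at time `t`."""
    t0, x0, y0, vx, vy = motion
    return (x0 + vx * (t - t0), y0 + vy * (t - t0))


def in_critical_region(pos: Tuple[float, float], radius: float) -> bool:
    """ED stand-in: is the position within `radius` of the origin?"""
    return math.hypot(pos[0], pos[1]) <= radius
```

A track observed at x = 100 and then x = 80 two time units later is extrapolated to x = 60 at t = 4, which falls inside a critical region of radius 75.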

Dynbench Subsystems (Paths)

• Engagement
  – Action Manager (AM): receives threat tracks from ED and divides them among the action programs
  – Action: receives threat tracks from AM and commands actuator
  – Actuator: receives action and executes
• Monitor and Guide
  – Monitor and Guide Manager (MGM): receives threat tracks and interceptors from ED and divides them among the MG programs
  – Monitor and Guide (MG): receives threat tracks along with interceptors, updates position of interceptor according to the position of the threat track

HACQIT Prototype Architecture

[Figure: prototype architecture. Labels: HM and SD (on each of three hosts); FWC; TCPDump; NS; QM; SB; HB; STM; Tripwire; IM; Hub.]
