
Automatic Trust Management for Adaptive Survivable Systems

Howard Shrobe MIT AI Lab

March 2002 PI Meeting Hilton Head

Outline

• Overall Framework

• Review of Diagnostic Process

• Review of Computational Vulnerability Analysis

• Rational Choice and Utility Functions

• Self-Checking and Certified Computation

Motivating Example

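[Figure: data-flow diagram of the example application. Components: Voice Capture, Speech Processing, Grammar Center, Start, Omnibase, Display Generator, Display. Data labels: utterance, Grammar, text, query, response, GUI Directives. Hosts: Sleepy, Grumpy, Doc, Dopey. Annotations: Performance expectations, Integrity Constraint.]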

For example: “Show me Blue Platoon’s maneuvers leading up to Phase Line Orange”. The system should provide a correct map of Blue Platoon’s actions in a reasonable amount of time. Integrity constraints check that the data is valid.

What We Expect and What We Do If Not

• When the command is issued:
– We expect that the results will be generated within a reasonable time
– We expect that intermediate data will pass integrity constraints built into the system
• If these conditions do not obtain:
– We diagnose what went wrong
– We obtain updated estimates of how likely it is that each resource has been compromised
– We pick a reasonable place from which to restart the computation (if the integrity constraints failed).
– We pick a new way to do the task and/or a new allocation of resources to the components of the task that maximizes the likelihood of success
• What this requires:
– Self Checking
– Diagnosis
– Other monitoring of health status
– Decision Theoretic Choice (and therefore Utility functions).

Adaptive Survivable Systems

• Techniques that enable self-monitoring and diagnosis
– Driven by representations of structure and purpose
– The application knows the purposes of its components
– The application checks that these are achieved
– If these purposes are not achieved, the application localizes and characterizes the failure
• Techniques that enable application adaptation
– The application achieves its purpose as well as possible within the available infrastructure by choosing alternatives.
– Driven by models of Trust (informed by diagnosis and monitoring)
– Driven by models of computational alternatives
– It must have more than one way to effect each critical computation
– It should choose an alternative approach if the first one failed
– It should make its initial choices in light of the trust model
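One way to read the last three points is as an expected-utility choice among alternative methods. The sketch below is only an illustration under stated assumptions, not the system's actual mechanism: the method names, the `trust` probabilities, and the benefit and cost figures are all hypothetical, and resources are assumed to be compromised independently.

```python
import math

# Hypothetical trust model: probability that each resource is uncompromised.
trust = {"sleepy": 0.9, "grumpy": 0.5, "doc": 0.8}

# Hypothetical alternative ways of performing the same task.
methods = [
    {"name": "fast", "resources": ["grumpy"], "benefit": 10.0, "cost": 1.0},
    {"name": "careful", "resources": ["sleepy", "doc"], "benefit": 10.0, "cost": 2.0},
]

def expected_utility(method, trust):
    # The method succeeds only if every resource it uses is uncompromised
    # (independence between resources is an assumption of this sketch).
    p_success = math.prod(trust[r] for r in method["resources"])
    return p_success * method["benefit"] - method["cost"]

def choose_method(methods, trust):
    # Pick the alternative with the highest expected utility.
    return max(methods, key=lambda m: expected_utility(m, trust))

best = choose_method(methods, trust)
print(best["name"])
```

If diagnosis later lowers the trust in `sleepy` or `doc`, rerunning `choose_method` with the updated `trust` table may select a different alternative.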

The Active Trust Management Architecture

[Figure: architecture diagram. Boxes: Self Adaptive Survivable Systems; Perpetual Analytical Monitoring; Trust Model (Trustworthiness, Compromises, Attacks); Rational Decision Making; Rational Resource Allocation; Trend Templates; System Models & Domain Architecture; Other Information Sources: Intrusion Detectors.]

Diagnosis

Diagnosis as Likely Mode Identification: Multi-Mode, Multi-Tiered Diagnosis

• We model each component as having multiple modes (normal and abnormal)

• Another level of detail shows the dependence of computations on underlying resources

• Each resource has models of its state of compromise
• The modes of the resource models are linked to the modes of the computational models by conditional probabilities
• The model forms a Bayesian network

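[Figure: Component 1, located on Node17. Component 1 has models: Normal (delay 2 to 4), Delayed (delay 4 to +inf), Accelerated (delay -inf to 2). Node17 has models: Normal (probability 90%), Parasite (probability 9%), Other (probability 1%). An example link between a resource mode and a component mode is labeled "Conditional probability = .2".]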



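An Example System Description

[Figure: components A, B, C, D, E running on Host1, Host2, Host3, Host4.]

Host priors (one pair per host):
  Normal .9    Hacked .1
  Normal .85   Hacked .15
  Normal .8    Hacked .2
  Normal .7    Hacked .3

Conditional probability tables for component modes given host state (N = host Normal, H = host Hacked):

  Mode       N     H
  Normal     .6    .15
  Peak       .1    .80
  Off Peak   .3    .05

  Mode       N     H
  Normal     .8    .3
  Slow       .2    .7

  Mode       N     H
  Normal     .50   .05
  Fast       .25   .45
  Slow       .25   .50

  Mode       N     H
  Normal     .60   .05
  Slow       .25   .45
  Slower     .15   .50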

The System Description includes a Bayesian Network

• The Model can be viewed as a Two-Tiered Bayesian Network
– Resources with modes
– Computations with modes
– Conditional probabilities linking the modes

[Figure: the example system again, with components A through E on Host1 through Host4 and each host's prior: Normal .9 / Hacked .1; Normal .85 / Hacked .15; Normal .8 / Hacked .2; Normal .7 / Hacked .3.]
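To make the two-tiered reading concrete, the following sketch applies Bayes' rule to a single resource and a single computation. It uses one host prior (Normal .9, Hacked .1) and one Normal/Slow table from the example; the slides do not say which table belongs to which host, so that pairing is an assumption, and the function name is hypothetical.

```python
def posterior_resource_modes(prior, cpt, observed_mode):
    """Return P(resource mode | observed computation mode) by Bayes' rule.

    prior: {resource_mode: P(resource_mode)}
    cpt:   {computation_mode: {resource_mode: P(computation_mode | resource_mode)}}
    """
    joint = {r: prior[r] * cpt[observed_mode][r] for r in prior}
    total = sum(joint.values())
    return {r: p / total for r, p in joint.items()}

# Numbers taken from the example system description (pairing assumed).
host_prior = {"normal": 0.9, "hacked": 0.1}
component_cpt = {
    "normal": {"normal": 0.8, "hacked": 0.3},
    "slow": {"normal": 0.2, "hacked": 0.7},
}

posterior = posterior_resource_modes(host_prior, component_cpt, "slow")
print(round(posterior["hacked"], 2))  # 0.07 / (0.18 + 0.07), about 0.28
```

Observing the computation in its Slow mode raises the estimate that the host is hacked from .1 to about .28, which is the kind of update the trust model consumes.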




The system description includes a behavioral model

• The Model can also be viewed as a behavioral model with multiple modes per device
– Each model has a behavioral description
• The modes have posterior probabilities, linked by conditional probabilities to the probabilities of the modes of the resources

[Figure: the example system's components A, B, C, D, E, each annotated with its behavioral modes.]






Integrating model based and Bayesian reasoning

• Start with each behavioral model in the “normal” state
• Repeat: check for consistency of the current model
• If inconsistent,
– Add a new node to the Bayesian network
  • This node represents the logical-and of the nodes in the conflict.
  • Its truth-value is pinned at FALSE.
– Prune out all possible solutions which are a superset of the conflict set.
– Pick another set of models from the remaining solutions
• If consistent, add to the set of possible diagnoses
• Continue until all inconsistent sets of models are found
• Solve the Bayesian network

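[Figure: the example system (components A, B, C, D, E) with a discrepancy observed at one output. Conflict: A = NORMAL, B = NORMAL, C = NORMAL. Callout on one component: least likely member of the conflict; its most likely alternative is SLOW.]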
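The loop on this slide can be sketched as follows. This is a simplified illustration, not the system's implementation: component priors are treated as independent (instead of solving a full Bayesian network), candidates are enumerated exhaustively, and the delay ranges, priors, paths, and observations are all hypothetical. With independent priors, renormalizing over the consistent assignments has the same effect as pinning every conflict node to FALSE.

```python
import math
from itertools import product

# Hypothetical delay range for each behavioral mode.
DELAY = {"normal": (2, 4), "slow": (4, math.inf)}

# Hypothetical independent prior over each component's modes.
PRIORS = {c: {"normal": 0.9, "slow": 0.1} for c in "ABCDE"}

# Hypothetical observations: (components in series, observed total delay).
OBSERVATIONS = [(("A", "B", "C"), 15), (("D", "E"), 6)]

def find_conflict(assignment, observations):
    """Return a set of (component, mode) pairs that cannot all hold, or None."""
    for path, delay in observations:
        lo = sum(DELAY[assignment[c]][0] for c in path)
        hi = sum(DELAY[assignment[c]][1] for c in path)
        if not lo <= delay <= hi:
            return frozenset((c, assignment[c]) for c in path)
    return None

def diagnose(priors, observations):
    components = sorted(priors)
    conflicts, diagnoses = [], {}
    for combo in product(*(priors[c].items() for c in components)):
        assignment = {c: mode for c, (mode, _) in zip(components, combo)}
        items = frozenset(assignment.items())
        if any(conflict <= items for conflict in conflicts):
            continue  # superset of a known conflict: pruned without checking
        conflict = find_conflict(assignment, observations)
        if conflict is not None:
            conflicts.append(conflict)  # plays the role of a node pinned at FALSE
            continue
        diagnoses[items] = math.prod(p for _, p in combo)
    total = sum(diagnoses.values())
    return conflicts, {d: p / total for d, p in diagnoses.items()}

conflicts, posterior = diagnose(PRIORS, OBSERVATIONS)
best = max(posterior, key=posterior.get)
print(sorted(best), round(posterior[best], 3))
```

The first candidate (all components normal) yields the conflict A = normal, B = normal, C = normal, and every later candidate that contains it is skipped, mirroring the superset pruning step.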






Adding Attack Models

• An Attack Model specifies the set of attacks that are believed to be possible in the environment
• Each resource has a set of vulnerabilities
– Vulnerabilities enable attacks on that resource
• A successful attack exploits the vulnerability, putting the resource into a non-normal behavioral mode
• This is given as a set of conditional probabilities
– If the attack succeeded on a resource of this type then the likelihood that the resource is in mode-x is P
– This now forms a three-tiered Bayesian network

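[Figure: Three-Tiered Model. Host1 has resource-type Unix-Family and has-vulnerability Buffer-Overflow; Buffer-Overflow enables Overflow-Attack; Overflow-Attack causes the behavioral modes Normal and Slow, with links labeled .5 and .7.]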

What the diagnostic process tells us

• All non-conflicting combinations of models are possible diagnoses
• The posterior probabilities tell you how likely each diagnosis is.
• This guides recovery processing
• Each mode of each resource has a posterior probability
• This guides resource selection in the future
• The attack models couple the resource models, giving a system-wide view.
• This informs the trust model
• This couples to long-term monitoring, which looks for complex multi-stage attacks

Attack Modeling and Other Monitoring

Computational Vulnerability Analysis

• Grounding the attack model in systematic analysis

• Ontology of:
– System Properties
– System Types
– System Structure
– Control and Dependencies

Generating Attack Models Through Vulnerability Analysis

• The problem: Where do the attack model and its links to behavioral modes come from?
– So far, by hand crafting
• Vulnerability Analysis replaces this with a systematic analysis:
– Forming an ontology of how computer systems are structured
– Building models of the environment
  • Network topology: nodes, routers, switches, filters, firewalls
  • System types: hardware, operating systems
  • Server and user suites: which servers and users run where
– Analyzing how properties depend on resources
– Analyzing the vulnerabilities of the resources

Modeling System Structure

[Figure: ontology of system structure. Entities: Hardware (Processor, Memory, Device Controllers, Devices); Operating System (Logon Controller, Scheduler, Device Drivers, Job Admitter, Access Controller); User Set; Work Load; File System; Scheduler Policy; resources; files. Relations shown between them: Part-of, Resides-In, controls, Input-to.]

Modeling the topology

Machine name: sleepy
OS Type: Windows-NT
Server Suite: IIS…..
User Authentication Pool: Dwarfs…

Router: Enclave restrictions. ….

Switch: subnet restrictions. ….

Topology tells you:
– who can share (and sniff) which packets
– who can affect what types of connections to whom

The AI Lab Topology (partial)

[Figure: network diagram. Router: Netchex (filters out Telnet). Switches: Server Switch, 8th-Floor-1, 8th-Floor-2, 7th-Floor-1. Access pools: Router Access Pool, Server Access Pool, Dwarf Access Pool, Lisp Access Pool, General Access Pool. Machines: Life, Kenmore, Maytag, Doc, Dopey, Sleepy, Sneezy, Sakharov, Truman, Quincy-Adams, Jefferson, Wilson, CreepyCrawler.]

Modeling Dependencies

• Start with the desirable properties of systems:
– Reliable performance
– Privacy of communications
– Integrity and/or privacy of data
• Analyze which system components impact those properties
– Performance - scheduler
– Privacy - access-controller
• To affect a desirable property, control a component that contributes to the delivery of that property

Controlling components (1)

• One way to gain control of a component is to directly exploit a known vulnerability
– One way to control a Microsoft IIS web server is to use a buffer overflow attack on it.

[Figure: two rules. (1) IIS Web Server is vulnerable to Buffer-Overflow Attack. (2) Buffer-Overflow Attack takes control of IIS Web Server Process.]

Obtaining Access (1)

• One way to gain access to an operation on an object is to find a process with an adequate capability and take control of the process

[Figure: three linked facts. (1) Typical User File: User Read is required for Read. (2) Typical User Process possesses capability User Read. (3) To Read Typical User File, the control-action is to control Typical User Process.]

An Example

• Affecting reliable performance:
– Control the scheduler
  • The scheduler is a component that impacts performance
– By modifying the scheduler’s policy parameters
  • The policy parameters are inputs to the scheduler
– By gaining root access
  • The policy parameters require root access for writing
– By using a buffer overflow attack on the web-server
  • The web-server process possesses root capabilities
  • The web-server process is vulnerable to a buffer-overflow attack.
• For this attack to impact the performance, all the actions must succeed
– Each has an a priori probability based on its inherent difficulty and current evidence suggesting that it occurred.
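Because every step of the scenario must succeed, a rough a priori probability for the whole attack is the product of the step probabilities. The sketch below assumes the steps are independent, which the slide does not state, and the probabilities themselves are hypothetical placeholders.

```python
import math

# Steps of the scheduler attack from this slide, with hypothetical
# a priori success probabilities.
scenario = [
    ("buffer-overflow attack on the web-server process", 0.3),
    ("gain root access via the web-server's root capabilities", 0.9),
    ("modify the scheduler's policy parameters", 0.8),
]

def scenario_probability(steps):
    # All actions must succeed; independence is an assumption of this sketch.
    return math.prod(p for _, p in steps)

print(round(scenario_probability(scenario), 3))  # 0.3 * 0.9 * 0.8 = 0.216
```

Evidence that a step has already occurred (for example an intrusion-detector alert) would be reflected by raising that step's probability before multiplying.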

Affecting Data Privacy (2)

Using Attack Scenarios

• This information is captured in an Object-Oriented knowledge representation and rule-base system that reasons with it.

• The inference process develops multi-stage attack scenarios

• The scenarios are transformed into Trend Templates for recognition purposes

• The scenarios are transformed into Bayesian network fragments for diagnostic purposes

• The Bayesian network fragments in both cases update the posterior probabilities of the Trust Model

[Figure: Trust Model (Trustworthiness, Compromises, Attacks); Attack Models and Monitoring.]

Rational Choice and Utility Functions

How do we use the trust model to select resources?

Jon Doyle & Mike McGeachie

Making Choices and Utility Functions

• Utility functions are used to assign a numerical value to a particular way of doing a task.
• Utility functions are not a natural way for people to express themselves
• What people can state easily is their preferences
• In particular they can compare some combinations of variables to others, everything else being equal
• A typical set of preference statements:
– I prefer convenience of use to high security if I’m not under attack.
– I prefer high security to convenience if I’m under attack
• The trick is to convert a set of such “ceteris paribus” preferences into a numerical utility function.

How to build a Utility Function, Step 1

• Convert preferences into an intermediate form using bit-vectors of boolean variables:

Suppose we have f2 & -f4 > f3 (f1 being equal).

This expresses a preference for bit-vectors consistent with f2 & -f4 & -f3 (bit-vector *100) over those satisfying
-(f2 & -f4) & f3 = (-f2 & f3) or (f4 & f3)

We extend each disjunct for both values of the missing variable:
(-f2 & f3) gives (-f2 & f3 & f4), (-f2 & f3 & -f4)
(f4 & f3) gives (-f2 & f3 & f4), (f2 & f3 & f4)
giving three distinct bit-vectors: *111, *011, *010.

Form three rules: *100 > *111, *100 > *011, *100 > *010
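The conversion in this step can be reproduced by brute-force enumeration over the variables the preference mentions. The sketch below is an illustration only; the function names are hypothetical, and it enumerates truth assignments directly instead of manipulating the formulas symbolically as the slide does.

```python
from itertools import product

VARS = ["f1", "f2", "f3", "f4"]

def to_pattern(assignment):
    # Variables the preference does not mention are written as "*".
    return "".join(str(int(assignment[v])) if v in assignment else "*" for v in VARS)

def ceteris_paribus_rules(better, worse, mentioned):
    """Turn 'better > worse, all else equal' into bit-vector rules."""
    preferred, dispreferred = [], []
    for bits in product([False, True], repeat=len(mentioned)):
        a = dict(zip(mentioned, bits))
        if better(a) and not worse(a):
            preferred.append(to_pattern(a))
        elif worse(a) and not better(a):
            dispreferred.append(to_pattern(a))
    return [(p, d) for p in preferred for d in dispreferred]

# f2 & -f4 > f3, with f1 held equal.
rules = ceteris_paribus_rules(
    better=lambda a: a["f2"] and not a["f4"],
    worse=lambda a: a["f3"],
    mentioned=["f2", "f3", "f4"],
)
for left, right in rules:
    print(left, ">", right)
# *100 > *010
# *100 > *011
# *100 > *111
```

The output matches the three rules on the slide, up to ordering.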

Building Utility Functions, Step 2

• Define a Directed-Graph representing the rules:
– Nodes in the graph are complete bit-vectors over all variables (grows exponentially!)
– Edges connect any two nodes indicated by the intermediate form of the preference rules. Source nodes are preferred to sink nodes.
– A node M is preferred to N if and only if there is a path in the graph from M to N
• Four simple utility functions are consistent with the rules:
– Minimizing: longest outgoing path length
– Descendant: number of descendants
– Maximizing: longest incoming path length
– Topological: rank in topological sort order
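As a concrete illustration, the sketch below builds the graph for the three rules from Step 1 and computes the first of the four utility functions (longest outgoing path length). It assumes the "*" positions are the same on both sides of a rule and that the graph is acyclic; the function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

RULES = [("*100", "*111"), ("*100", "*011"), ("*100", "*010")]

def expand(rule):
    """Yield (better, worse) pairs of complete bit-vectors for one rule.

    "*" positions are held equal on both sides (ceteris paribus).
    """
    better, worse = rule
    stars = [i for i, ch in enumerate(better) if ch == "*"]
    for bits in product("01", repeat=len(stars)):
        b, w = list(better), list(worse)
        for i, bit in zip(stars, bits):
            b[i] = w[i] = bit
        yield "".join(b), "".join(w)

def minimizing_utility(rules):
    """Utility of a node = length of the longest path leaving it."""
    succ = {}
    for rule in rules:
        for m, n in expand(rule):
            succ.setdefault(m, set()).add(n)
            succ.setdefault(n, set())

    @lru_cache(maxsize=None)
    def longest(node):
        # Assumes the preference graph is acyclic.
        return max((1 + longest(s) for s in succ[node]), default=0)

    return {node: longest(node) for node in succ}

utility = minimizing_utility(RULES)
print(utility["0100"], utility["0111"])  # 1 0
```

Any edge M > N guarantees `utility[M] > utility[N]`, because the longest path out of M is at least one edge longer than the longest path out of N.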

Building Utility Function: Step 3

• Decompose into smaller domains:
• Utility Independence: when the utility of some features is independent of the values of other features, i.e. if m1 and m2 use only features from Si, and m1 > m2 when we add some features from Sj to both, then m1 > m2 when we add some other features from Sj to both.
• Partition the features into subsets so that each subset is Utility Independent of its complement
– Start with singleton sets, then merge together two sets that are not Utility Independent.
• The rules tell us when sets are not UI
– For example, if I prefer convenience to security when not under attack but vice-versa if under attack, then the set {convenience, security} is dependent on “under attack”

Building Utility Function, Step 4

• The total utility function is a linear weighted sum of sub-utility functions, one for each subset in the partition (Keeney & Raiffa, 1976).

• Need to generate the sub-utility functions and the scaling parameters.

• Restrict the preference rules to each subset and create the preference graph for that subset.

• However, when rules are restricted to a subset they may generate cycles even though they are globally consistent.
– *01* > *10* and **01 > **10 are consistent, but when restricted to feature 3 we get both 1 > 0 and 0 > 1!

• So remove rules from each set to break the cycles
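The cycle in the example can be checked mechanically by projecting each rule onto a feature subset and searching the resulting graph for a cycle. This is a minimal sketch with hypothetical helper names; it assumes a restricted rule either fully specifies the chosen features or does not mention them at all.

```python
def restrict_rules(rules, positions):
    """Project each (better, worse) rule onto the given feature positions."""
    restricted = []
    for better, worse in rules:
        b = "".join(better[i] for i in positions)
        w = "".join(worse[i] for i in positions)
        if set(b + w) != {"*"}:  # skip rules that say nothing about these features
            restricted.append((b, w))
    return restricted

def has_cycle(edges):
    """Depth-first search for a cycle in the preference graph."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(s) for s in succ[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in succ)

rules = [("*01*", "*10*"), ("**01", "**10")]
feature_3 = restrict_rules(rules, [2])  # feature 3 is index 2
print(feature_3, has_cycle(feature_3))  # [('1', '0'), ('0', '1')] True
```

Dropping either rule from the restricted set removes the cycle, which is the choice the next step hands to a SAT solver.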

Building Utility Functions, Step 5

• Choose the rules to remove by solving a big SAT problem with 2 types of terms:
– The first type of term says that a preference rule must be consistent with at least one sub-utility function
– The second type says that at least one rule in each conflicting set must be removed
• Use the solution to remove a rule from each conflict, then construct the subset’s graph without that rule.
• Use the minimizing utility function for this subset.
• Set the scaling factors for the sub-utility functions by solving a set of linear inequalities. For each rule r with left side L and right side R, add the constraint:

Sum(i) t_i u_i(L) > Sum(i) t_i u_i(R), for the sub-domains i that intersect r.
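The constraint on the scaling factors can be checked for a candidate set of weights as follows. This sketch only verifies a proposed solution; it does not solve the inequalities. The feature subsets, sub-utility tables, and weights are hypothetical, and each rule is assumed to fully specify the features of every sub-domain it intersects.

```python
def rule_constraint_holds(rule, subsets, sub_utils, weights):
    """Check Sum(i) t_i u_i(L) > Sum(i) t_i u_i(R) over intersecting sub-domains."""
    left, right = rule
    lhs = rhs = 0.0
    for positions, u, t in zip(subsets, sub_utils, weights):
        l_part = "".join(left[i] for i in positions)
        r_part = "".join(right[i] for i in positions)
        if set(l_part + r_part) == {"*"}:
            continue  # this sub-domain does not intersect the rule
        lhs += t * u[l_part]
        rhs += t * u[r_part]
    return lhs > rhs

# Hypothetical partition of four features into two utility-independent subsets.
subsets = [(0, 1), (2, 3)]
sub_utils = [
    {"00": 0, "01": 1, "10": 2, "11": 0},
    {"00": 0, "01": 2, "10": 1, "11": 0},
]
rules = [("10**", "01**"), ("1010", "0101")]

for weights in [(0.6, 0.4), (0.4, 0.6)]:
    ok = all(rule_constraint_holds(r, subsets, sub_utils, weights) for r in rules)
    print(weights, ok)
```

The second rule gains one unit in the first sub-domain and loses one in the second, so it holds only when the first weight exceeds the second; that is the kind of trade-off the linear inequalities settle.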

Self-Checking and Certified Computation

How does a computation detect that the wrong thing happened?

How does it prove that the right thing happened?

Konstantine Arkoudas

Plans and Computations

• Associated with every computation is a “Plan”
• The plan is an abstract description, providing:
– Decomposition into components
– Data and control flow relationships between these
– Pre, Post and Maintain conditions for each component
– Dependency links saying how the pre-conditions of each component and the main-goals of the computation are satisfied by the post-conditions of other components.
• We want to check that these conditions are in fact satisfied as the computation proceeds.
– If they’re not satisfied we should stop and diagnose
– If the computation completes, we would like a proof that the right thing happened.

DPLs: Denotational Proof Languages

• A combined calculus of deductions and computations
• Deductions are evaluated with respect to an Assumption Base, which is a dynamically maintained piece of state.

• The value of a Deduction is a proposition that is true in the Assumption Base.

• The Primitive Deductions are the basic rules of a standard natural deduction system.

• In addition, there is (assume p D) which temporarily adds p to the assumption base then evaluates D and returns the result then restores the assumption base.
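The role of the assumption base can be illustrated with a small Python sketch. This is not a DPL implementation: the class and method names are hypothetical, propositions are plain tuples such as `("->", p, q)`, and conclusions are simply added to the base as a simplification. `assume` here returns the implication `p -> q`, following the usual conditional-introduction rule; that detail is an assumption of the sketch.

```python
class ProofError(Exception):
    """Raised when a step is not justified by the assumption base."""

class AssumptionBase:
    def __init__(self, props=()):
        self.props = set(props)

    def _require(self, p):
        if p not in self.props:
            raise ProofError(f"{p!r} is not in the assumption base")

    def _conclude(self, p):
        self.props.add(p)  # simplification: conclusions join the base
        return p

    def claim(self, p):
        self._require(p)
        return p

    def left_and(self, conj):
        self._require(conj)
        if not (isinstance(conj, tuple) and conj[0] == "and"):
            raise ProofError(f"{conj!r} is not a conjunction")
        return self._conclude(conj[1])

    def right_and(self, conj):
        self._require(conj)
        if not (isinstance(conj, tuple) and conj[0] == "and"):
            raise ProofError(f"{conj!r} is not a conjunction")
        return self._conclude(conj[2])

    def modus_ponens(self, imp, p):
        self._require(imp)
        self._require(p)
        if not (isinstance(imp, tuple) and imp[0] == "->" and imp[1] == p):
            raise ProofError(f"{imp!r} is not an implication from {p!r}")
        return self._conclude(imp[2])

    def assume(self, p, deduction):
        saved = set(self.props)
        self.props.add(p)            # temporarily add p
        try:
            q = deduction(self)      # evaluate D in the extended base
        finally:
            self.props = saved       # restore the assumption base
        return self._conclude(("->", p, q))

def uncurry(ab, premise):
    """From P1 -> (P2 -> P3), derive (P1 and P2) -> P3."""
    _, p1, inner = premise
    _, p2, _ = inner
    both = ("and", p1, p2)

    def body(ab):
        ab.left_and(both)
        ab.modus_ponens(premise, p1)
        ab.right_and(both)
        return ab.modus_ponens(inner, p2)

    return ab.assume(both, body)

premise = ("->", "A", ("->", "B", "C"))
base = AssumptionBase([premise])
print(uncurry(base, premise))  # ('->', ('and', 'A', 'B'), 'C')
```

After `assume` returns, the temporary hypothesis and everything derived from it are gone from the base; only the implication remains, and an unjustified step raises `ProofError` instead of returning a proposition.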

Duality in DPLs

• There is an operator phi, dual to lambda, for building deductive methods. Lambda builds functions, phi builds methods.

• Expressions apply functions to arguments; deductions apply methods to arguments (marked with “!” in the examples).

• Both methods and functions can be higher-order and recursive.

• Deductions and evaluations are distinguished on syntactic grounds; there is no need for a static type system to keep them apart (although the system could have a type system).

• DPLs are to Scheme as HOL is to ML

(define-method uncurry (premise)
  (dmatch premise
    ((-> ?P1 (-> ?P2 ?P3))
     (Assume (and ?P1 ?P2)
       (?P1 by !(left-and (and ?P1 ?P2)))
       ((-> ?P2 ?P3) by !(modus-ponens premise ?P1))
       (?P2 by !(right-and (and ?P1 ?P2)))
       (?P3 by !(modus-ponens (-> ?P2 ?P3) ?P2))))))

(define-method dn* (premise)
  (dmatch premise
    ((not (not ?P))
     (?P by !(dn premise))
     (dn* ?P))
    (t !(claim premise))))

Combining the Plan and the Computation

• DPL’s can freely intermix computation and deduction

• We can adopt a style in which each computational step is annotated:
– Before we begin we assert the pre-conditions of the overall computation (after checking that they hold).
– Before each step we add the deduction that the pre-condition is satisfied for the reasons specified in the plan
– After each step we add the claim that the post-conditions are satisfied (after checking that they are).
– At the end of the computation we add the deduction that the overall post-conditions are satisfied for the reasons stated in the plan
• Each proof step is done only for the specific arguments in this particular computation (not a general proof).
– But the computation will error if the proof doesn’t hold in this case.
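The annotated style can be approximated in ordinary code by checking each step's pre- and post-conditions at run time and recording what was checked. The sketch below is a loose analogy, not a DPL: the two-step plan, the `PlanViolation` exception, and the log format are all hypothetical, and the log records checks performed, not a formal proof.

```python
class PlanViolation(Exception):
    """Raised when a pre- or post-condition from the plan does not hold."""

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

# Hypothetical plan: (step name, pre-condition, action, post-condition).
PLAN = [
    ("sort",
     lambda s: len(s["data"]) > 0,
     lambda s: {**s, "sorted": sorted(s["data"])},
     lambda s: is_sorted(s["sorted"]) and sorted(s["data"]) == s["sorted"]),
    ("median",
     lambda s: is_sorted(s["sorted"]),
     lambda s: {**s, "median": s["sorted"][len(s["sorted"]) // 2]},
     lambda s: s["median"] in s["data"]),
]

def run_plan(plan, state):
    """Run each step, checking conditions for these specific arguments only."""
    log = []
    for name, pre, action, post in plan:
        if not pre(state):
            raise PlanViolation(f"pre-condition of {name!r} failed")
        log.append(("pre", name))
        state = action(state)
        if not post(state):
            raise PlanViolation(f"post-condition of {name!r} failed")
        log.append(("post", name))
    return state, log

final_state, log = run_plan(PLAN, {"data": [3, 1, 2]})
print(final_state["median"], log)
```

As on the slide, nothing general is proved about the steps; the checks concern this run only, and a failed check stops the computation so that diagnosis can begin.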

Certificates

• There is a much simpler logic contained within this calculus
– It has just the primitive methods and sequencing
– Each application of a method to arguments with respect to the assumption base either returns a proposition that is implied by the assumption base or it errors.
– A proof is a sequence of such applications
– The proof checker is just the interpreter for this simple calculus. This is a very small TCB.
• The result of every deduction in the full calculus can be justified by a proof expressed in this simpler logic
– An extended interpreter captures and bundles up the primitive methods used in an extended language deduction.
– Alternatively, use a Truth Maintenance System.
• This proof in the simpler logic is a certificate that can be delivered with the results of the computation.
– The certificate can be mechanically checked by a very simple interpreter.

Comparison to PCC

• There is an obvious similarity to Proof Carrying Code
– Both deliver a computation bundled with a proof
• In Certified Computation:
– The Proof is generated on the fly
– The Proof applies only to the specific results computed
– The Proof discovery is guided by the computation and is free of search.
– The computation errors before completing if the proof wouldn’t hold
– Formalism doesn’t depend on type system
• The idea was used in the Credible Compiler, in which optimizers justify the optimization using on-the-fly certificates.

Summary

• DPLs are used to check computations, justify them when they work, and signal conditions when things go wrong.

• Diagnosis figures out what might be wrong, updates the trust model.

• Vulnerability analysis produces attack models for diagnosis and trend templates for long-term monitoring.

• Long-term monitoring collates alerts, updates the trust model

• Rational choice chooses the best way to do the task (or retry the task) given the current trust model.

• Utility functions can be generated automatically from preferences to support rational choice.
