Computational Resiliency
Steve J. Chapin, Susan Older
Center for Systems Assurance, Syracuse University
Gregg Irvin
Mobium Enterprises
24 July 2001
Not for Public Release


Computational Resiliency – CSA | Not for Public Release

Recap: What is Computational Resiliency?

The ability to sustain application operation and dynamically restore the level of assurance during an attack.

Application-centric self-defense, built on replication, migration, functionality mutation, and camouflage.


Computational Resiliency

[Figure: a mission-critical application comes under attack. The result of the attack is a degraded application trying to perform its mission-critical function. Computational resiliency techniques are applied to correct the situation, leaving a degraded application sufficiently improved by resiliency to perform the mission-critical function.]


Multi-Faceted Approach

Theoretical framework: reason about conformance to policy
Computational resiliency library: dynamic application management
System software support: scheduling/policy frameworks


Computational Resiliency Library

Dynamic multithreading
Migration
Replication
Camouflage
Functionality reconfiguration
Policy-based management


Example of CRLib

[Figure: testbed network. Three clusters labeled "16 2x Pentium", one labeled "16 Alpha", two Intel 8x SMP machines, and an SGI Origin, connected by four 3Com Superstack 3300 switches, with a firewall and a connection to "The Net". One region is marked "Safe Zone" (OASIS protection), the other "The Wild" (limited protection).]


The Benign State

[Figure: testbed network running Rocky’s job, Bullwinkle’s job, and Dudley’s job (low priority).]


The Attacks

[Figure: testbed network.]

Snidely attacks: blocked at the firewall.
Dudley does nothing.


The Attacks

[Figure: testbed network.]

Natasha attacks Rocky; caught by the IDS.


The Attacks

[Figure: testbed network.]

Rocky’s job migrates back into the safe zone; Dudley must give up resources.
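The migration step can be read as a priority-based preemption rule. The following is a minimal sketch under stated assumptions, not the CRLib scheduler: the `Job` and `Zone` classes, the one-slot-per-job capacity model, and the numeric priorities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    owner: str
    priority: int  # hypothetical scale: larger value = more important


@dataclass
class Zone:
    name: str
    capacity: int  # hypothetical: number of job slots in this zone
    jobs: list = field(default_factory=list)

    def admit(self, job):
        """Admit `job`, evicting strictly lower-priority jobs if the zone is full.

        Returns the evicted jobs; raises RuntimeError if not enough room can be made.
        """
        needed = len(self.jobs) + 1 - self.capacity
        if needed <= 0:
            self.jobs.append(job)
            return []
        # Only jobs with strictly lower priority may be preempted, lowest first.
        candidates = sorted(
            (j for j in self.jobs if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        if len(candidates) < needed:
            raise RuntimeError(f"no room in {self.name} for {job.owner}'s job")
        evicted = candidates[:needed]
        evicted_ids = {id(j) for j in evicted}
        self.jobs = [j for j in self.jobs if id(j) not in evicted_ids]
        self.jobs.append(job)
        return evicted


# After the IDS alert, Rocky's job moves into the safe zone and displaces Dudley's.
safe_zone = Zone("safe zone", capacity=1, jobs=[Job("Dudley", priority=1)])
evicted = safe_zone.admit(Job("Rocky", priority=5))
print([j.owner for j in evicted])         # ['Dudley']
print([j.owner for j in safe_zone.jobs])  # ['Rocky']
```

Computing the full eviction set before touching `self.jobs` means a failed admission leaves the zone unchanged.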


The Attacks

[Figure: testbed network.]

Boris attacks Bullwinkle’s job. Some attacks succeed.


The Attacks

[Figure: testbed network.]

Bullwinkle’s job employs camouflage, decoys, and migration.


Groups and Replication

[Figure: a group's replicas mapped onto processors.]

One group per computational task
User selects replication level and other policies
Group mapped across processors
Periodic liveness checks
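The group model above can be sketched in a few lines. This is a hedged illustration, not the CRLib API: the `Group` class, round-robin mapping, and the "restart on the least-loaded live processor" recovery rule are assumptions chosen for clarity. In a running system, `liveness_check` would be invoked from a periodic timer; here it is called directly.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Group:
    """Hypothetical model of one replicated task group (not the CRLib API)."""
    task: str
    replication: int  # user-selected replication level
    placement: dict = field(default_factory=dict)  # replica index -> processor

    def map_across(self, processors):
        """Spread the replicas round-robin across the given processors."""
        procs = cycle(processors)
        self.placement = {r: next(procs) for r in range(self.replication)}

    def liveness_check(self, alive):
        """One round of the periodic check.

        `alive` is the set of processors that answered. Each replica on a silent
        processor is restarted on the least-loaded live processor.
        Returns the indices of the replicas that were moved.
        """
        if not alive:
            raise RuntimeError(f"no live processors left for task {self.task}")
        moved = []
        for r, p in sorted(self.placement.items()):
            if p in alive:
                continue
            load = {q: 0 for q in sorted(alive)}
            for q in self.placement.values():
                if q in load:
                    load[q] += 1
            target = min(load, key=load.get)  # ties go to the first name in sorted order
            self.placement[r] = target
            moved.append(r)
        return moved


group = Group("A", replication=4)
group.map_across(["P1", "P2", "P3"])
print(group.placement)  # {0: 'P1', 1: 'P2', 2: 'P3', 3: 'P1'}
moved = group.liveness_check(alive={"P2", "P3"})  # P1 missed its check
print(moved, group.placement)  # [0, 3] {0: 'P2', 1: 'P2', 2: 'P3', 3: 'P3'}
```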


Theory Framework: Goals

Understand the interplay among core aspects of CRLib: groups, locations, resources, schedules, …
Reason about effects of configuration and policy choices
Reason about applications’ conformance to desired behavior


Framework Basics

Build on existing mobile calculi: π-calculus, Mobile Ambients, Join-Calculus
Capture essential features of CRLib: replication, migration, reconfiguration, camouflage


A π-Calculus Primer

Collection of names
  Represent information: values, communication links (channels), code
  Have scope
Message-based communication
  $x(y)$: receipt of a value on $x$
  $\overline{x}\langle y\rangle$: transmission of $y$ along $x$
Information mobility: information can be passed beyond its original scope
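The two prefixes interact through the standard π-calculus communication rule from Milner's reduction semantics (general background, not something specific to CRLib): a sender and a receiver on the same channel $x$ synchronize, and the transmitted name $y$ replaces the receiver's bound name $z$. When $y$ is private, scope extrusion carries its restriction along, which is the information mobility described in the primer.

```latex
% Communication: output on x meets input on x; y is substituted for z in Q.
\overline{x}\langle y\rangle.P \;\mid\; x(z).Q \;\longrightarrow\; P \;\mid\; Q\{y/z\}

% Scope extrusion: a private name y travels to the receiver, and its scope grows.
(\nu y)\,\overline{x}\langle y\rangle.P \;\mid\; x(z).Q
  \;\longrightarrow\; (\nu y)\,\bigl(P \;\mid\; Q\{y/z\}\bigr)
  \qquad \text{provided } y \notin \mathrm{fn}(x(z).Q)
```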


Finding a Service Provider

A client wants to find a service provider:

1. Query the Service Directory, including a SASE (self-addressed stamped envelope, i.e., a reply channel).
2. Wait for the response.
3. Upon receipt, submit the request.

$\overline{query}\langle addr\rangle.\,addr(sp).\,\overline{sp}\langle req\rangle.\,0$


Handling Service Requests

The Service Directory repeatedly responds to queries, arbitrarily choosing a provider:

$!\,query(ra).\,(\overline{ra}\langle a\rangle + \overline{ra}\langle b\rangle + \overline{ra}\langle c\rangle)$

Service providers wait for requests:

$!\,a(job).\,DO(job) \qquad !\,b(job).\,DO(job) \qquad !\,c(job).\,DO(job)$


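The directory, providers a, b, c, and the client run in parallel; the client and the directory interact on query:

$$
\begin{aligned}
&!\,query(ra).\,(\overline{ra}\langle a\rangle + \overline{ra}\langle b\rangle + \overline{ra}\langle c\rangle) \\
\mid\ &!\,a(job).\,DO(job) \;\mid\; !\,b(job).\,DO(job) \;\mid\; !\,c(job).\,DO(job) \\
\mid\ &\overline{query}\langle addr\rangle.\,addr(sp).\,\overline{sp}\langle req\rangle.\,0
\end{aligned}
$$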


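After the exchange on query, the directory offers one of a, b, c on the client's channel addr:

$$
\begin{aligned}
&!\,query(ra).\,(\overline{ra}\langle a\rangle + \overline{ra}\langle b\rangle + \overline{ra}\langle c\rangle) \\
\mid\ &(\overline{addr}\langle a\rangle + \overline{addr}\langle b\rangle + \overline{addr}\langle c\rangle) \\
\mid\ &!\,a(job).\,DO(job) \;\mid\; !\,b(job).\,DO(job) \;\mid\; !\,c(job).\,DO(job) \\
\mid\ &addr(sp).\,\overline{sp}\langle req\rangle.\,0
\end{aligned}
$$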


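The directory chooses b, so the client's next action is to send req on b:

$$
\begin{aligned}
&!\,query(ra).\,(\overline{ra}\langle a\rangle + \overline{ra}\langle b\rangle + \overline{ra}\langle c\rangle) \\
\mid\ &0 \\
\mid\ &!\,a(job).\,DO(job) \;\mid\; !\,b(job).\,DO(job) \;\mid\; !\,c(job).\,DO(job) \\
\mid\ &\overline{b}\langle req\rangle.\,0
\end{aligned}
$$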


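Provider b receives req and runs DO(req); the client has finished:

$$
\begin{aligned}
&!\,query(ra).\,(\overline{ra}\langle a\rangle + \overline{ra}\langle b\rangle + \overline{ra}\langle c\rangle) \\
\mid\ &!\,a(job).\,DO(job) \;\mid\; !\,b(job).\,DO(job) \;\mid\; !\,c(job).\,DO(job) \\
\mid\ &DO(req) \;\mid\; 0
\end{aligned}
$$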
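The service-directory exchange can also be run as a small program. This is a hedged Python sketch, not part of the original work: each π-calculus channel is modeled as a `queue.Queue` (buffered, whereas π-calculus channels are synchronous), each replicated process `!x(...)` becomes a thread that loops forever, and `random.Random(0).choice` stands in for the directory's arbitrary choice. The names `directory`, `provider`, `client`, and `done` are hypothetical. Passing a queue object through another queue plays the role of name passing.

```python
import queue
import random
import threading

query = queue.Queue()                                      # the directory's public channel
providers = {name: queue.Queue() for name in ("a", "b", "c")}
done = queue.Queue()                                       # hypothetical: records DO(job) events


def directory(rng):
    # !query(ra).(ra<a> + ra<b> + ra<c>): answer every query with an arbitrary provider.
    while True:
        ra = query.get()
        ra.put(rng.choice(list(providers.values())))


def provider(name, chan):
    # !name(job).DO(job): wait for requests forever.
    while True:
        job = chan.get()
        done.put((name, job))  # stand-in for DO(job)


def client(req):
    # query<addr>.addr(sp).sp<req>.0, with a fresh reply channel as the SASE.
    addr = queue.Queue()
    query.put(addr)
    sp = addr.get()  # receive a provider's channel: information mobility
    sp.put(req)


threading.Thread(target=directory, args=(random.Random(0),), daemon=True).start()
for name, chan in providers.items():
    threading.Thread(target=provider, args=(name, chan), daemon=True).start()

client("req")
print(done.get(timeout=5))  # (provider name, 'req'); which provider is chosen is arbitrary
```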


Initial Questions

What are the primary entities, as well as the relationships among them?
  Groups, locations, failures
  External events: DEFCON changes
  Scheduling policies
  Application policies
What is the most appropriate way to integrate those components? And at what abstraction level?


In Progress: Two Calculi

Higher-level calculus that incorporates the CRLib API
  Captures groups, policies, etc.
Lower-level calculus that provides semantics for the higher-level calculus
  Captures abstract implementation details
Soundness of the translation will provide validation.


A Thought Experiment

Suppose there are two tasks, A and B, working in parallel:
  A’s replication level: 4
  B’s replication level: 2
  Three processors: P1, P2, P3
The resulting behavior (modulo robustness) should be similar to that of a system with single copies of A and B.
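A quick calculation shows what the thought experiment's placement could look like. This sketch assumes a single shared round-robin sweep over P1, P2, P3, which is an illustrative choice and not necessarily how CRLib maps groups; `round_robin` and `survivors` are hypothetical helpers.

```python
from collections import Counter
from itertools import cycle


def round_robin(replication, processors):
    """Assign every replica of every task to a processor in one shared sweep."""
    procs = cycle(processors)
    return [(task, i, next(procs)) for task, n in replication.items() for i in range(n)]


def survivors(placement, failed):
    """Count the replicas of each task still running after processor `failed` dies."""
    return Counter(task for task, _, p in placement if p != failed)


placement = round_robin({"A": 4, "B": 2}, ["P1", "P2", "P3"])
print(placement)
print(Counter(p for _, _, p in placement))  # replicas per processor
for p in ["P1", "P2", "P3"]:
    print(p, "down ->", dict(survivors(placement, p)))
```

Under this placement each processor hosts two replicas. Losing P1 removes two copies of A, while losing P2 or P3 leaves B with a single copy, so every single-processor failure still leaves at least one copy of each task. Because four replicas of A must share three processors, some processor always holds two of them regardless of the mapping.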


Open Questions

How do we define “similar”, much less prove it?
  Correctness
  Performance
  Robustness
What are sufficiently high-level yet informative performance measures?
How do we model camouflage?


Back to CRLib: Status

Multiple platforms
  Windows NT/2000, Linux, SGI IRIX, Solaris
Heterogeneous resource management methods
  Load-balancing across heterogeneous networks
  Performance improvement by a factor of 3
Demo this evening
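One common way to balance load across heterogeneous machines is to split work in proportion to each machine's measured speed. The sketch below is a generic illustration of that idea, not the method behind the reported factor-of-3 improvement; `proportional_split` and the machine names and relative speeds are hypothetical.

```python
def proportional_split(units, speeds):
    """Divide `units` work items among processors in proportion to `speeds`.

    Uses largest-remainder rounding so the shares always sum to `units`.
    """
    total = sum(speeds.values())
    exact = {p: units * s / total for p, s in speeds.items()}
    shares = {p: int(x) for p, x in exact.items()}
    leftover = units - sum(shares.values())
    # Hand the remaining units to the processors with the largest fractional parts.
    for p in sorted(exact, key=lambda p: exact[p] - shares[p], reverse=True)[:leftover]:
        shares[p] += 1
    return shares


# Hypothetical relative speeds for three kinds of machines.
speeds = {"pentium-node": 1.0, "alpha-node": 1.5, "smp-node": 4.0}
print(proportional_split(100, speeds))  # {'pentium-node': 15, 'alpha-node': 23, 'smp-node': 62}
```

Largest-remainder rounding avoids the drift that plain rounding can cause, where shares sum to one more or one less than the total.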


In Progress

Adding support for Byzantine failures
  User-level option for authenticated messages
    Based on Lamport-Shostak-Pease algorithms
  Greater resiliency needed for nonauthenticated messages
Evaluating cost of replication
  Compare to standard checkpointing
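The gap between authenticated and nonauthenticated messages comes from the Lamport-Shostak-Pease results: with unauthenticated ("oral") messages, agreement tolerating m traitors requires n ≥ 3m + 1 participants, while their signed-message algorithm SM(m) has no such bound. The hypothetical helpers below only evaluate the oral-message bound, to show how many replicas a given fault budget costs.

```python
def min_replicas_oral(faults):
    """Smallest replica count that tolerates `faults` Byzantine replicas with
    unauthenticated (oral) messages: n >= 3m + 1 (Lamport, Shostak, Pease)."""
    return 3 * faults + 1


def max_faults_oral(replicas):
    """Largest number of Byzantine replicas that `replicas` replicas tolerate
    with oral messages."""
    return max(0, (replicas - 1) // 3)


for f in range(4):
    print(f"tolerate {f} Byzantine replica(s): at least {min_replicas_oral(f)} replicas")
```

For example, a replication level of 3 tolerates no Byzantine replica under oral messages, and tolerating one requires 4.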


Next Steps for Project

Tool for user policy expression
  Choices for replication/recovery methods, agreement protocols, message-passing schemes
  State-dependent policy specified via a “Chinese menu” approach
Scheduling framework
  Schedulers that understand CR policies, resulting resource demands, and user/process priorities
  Build on previous MESSIAHS and Legion work
Finalize core CR calculi; turn to analysis techniques
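A "Chinese menu" policy tool could, in spirit, let the user pick one option from each column per system state. The sketch below is hypothetical throughout: the column names, option strings (borrowed from terms in these slides), the cross-column rule, and the use of DEFCON levels as states are illustrative assumptions, not a CRLib interface.

```python
from dataclasses import dataclass

# Hypothetical menu: one column per decision the slide lists.
MENU = {
    "recovery": {"checkpointing", "active replication"},
    "agreement": {"none", "LSP oral messages", "LSP signed messages"},
    "messaging": {"plain", "authenticated"},
}


@dataclass(frozen=True)
class Policy:
    """One selection from each column of the menu."""
    recovery: str
    agreement: str
    messaging: str

    def __post_init__(self):
        for column, options in MENU.items():
            choice = getattr(self, column)
            if choice not in options:
                raise ValueError(f"{choice!r} is not a {column} option")
        # Example cross-column rule (an assumption): signed-message agreement
        # only makes sense when messages are authenticated.
        if self.agreement == "LSP signed messages" and self.messaging != "authenticated":
            raise ValueError("signed-message agreement requires authenticated messaging")


# State-dependent policy: one selection per DEFCON level (lower number = more severe).
POLICIES = {
    5: Policy("checkpointing", "none", "plain"),
    3: Policy("active replication", "LSP oral messages", "plain"),
    1: Policy("active replication", "LSP signed messages", "authenticated"),
}


def policy_for(defcon):
    """Return the selection for the nearest defined level at least as severe as `defcon`."""
    levels = [level for level in POLICIES if level <= defcon]
    if not levels:
        raise ValueError(f"no policy defined for DEFCON {defcon}")
    return POLICIES[max(levels)]


print(policy_for(4))  # falls back to the DEFCON 3 selection
```

Validating each selection when it is built means an inconsistent menu choice is rejected when the policy is written, not when the threat level changes.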


Open Issues

Cost/benefit analysis of CR
  How much protection do we provide if the attacker knows what we’re trying to do?
  How much is performance affected by message load, active replication, etc.?
Potential integration with other OASIS projects