31
Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems Autonomic Systems n Autonomic Autonomic : adaptive : adaptive l Self Self- healing healing : n cluster systems via node restart cluster systems via node restart l Self Self- optimizing: optimizing: n variable encoding schemes for web audio streaming variable encoding schemes for web audio streaming services services l Self Self- regulating regulating : n apache web server periodically kills child processes apache web server periodically kills child processes n Maintenance Maintenance : l expensive expensive , time , time- consuming consuming I want my availability, but I won’t do it myself I want my availability, but I won’t do it myself n Automated maintenance Automated maintenance : l Cheaper Cheaper l Quicker response than human Quicker response than human l 24/7 watch, can afford to “forget and leave running” 24/7 watch, can afford to “forget and leave running”

n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 11

Autonomic SystemsAutonomic Systemsnn AutonomicAutonomic: adaptive : adaptive

ll SelfSelf--healinghealing::nn cluster systems via node restartcluster systems via node restart

ll SelfSelf--optimizing:optimizing:nn variable encoding schemes for web audio streaming variable encoding schemes for web audio streaming

servicesservicesll SelfSelf--regulatingregulating ::

nn apache web server periodically kills child processesapache web server periodically kills child processes

nn MaintenanceMaintenance::ll expensiveexpensive, time, time--consuming consuming

I want my availability, but I won’t do it myselfI want my availability, but I won’t do it myself

nn Automated maintenanceAutomated maintenance::ll CheaperCheaperll Quicker response than humanQuicker response than humanll 24/7 watch, can afford to “forget and leave running”24/7 watch, can afford to “forget and leave running”

Page 2: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 22

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 3: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 33

Build largeBuild large--scale systems scale systems with reusable componentswith reusable componentsnn Inherent problem with the development of largeInherent problem with the development of large--

scale systemsscale systemsll Hugely complex, unwise for one group of developers Hugely complex, unwise for one group of developers

to create the whole thing from scratchto create the whole thing from scratchll Outsource subOutsource sub--projects to experts projects to experts vs.vs. license their license their

technologytechnologyll Integrate with COTS components:Integrate with COTS components:

nn Cheaper than to reCheaper than to re--implement themimplement them

nn Software engineering and practicality reasonsSoftware engineering and practicality reasonsll component has already been implementedcomponent has already been implementedll available immediatelyavailable immediatelyll no duplication of effortno duplication of effortll 3 types of software components:3 types of software components:

nn COTSCOTSnn InIn--househousenn OneOne--use, specificuse, specific--purpose componentpurpose component

Page 4: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 44

ComponentComponent--based based Software EngineeringSoftware Engineeringnn Software componentSoftware component: :

ll unit of software that conforms to a component modelunit of software that conforms to a component modelnn e.g. COM+, JavaBeanse.g. COM+, JavaBeans

ll Defines standards:Defines standards:nn Composition: how components are composed togetherComposition: how components are composed togethernn Interaction: IDL description of interface elementsInteraction: IDL description of interface elements

nn Two stages of CBSETwo stages of CBSE1.1. Component developmentComponent development

ll No feedback from customerNo feedback from customerll No waterfall model with iterationsNo waterfall model with iterationsll Exhibit openness, adaptability, Exhibit openness, adaptability,

2.2. Integrating component into applicationsIntegrating component into applicationsll Requirements analysisRequirements analysisll Choose component with required functionalityChoose component with required functionality

Take it or leave it ... Take it or leave it ... but then go on looking for another implementationbut then go on looking for another implementation

Page 5: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 55

ComponentComponent--based based Software Engineering Software Engineering –– iiiinn Imperfect match in functionality and requirementsImperfect match in functionality and requirements

ll “Fixed” contract“Fixed” contractnn No means for component evolutionNo means for component evolution

ll Active Interfaces [12]Active Interfaces [12]nn Adaptation interface. Open policiesAdaptation interface. Open policiesnn Static adaptation of component functionalityStatic adaptation of component functionality

ll Interface IncompatibilitiesInterface Incompatibilitiesnn Granularity of operations and dataGranularity of operations and data--types, interaction types, interaction

mechanisms, implementation languagesmechanisms, implementation languagesnn Component wrappersComponent wrappersnn Connectors [14]Connectors [14]nn SWIG, JNI, SWIG, JNI, popen(..)popen(..), , system(..)system(..)

nn ConsiderationsConsiderationsll Application builder is Application builder is notnot going to regoing to re--implement the implement the

componentcomponentll Want to maintain encapsulation, information hidingWant to maintain encapsulation, information hiding

Page 6: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 66

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 7: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 77

Static modeling of possible Static modeling of possible runtime reconfigurationsruntime reconfigurationsnn Runtime adaptation of softwareRuntime adaptation of software

ll EverEver--changing resource availabilitychanging resource availabilityll Dynamic execution environmentDynamic execution environment

nn Separation of concernsSeparation of concerns::ll application logic vs. adaptationapplication logic vs. adaptation

nn Granularity of adaptationGranularity of adaptationll MicroMicro--level: level:

nn component developercomponent developer--enabled mechanism, setting enabled mechanism, setting switches via Active Interfaces [12, 13, 16]switches via Active Interfaces [12, 13, 16]

ll MediumMedium--level: level: nn change how components interact with the system, change how components interact with the system,

modify the interface [13, 14]modify the interface [13, 14]ll MacroMacro--level:level:

nn phase in/out (groups of) components as part of the phase in/out (groups of) components as part of the dynamic adaptation [13, 14]dynamic adaptation [13, 14]

Page 8: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 88

Static modeling of possible Static modeling of possible runtime reconfigurations runtime reconfigurations –– iiiinn SelfSelf--contained adaptation within componentcontained adaptation within component

ll Automatic generation of adaptation codeAutomatic generation of adaptation codenn Compiler and language support for highCompiler and language support for high--level level

specification of adaptation mechanism [13]specification of adaptation mechanism [13]ll PrePre--packaged adaptation mechanism [16]packaged adaptation mechanism [16]

nn Automatic integration of new component Automatic integration of new component versionsversionsll Configuration management [15]Configuration management [15]

nn Installations, updates, unInstallations, updates, un--installationsinstallationsll Tentative use of new versions [14]Tentative use of new versions [14]

nn Transparent testing in deployed environmentTransparent testing in deployed environment

Page 9: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 99

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 10: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1010

Writing code toWriting code toimplement dynamic adaptationsimplement dynamic adaptationsnn Hard to dynamically adapt componentsHard to dynamically adapt components

ll Lack proper understanding of the internalsLack proper understanding of the internalsll Execute (un) trusted, unfamiliar code, with no idea Execute (un) trusted, unfamiliar code, with no idea

how to fix if things failhow to fix if things fail

nn RecognizeRecognize the need to adaptthe need to adapt

nn Utilize the available runtime mechanismsUtilize the available runtime mechanismsll PrePre--existing reconfiguration mechanismsexisting reconfiguration mechanisms

nn Dispatch directives to carry out local microDispatch directives to carry out local micro--adaptationsadaptations

ll Use adaptability of middleware to effectively carry out Use adaptability of middleware to effectively carry out mediummedium-- and macroand macro--scale adaptationsscale adaptations

ll Architectural designArchitectural design--driven adapted, guided by driven adapted, guided by componentcomponent--interaction specificationsinteraction specifications

The inability to reconfigure when required, is a form of failureThe inability to reconfigure when required, is a form of failure

Page 11: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1111

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 12: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1212

SelfSelf--healing systemshealing systemsnn Failure is inevitableFailure is inevitable: [20]: [20]

ll human error: human error: nn stress level proportional to probability of making a stress level proportional to probability of making a

mistake [22]mistake [22]nn can shield from user error, systems lack protection from can shield from user error, systems lack protection from

administrator's errors [22]administrator's errors [22]

ll unanticipated problem:unanticipated problem:nn beyond careful and thorough testingbeyond careful and thorough testingnn directed security attackdirected security attacknn lack of handling mechanismlack of handling mechanism

ll software aging: transient bugssoftware aging: transient bugsnn recovery requires a restartrecovery requires a restartnn buildbuild--up of transient bugsup of transient bugsnn failurefailure--prone state during executionprone state during execution

Page 13: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1313

SelfSelf--healing systems healing systems –– iiiinn Availability of system Availability of system

ll Highly resilientHighly resilientnn Programmed to handle every expected problemProgrammed to handle every expected problemnn SelfSelf--heals: manages to survive heals: manages to survive unexpectedunexpected situationssituations

ll Availability ratio: MTTF / (MTTF+MTTR)Availability ratio: MTTF / (MTTF+MTTR)nn increaseincrease base longevity period (BLP)base longevity period (BLP)nn decreasedecrease recovery timerecovery time

nn ProblemProblem--handling mechanismhandling mechanism ::ll reactive, failurereactive, failure--driven:driven:

nn detect occurred failure, follow with restart of affected detect occurred failure, follow with restart of affected subsystems from a stable statesubsystems from a stable state

ll preventive/proactive, failurepreventive/proactive, failure--avoidance: avoidance: nn detect increased likelihood of failure, and gradual detect increased likelihood of failure, and gradual

degradation of performance, avert imminent failuredegradation of performance, avert imminent failure

Page 14: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1414

Technique:Technique:Software Rejuvenation [18, 19]Software Rejuvenation [18, 19]nn Graceful termination, Immediate restartGraceful termination, Immediate restart

ll Restart at a clean, internal stateRestart at a clean, internal statell BuildBuild--up of transient bugsup of transient bugsll Numerical accumulation errors, unreleased system Numerical accumulation errors, unreleased system

resources, memory leak, data corruptionresources, memory leak, data corruption

nn Levels of rejuvenationLevels of rejuvenationll Total rejuvenation Total rejuvenation

nn Scheduled downtime can be fairly cheapScheduled downtime can be fairly cheapnn Minimal interruption during low usage periodsMinimal interruption during low usage periods

ll Partial rejuvenationPartial rejuvenationnn Transparently rejuvenate selected subcomponentsTransparently rejuvenate selected subcomponentsnn Decoupling between subcomponentsDecoupling between subcomponentsnn Reduced recovery time only for subsystem restartReduced recovery time only for subsystem restart

ll Recursive rejuvenation [21]Recursive rejuvenation [21]nn Rejuvenate progressively larger subsystems recursivelyRejuvenate progressively larger subsystems recursivelynn Functional or data dependencies between Functional or data dependencies between

subcomponentssubcomponents

Page 15: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1515

nn Program checkProgram check--pointingpointingll Periodically save program state to persistent storagePeriodically save program state to persistent storagell Can rewind to previous statesCan rewind to previous states

nn auditing, logsauditing, logsnn recovery to a valid staterecovery to a valid statenn install corrective patch, resume [22]install corrective patch, resume [22]

ll The power of hindsight to enable retroactive repairThe power of hindsight to enable retroactive repairll Demonstrates “Demonstrates “what ifwhat if” semantics” semanticsll Database systems: Database systems:

nn rollback to consistent state if cannot commit safelyrollback to consistent state if cannot commit safely

nn ZeroZero--tolerance of system compromisetolerance of system compromisell PrePre--emptive defense against security attacksemptive defense against security attacks

nn Randomized, but valid binary code sequenceRandomized, but valid binary code sequencenn Sanity checking of control structuresSanity checking of control structuresnn Choose immediate shutdown rather than have system Choose immediate shutdown rather than have system

get compromisedget compromisedll Immediate restart, with new randomized codeImmediate restart, with new randomized code

Other selfOther self--healing techniqueshealing techniques

Page 16: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1616

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 17: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1717

Dynamic profiling, Dynamic profiling, generation of runtime datageneration of runtime datann Adaptation subsystemAdaptation subsystem ::

ll Monitoring logic and decisionMonitoring logic and decision--makingmakingll Execution of adaptation mechanismExecution of adaptation mechanism

nn Automated decision and implementationAutomated decision and implementationll Adaptation for recovery or otherwise, without human Adaptation for recovery or otherwise, without human

interventionintervention

nn Runtime model of the system architectureRuntime model of the system architecturell Decision based on evolving modelDecision based on evolving modelll Runtime data generated by each componentRuntime data generated by each component

nn Embedded probes: PSLEmbedded probes: PSLnn StaticStatic--adaptable Active Interfaces [12]adaptable Active Interfaces [12]

ll ContextContext--dependent data format and contentdependent data format and contentnn EE--mail management system: size, frequency, mail management system: size, frequency,

sender/recipient addresses, types of attachments, sender/recipient addresses, types of attachments, encryption strengthencryption strength

Page 18: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1818

Communication of Communication of runtime data to decision logicruntime data to decision logicnn Extended RPCExtended RPC--style communicationstyle communication

ll Client communicates with server at unknown locationClient communicates with server at unknown locationll RPC clients (execution logic) should be unaware of RPC clients (execution logic) should be unaware of

the presence of RPC servers (decision logic)the presence of RPC servers (decision logic)ll Need to multiplex emitted dataNeed to multiplex emitted datall Asynchronous callbackAsynchronous callback

I can't wait, let me know when you're done!I can't wait, let me know when you're done!ll Basic Basic Message Passing Message Passing to unknown recipientsto unknown recipients

nn Event notification systemEvent notification systemSubscribe to published eventsSubscribe to published events--ofof--interestinterest

ll Item of interestItem of interestnn Something that happened somewhere, runtime dataSomething that happened somewhere, runtime data

ll Generators of items of interestGenerators of items of interestnn Core system execution, reporting runtime dataCore system execution, reporting runtime data

ll Consumers of items of interestConsumers of items of interestnn Monitoring subsystem, interested in runtime dataMonitoring subsystem, interested in runtime data

Page 19: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1919

Event systemsEvent systemsnn Centralized event systemsCentralized event systems

ll eventevent--driven GUI programmingdriven GUI programmingll Event Delegation Model: AWT, SWING, JavaBeansEvent Delegation Model: AWT, SWING, JavaBeans

nn TightlyTightly--coupled clientcoupled client--server model: JINIserver model: JINInn Indirection, anonymity of servers via mediator objectIndirection, anonymity of servers via mediator object

ll Stable execution environmentStable execution environmentnn WellWell--ordered delivery mechanismsordered delivery mechanismsnn Fast, reliable, predictableFast, reliable, predictable

nn Distributed event systemsDistributed event systemsll Supercharged mediator between decoupled entitiesSupercharged mediator between decoupled entities

nn FilteringFilteringnn AggregatingAggregatingnn StoreStore--andand--forward, Storeforward, Store--andand--retrieveretrievenn Mutual anonymityMutual anonymity

ll Unreliable execution environmentUnreliable execution environmentnn Delayed deliveryDelayed deliverynn Data lossData loss

Page 20: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2020

nn ChannelChannel--based routingbased routing::ll Single channel per event type [9]Single channel per event type [9]

birds of a feather flock togetherbirds of a feather flock togetherll faster turnaround time; simple, efficient deliveryfaster turnaround time; simple, efficient deliveryll not scalable to large classes of eventsnot scalable to large classes of events

nn SubjectSubject--based routingbased routing: : ll NNTP: events on a common theme / interestNNTP: events on a common theme / interestll Mailing lists, CVS notificationsMailing lists, CVS notifications

nn ContentContent--based (semantic) routingbased (semantic) routing::ll Interested in a subset of a class of eventsInterested in a subset of a class of eventsll selective delivery via specifying acceptability criteriaselective delivery via specifying acceptability criteriall EventEvent--data determines propagationdata determines propagationll Data replication only if necessary [10, 11]Data replication only if necessary [10, 11]ll Event composition [8]Event composition [8]

Distributed event systemsDistributed event systems

Page 21: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2121

nn Centralized routing nodeCentralized routing nodell Approximation of localized event systemApproximation of localized event system

nn Hierarchical collection of nodesHierarchical collection of nodesll Subscriptions only go up, notifications cascade downSubscriptions only go up, notifications cascade downll DisadvantagesDisadvantages

nn Overloading of higherOverloading of higher--level routing nodeslevel routing nodesnn Network partitioning via single node failureNetwork partitioning via single node failure

ll AdvantagesAdvantagesnn Simple routing algorithmsSimple routing algorithmsnn Simple clientSimple client--server relationships amongst routing server relationships amongst routing

nodesnodes

nn (A)cyclic peer(A)cyclic peer--toto--peer networkpeer networkll Sophisticated routing algorithmsSophisticated routing algorithmsll Improved faultImproved fault--tolerancetolerance

ContentContent--basedbasedevent routing topologiesevent routing topologies

Page 22: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2222

Items for discussionItems for discussionnn Can largeCan large--scale, distributed applications be selfscale, distributed applications be self--

healing, selfhealing, self--regulating, selfregulating, self--optimizing?optimizing?

nn Important issues with respect to automated Important issues with respect to automated maintenance of largemaintenance of large--scale, software systemsscale, software systemsll Harder to build. Focus on reusable componentsHarder to build. Focus on reusable componentsll Specify maintenance operations during developmentSpecify maintenance operations during developmentll Considering maintenance as runtime adaptationsConsidering maintenance as runtime adaptationsll Gracefully handle unfamiliar, exceptional conditionsGracefully handle unfamiliar, exceptional conditions

nn Proposal: design methodologyProposal: design methodologyll Separation of concerns: Separation of concerns:

nn Application code vs. Application code vs. adaptation mechanisms {decision logic, implementation}adaptation mechanisms {decision logic, implementation}

ll Introspection:Introspection:nn Communicate runtime data to decision logicCommunicate runtime data to decision logic

ll Intercession:Intercession:nn Transport reconfiguration code from decision logicTransport reconfiguration code from decision logic

Page 23: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2323

Activation of reconfiguration codeActivation of reconfiguration codenn ReRe--use eventsuse events

ll the source (client/decision logic) determines who gets the source (client/decision logic) determines who gets reconfigured, so cannot have the server (execution reconfigured, so cannot have the server (execution logic) subscribe to theselogic) subscribe to these

ll event systems not designed to carry large amount of event systems not designed to carry large amount of binary code, if needed for component installation, etcbinary code, if needed for component installation, etc

nn Mobile agents [5]Mobile agents [5]autonomous program that executes on someone’s behalfautonomous program that executes on someone’s behalf

ll decision logic instructs agents to carry out runtime decision logic instructs agents to carry out runtime reconfiguration tasksreconfiguration tasksnn LateLate--binding of reconfiguration mechanism at targetbinding of reconfiguration mechanism at targetnn AsynchronousAsynchronousnn primary advantage of agents: reconfiguration might primary advantage of agents: reconfiguration might

consist of significant amount of computing, ideally consist of significant amount of computing, ideally performed locally at execution logic rather than a long performed locally at execution logic rather than a long series of RPC invocationsseries of RPC invocations

Page 24: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2424

Mobile code infrastructuresMobile code infrastructuresnn ConstituentsConstituents

ll Server: hosting, execution, transportationServer: hosting, execution, transportationnn Place [6]Place [6]nn Agent Server [1, 3, 7]Agent Server [1, 3, 7]nn Worklet Virtual Machine: PSLWorklet Virtual Machine: PSL

ll AgentsAgents

nn Incorporate dynamic interfacesIncorporate dynamic interfacesll Agent installs specificAgent installs specific--purpose interfaces to purpose interfaces to

components for customized accesscomponents for customized accessll “Wrapper while you wait”, but can configure as “Wrapper while you wait”, but can configure as

neededneeded

Page 25: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2525

Automatic mobility of programsAutomatic mobility of programsnn Strong mobilityStrong mobility

ll OS support for process relocation [5]OS support for process relocation [5]

nn Weak mobilityWeak mobilityll StateState-- and codeand code--transfer at application leveltransfer at application levelll ProgrammingProgramming--language, runtime support [6]language, runtime support [6]

nn SpecialSpecial--purpose language [6]purpose language [6]nn Scripting languages [6]Scripting languages [6]

ll Agent code is in textual formAgent code is in textual form

nn General purpose language [23]General purpose language [23]ll LateLate--binding of class definitions by dynamic code loadingbinding of class definitions by dynamic code loadingll Serialization of objectsSerialization of objects

ll Simulated strong mobilitySimulated strong mobilitynn Local function continuations [2]Local function continuations [2]nn Modified JVM [4]Modified JVM [4]

Page 26: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2626

Security issues: mobile codeSecurity issues: mobile codenn A greater vulnerability: unknown codeA greater vulnerability: unknown code

ll Protect agent from server, and vice versa [1, 3, 7]Protect agent from server, and vice versa [1, 3, 7]

nn Language supportLanguage supportll Bytecode verification in JVM Bytecode verification in JVM

nn TypeType--system protection from malicious classessystem protection from malicious classesnn IntegrityIntegrity--checking of bytecode instructionschecking of bytecode instructions

ll Cannot define / load core system classesCannot define / load core system classes

nn ApplicationApplication--level security considerationslevel security considerations::ll Authentication, authorizationAuthentication, authorization

nn Permissions model based on certification, credentialsPermissions model based on certification, credentialsll Data encryption during transitData encryption during transitll Tampering detection via digital signaturesTampering detection via digital signatures

Page 27: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2727

Conclusions, future directionsConclusions, future directionsnn Autonomic largeAutonomic large--scale, distributed systemsscale, distributed systems

ll Criteria for construction and automated maintenanceCriteria for construction and automated maintenancell State of the art researchState of the art research

nn Autonomic systems exist for specific domainsAutonomic systems exist for specific domainsnn Technologies / tools available for building general Technologies / tools available for building general

framework for adaptationframework for adaptation

nn Dynamic architectural modelingDynamic architectural modelingll Accurate modeling of the system during executionAccurate modeling of the system during executionll Decision made on evolving modelDecision made on evolving modelll Adaptation heuristics based on:Adaptation heuristics based on:

nn Historical patternsHistorical patternsnn Temporal dataTemporal data

Page 28: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2828

Bibliography Bibliography –– Mobile agentsMobile agents1.1. Design of the Ajanta System for Mobile Agent ProgrammingDesign of the Ajanta System for Mobile Agent Programming

AnandAnand R. R. TripathiTripathi, , NeeranNeeran M. M. KarnikKarnik, , TanvirTanvir Ahmed, Ram D. Singh, Ahmed, Ram D. Singh, ArvindArvind PrakashPrakash, , VineetVineet KakaniKakani, , Manish K. Manish K. VoraVora, , MuktaMukta PathakPathakJournal of Systems and Software, May 2002Journal of Systems and Software, May 2002

2.2. How to Migrate AgentsHow to Migrate AgentsMatthew Matthew HohlfeldHohlfeld, , BennetBennet YeeYeeTechnical Report CS98Technical Report CS98--588, Computer Science and Engineering Department, University of 588, Computer Science and Engineering Department, University of California at San Diego, La Jolla, CA, June 1998California at San Diego, La Jolla, CA, June 1998

3.3. Experiences and Future Challenges in Mobile Agent ProgrammingExperiences and Future Challenges in Mobile Agent ProgrammingAnandAnand R. R. TripathiTripathi, , TanvirTanvir Ahmed, Ahmed, NeeranNeeran M. M. KarnikKarnikMicroprocessor and Microsystems 2001Microprocessor and Microsystems 2001

4.4. Pickling threads state in the Java systemPickling threads state in the Java systemS. Bouchenak, D. S. Bouchenak, D. HagimontHagimontIn Proc. of the Technology of ObjectIn Proc. of the Technology of Object--Oriented Languages and Systems (TOOLS), 2000Oriented Languages and Systems (TOOLS), 2000

5.5. Mobile Agents: Are they a good idea?Mobile Agents: Are they a good idea?Colin G. Harrison, David M. Chess, Aaron Colin G. Harrison, David M. Chess, Aaron KershenbaumKershenbaumIBM Research Report, IBM Research Report, T.J.WatsonT.J.Watson Research Center, NY, 1995Research Center, NY, 1995

6.6. Programming languages for mobile codeProgramming languages for mobile codeTommy ThornTommy ThornACM Computing Surveys, 29(3):213ACM Computing Surveys, 29(3):213--239, 1997. Also Technical Report 1083, University of 239, 1997. Also Technical Report 1083, University of RennesRennesIRISAIRISA

7.7. Design Issues in Mobile Agent Programming SystemsDesign Issues in Mobile Agent Programming SystemsNeeranNeeran M. M. KarnikKarnik, , AnandAnand R. R. TripathiTripathiIEEE Concurrency, JulyIEEE Concurrency, July--Sep 1998Sep 1998

Page 29: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 2929

Bibliography Bibliography –– Event systemsEvent systems8.8. Generic Support for Distributed ApplicationsGeneric Support for Distributed Applications

Jean Bacon, Ken Moody, John Bates, Richard Jean Bacon, Ken Moody, John Bates, Richard HaytonHayton, , ChaoyingChaoying Ma, Andrew McNeil, Oliver Seidel, Ma, Andrew McNeil, Oliver Seidel, Mark Mark SpiteriSpiteriIEEE Computer, pages 68IEEE Computer, pages 68--77, March 200077, March 2000

9.9. Host Groups: A Multicast Extension to the Internet ProtocolHost Groups: A Multicast Extension to the Internet ProtocolS. E. S. E. DeeringDeering, D. R. , D. R. CheritonCheritonNetwork Working Group: RFC 0966Network Working Group: RFC 0966

10.10. State of the Art Review of Distributed Event ModelsState of the Art Review of Distributed Event ModelsRené MeierRené MeierDept. of Computer Science, Trinity College Dublin, Ireland, MarcDept. of Computer Science, Trinity College Dublin, Ireland, March 2000. Technical report TCDh 2000. Technical report TCD--CSCS--20002000--1616

11.11. Achieving Expressiveness and Scalability in an InternetAchieving Expressiveness and Scalability in an Internet--Scale Event Notification ServiceScale Event Notification ServiceAntonio Antonio CarzanigaCarzaniga, David S. , David S. RosenblumRosenblum, Alexander L. Wolf, Alexander L. WolfIn Proceedings of the Nineteenth ACM Symposium on Principles of In Proceedings of the Nineteenth ACM Symposium on Principles of Distributed Computing (PODC Distributed Computing (PODC 2000)2000)

Page 30: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 3030

Bibliography Bibliography –– System adaptationSystem adaptation12.12. A Model for Designing Adaptable Software ComponentsA Model for Designing Adaptable Software Components

George HeinemanGeorge HeinemanIn 22nd Annual International Computer Software and Applications In 22nd Annual International Computer Software and Applications Conference, pages 121Conference, pages 121----127, 127, Vienna, Austria, August 1998. In 22nd Annual International CompuVienna, Austria, August 1998. In 22nd Annual International Computer Software and Applications ter Software and Applications Conference, pages 121Conference, pages 121----127, Vienna, Austria, August 1998127, Vienna, Austria, August 1998

13.13. Language and Compiler Support for Adaptive Distributed ApplicatiLanguage and Compiler Support for Adaptive Distributed ApplicationsonsVikramVikram Adve, Adve, VinhVinh Vi Lam, Brian Vi Lam, Brian EnsinkEnsinkACM SIGPLAN Workshop on Optimization of Middleware and DistributACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems (OM 2001) ed Systems (OM 2001) Snowbird, Utah, June 2001 (in conjunction with PLDI2001)Snowbird, Utah, June 2001 (in conjunction with PLDI2001)

14.14. Increasing the Confidence in OffIncreasing the Confidence in Off--thethe--Shelf Components: A Software ConnectorShelf Components: A Software Connector--Based Based ApproachApproachMarijaMarija RakicRakic, Nenad , Nenad MedvidovicMedvidovicProceedings of SSR '01 on 2001 Symposium on Software ReusabilityProceedings of SSR '01 on 2001 Symposium on Software Reusability : Putting Software Reuse in : Putting Software Reuse in ContextContext

15.15. A Cooperative Approach to Support Software Deployment Using the A Cooperative Approach to Support Software Deployment Using the Software DockSoftware DockRichard S. Hall, Dennis Richard S. Hall, Dennis HeimbignerHeimbigner, Alexander L. Wolf, Alexander L. WolfInternational Conference on Software International Conference on Software EngineringEnginering, May 1999, May 1999

16.16. The Illinois GRACE Project: Global Resource Adaptation through The Illinois GRACE Project: Global Resource Adaptation through CoopErationCoopErationSaritaSarita V. Adve, Albert F. Harris, Christopher J. Hughes, Douglas L. JoV. Adve, Albert F. Harris, Christopher J. Hughes, Douglas L. Jones, Robin H. nes, Robin H. KravetsKravets, , KlaraKlaraNahrstedtNahrstedt, Daniel , Daniel GrobeGrobe Sachs, Sachs, RuchiraRuchira SasankaSasanka, , JayanthJayanth SrinivisanSrinivisan, , WanghongWanghong YuanYuanIn proceedings of Workshop on SelfIn proceedings of Workshop on Self--Healing, Adaptive and selfHealing, Adaptive and self--MANagedMANaged Systems (SHAMAN) Systems (SHAMAN) 20022002

Page 31: n Autonomic: adaptive l Self-healing - Columbia …gskc/slides/candidacy.pdfAutonomic Systems ... Gaurav S. Kc ... September 26th, 2002 1 Autonomic Systems n Autonomic: adaptive l

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002 3131

Bibliography Bibliography ––Dynamic healing, MiscellaneousDynamic healing, Miscellaneous17.17. Autonomic ComputingAutonomic Computing

Paul Horn, IBM ResearchPaul Horn, IBM Research18.18. Software Software RejuventationRejuventation: Analysis, Module and Applications: Analysis, Module and Applications

YennunYennun Huang, Chandra Huang, Chandra KintalaKintala, Nick , Nick KolettisKolettis, N. Dudley Fulton, N. Dudley FultonProceedings of the 25th International Symposium on FaultProceedings of the 25th International Symposium on Fault--Tolerant Computing (FTCSTolerant Computing (FTCS--25), 25), Pasadena, CA, pp. June 1995, pp. 381Pasadena, CA, pp. June 1995, pp. 381--390390

19.19. IBM director software rejuvenationIBM director software rejuvenation..White paperWhite paper

20.20. Recovery Oriented Computing (ROC): Motivation, Definition, TechnRecovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studiesiques, and Case StudiesDavid Patterson, Aaron Brown, Pete David Patterson, Aaron Brown, Pete BroadwellBroadwell, George , George CandeaCandea, Mike Chen, James Cutler, Patricia , Mike Chen, James Cutler, Patricia Enriquez, Armando Fox, Enriquez, Armando Fox, EmreEmre KicimanKiciman, Matthew , Matthew MerzbacherMerzbacher, David Oppenheimer, , David Oppenheimer, NaveenNaveen SastrySastry, , William William TetzlaffTetzlaff, Jonathan , Jonathan TraupmannTraupmann, Noah , Noah TreuhaftTreuhaftUC Berkeley Computer Science Technical Report UCB//CSDUC Berkeley Computer Science Technical Report UCB//CSD--0202--1175, March 15, 20021175, March 15, 2002

21.21. Reducing Recovery Time in a Small Recursively Reducing Recovery Time in a Small Recursively RestartableRestartable SystemSystemGeorge George CandeaCandea, James Cutler, Armando Fox, , James Cutler, Armando Fox, RushabhRushabh DoshiDoshi, , PriyankPriyank GargGarg, , RakeshRakesh GowdaGowdaAppears in Proceedings of the International Conference on DependAppears in Proceedings of the International Conference on Dependable Systems and Networks able Systems and Networks (DSN(DSN--2002), June 20022002), June 2002

22.22. Rewind, Repair, Replay: Three R's to DependabilityRewind, Repair, Replay: Three R's to DependabilityAaron B. Brown, David A. PattersonAaron B. Brown, David A. PattersonTo appear in 10th ACM SIGOPS European Workshop, SaintTo appear in 10th ACM SIGOPS European Workshop, Saint--EmilionEmilion, France, September 2002, France, September 2002

23.23. Dynamic Class Loading in the Dynamic Class Loading in the Java(TMJava(TM) Virtual Machine) Virtual MachineShengSheng LiangLiang, , GiladGilad BrachaBrachaConference on ObjectConference on Object--oriented programming, systems, languages, and applications (OOPSoriented programming, systems, languages, and applications (OOPSLA'98)LA'98)