Notes On the GAE
Harvey B. Newman
California Institute of Technology
Grid-enabled Analysis Environment Workshop
June 24, 2003

GAE Workshop Goals (1)
- “Getting Our Arms Around” the Grid-Enabled Analysis “Problem”
- Review Existing Work Towards a GAE: Components, Interfaces, System Concepts
- Review Client Analysis Tools; Consider How to Integrate Them
- User Interfaces: What does the GAE Desktop Look Like? (Different Flavors)
- Look At Requirements, Ideas for a GAE Architecture
  - A Vision of the System’s Goals and Workings
  - Attention to Strategy and Policy
- Develop (Continue) a Program of Simulations of the System
  - For the Computing Model, and Defining the GAE
  - Essential for Developing a Feasible Vision; Developing Strategies, Solving Problems and Optimizing the System
  - With a Complementary Program of Prototyping

GAE Collaboration Desktop Example
- Four-screen Analysis Desktop
  - 4 Flat Panels: 5120 X 1024; RH9
  - Driven by a single server and single graphics card
- Allows simultaneous work on:
  - Traditional analysis tools (e.g. ROOT)
  - Software development
  - Event displays (e.g. IGUANA)
  - MonALISA monitoring displays; Other “Grid Views”
  - Job-progress Views
  - Persistent collaboration (e.g. VRVS; shared windows)
  - Online event or detector monitoring
  - Web browsing, email

GAE Workshop Goals (2)
- Architectural Approaches: Choose A Feasible Direction
  - For example a Managed Services Architecture
  - Be Prepared to Learn by Doing; Simulating and Prototyping
- Where to Start, and the Development Strategy
  - Existing and Missing Parts of the System [Layers; Concepts]
  - When to Adapt Existing Components, Or to Re-Build Them “from Scratch”
- Manpower Available to Meet the Goals; Shortfalls
- Allocation of Tasks; Including Generating a Plan
- Linkage Between Analysis and Grid-Enabled Production
- Planning for Closer Relationship with LCG, Trillium, and the Experiments’ starting Efforts in this area

HENP Grids: Services Architecture Design for a Global System
- Self Discovering, Cooperative
  - Registered Services, Lookup Services; self-describing
  - “Spaces” for Mobile Code and Parameters
- Scalable and Robust
  - Multi-threaded: with a thread pool managing engine
  - Loosely Coupled: errors in a thread don’t stop the task
- Stateful: System State as well as task state
  - Rich set of “problem” situations: implies Grid Views, and User/System Dialogues on what to do
  - For Example: Raise Priority (Burn Quota); or Redirect Work
  - Eventually may be increasingly automated as we scale up and gain experience
- Managed; to deal with a Complex Execution Environment
  - Real time higher level supervisory services monitor, track, optimize and Revive/Restart services as needed
  - Policy and strategy-driven; Self-Evaluating and Optimizing
  - Investable with increasing intelligence
  - Agent Based; Evolutionary Learning Algorithms
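
The "Scalable and Robust" points (a thread pool, and errors in one thread not stopping the task) can be illustrated with a minimal Python sketch. It is not the architecture's actual engine: the function `run_loosely_coupled`, the subtask names, and the simulated failure are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_loosely_coupled(subtasks, max_workers=4):
    """Run named callables in a thread pool; record failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in subtasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # A failed subtask is noted; the remaining subtasks still complete.
                failures[name] = exc
    return results, failures


def failing_subtask():
    # Hypothetical failure, standing in for e.g. an unreachable site.
    raise RuntimeError("site unreachable")


results, failures = run_loosely_coupled({
    "histogram_a": lambda: 42,
    "histogram_b": failing_subtask,
    "histogram_c": lambda: 7,
})
print(sorted(results), sorted(failures))
```

A supervisory service in the sense of the "Managed" bullet could inspect `failures` and decide whether to restart or redirect those subtasks; that policy is left out of the sketch.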

Getting Started Towards a Workable GAE (1)
- Work on Computing Model (Essential) in Parallel
- Focus on a Few Scenarios for Doing Analysis
  - “Grid Enabled PROOF” [in CMS; in ATLAS]
- Start with Existing Analysis Applications: Can they be recast in GAE Form?
- Make Some Starting Assumptions
  - Need some simple picture of persistency
- Supplementary considerations:
  - Multiuser situation (e.g. with avatars; then Analysis Challenges)
- Coming to a few Either/Or Decisions
  - List of rudimentary analysis tools, and way of working
- “External” to the application considerations:
  - Job planning
  - Key role of query estimation (not only beforehand)
  - Transparency versus tracking

Getting Started Towards a Workable GAE (2)
- Session or Sessions on the Desktop
- Three Modes of Working; All in the GAE
  - Immediate (within a few seconds)
  - In the background (seconds to a few minutes)
  - Spawn batch job or jobs (minutes to hours)
- Decisions and tradeoffs
  - Lay out the strategies and consequences (time, quota etc)
  - Present Choices
  - Monitor progress or get “alarms” and be prepared to re-strategize
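
As a rough illustration of the modes of working listed on this slide, the sketch below maps an estimated run time to a mode. The slide gives only approximate ranges, so the thresholds (5 seconds and 5 minutes) and the function name `choose_mode` are assumptions made for this example.

```python
def choose_mode(estimated_seconds, immediate_limit=5.0, background_limit=300.0):
    """Pick a working mode from an estimated run time in seconds.

    The two limits are assumed values: the slide only says "a few seconds"
    and "a few minutes".
    """
    if estimated_seconds < 0:
        raise ValueError("estimated_seconds must be non-negative")
    if estimated_seconds <= immediate_limit:
        return "immediate"
    if estimated_seconds <= background_limit:
        return "background"
    return "batch"


for estimate in (2, 60, 7200):
    print(estimate, choose_mode(estimate))
```

In the spirit of "Present Choices", a real desktop would show the user the chosen mode together with its consequences (time, quota) and let them override it.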

Getting Started Towards a Workable GAE (3)
- Smart Caching: Of Methods, of Data, or Time to Process Info.
- Intelligence in the system does not only mean problem solving
  - Need to apply intelligence/experience to progressively improve system performance
- Time-to-completion estimation: process a small amount of data to get a realistic first estimate.
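
The time-to-completion idea (process a small amount of data, then extrapolate) can be sketched as follows. This is only one simple way to do it, assuming roughly constant cost per event; `estimate_remaining_seconds`, `process_event`, and the injectable `clock` parameter are hypothetical names introduced here.

```python
import time


def estimate_remaining_seconds(process_event, events, sample_size=100,
                               clock=time.perf_counter):
    """Time a small sample, then extrapolate to the events not yet processed.

    Assumes per-event cost is roughly constant, which real analyses may violate.
    """
    sample = events[:sample_size]
    if not sample:
        return 0.0
    start = clock()
    for event in sample:
        process_event(event)
    per_event = (clock() - start) / len(sample)
    return per_event * (len(events) - len(sample))


# Usage: a stand-in workload of 10,000 "events".
events = list(range(10_000))
estimate = estimate_remaining_seconds(lambda e: e * e, events)
print(f"estimated remaining time: {estimate:.6f} s")
```

The estimate could be refreshed while the job runs, in line with the earlier remark that query estimation matters "not only beforehand".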

3 Slides About Building a Computing Model & the GAE System
- These Slides Focus on Simulation/Prototyping, as an Integral part of designing and building distributed systems for the GAE, and the Grid-Enabled Production Environment (GPE) as well.

Building a Computing Model and an Analysis Strategy (I)
- Generate a Blueprint: A “Computing Model”
  - Tasks, Workload, Facilities, Priorities & GOALS
  - Persistency; Modes of Accessing Data (e.g. Object Collections)
  - What runs where; when to redirect
  - The User’s Working Environment
    - What is normal (managing expectations)?
    - Guidelines for dealing with problems: based on which information?
    - Performance and problem reporting/tracking/handling?
    - Known Problems: Strategies to deal with those
- Set up, code a Simulation of the Model
  - Develop mechanisms and sub-models as needed
  - Set up prototypes to measure the performance parameters where not already known to sufficient precision

Building a Computing Model and an Analysis Strategy (II)
- Run simulations (avatars for “actors”; agents; tasks; mechanisms)
- Analyze and evaluate performance
  - General performance (throughput; turnaround)
  - Ensure “all” work is done: learn how to do this: within a reasonable time; compatible with the Collaboration’s guidelines
- Vary Model to Improve Performance
  - Deal with bottlenecks and other problems
  - New strategies and/or mechanisms to manage workflow
- Represent key features and behaviors, for example:
  - Responses to Link or Site failures
  - User input to redirect data or jobs
  - Monitoring information gathering
  - Monitoring and management agent actions and behaviors in a variety of situations
- Validate the Model
  - Using Dedicated setups
  - Using Data Challenges (measure, evaluate, compare; fix key items)
  - Learn of new factors and/or behaviors to take into account
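
To make "vary the model" and "responses to site failures" concrete, here is a deliberately tiny time-stepped toy model. It is not the simulation program the slides refer to: the site names, processing rates, the even initial job split, and the redirect rule (send stranded jobs to the surviving site with the shortest expected drain time) are all assumptions.

```python
def simulate(n_jobs, site_rates, failed_site=None, fail_at=0, redirect=True,
             max_steps=10_000):
    """Toy model: each live site completes `rate` jobs per time step.

    Returns (steps until all reachable work is done, jobs stranded at a dead site).
    """
    names = list(site_rates)
    base, extra = divmod(n_jobs, len(names))
    # Jobs are split evenly across sites; the remainder goes to the first sites.
    queues = {s: base + (1 if i < extra else 0) for i, s in enumerate(names)}
    alive = list(names)
    steps = 0
    while steps < max_steps and any(queues[s] > 0 for s in alive):
        if failed_site in alive and steps == fail_at:
            alive.remove(failed_site)
            if redirect and alive:
                # Assumed policy: redirect to the site that would drain soonest.
                target = min(alive, key=lambda s: queues[s] / site_rates[s])
                queues[target] += queues[failed_site]
                queues[failed_site] = 0
            continue
        for s in alive:
            queues[s] = max(0, queues[s] - site_rates[s])
        steps += 1
    stranded = sum(queues[s] for s in names if s not in alive)
    return steps, stranded


rates = {"A": 10, "B": 5, "C": 5}  # hypothetical sites and jobs-per-step rates
baseline = simulate(120, rates)
with_redirect = simulate(120, rates, failed_site="A", fail_at=2)
without_redirect = simulate(120, rates, failed_site="A", fail_at=2, redirect=False)
print(baseline, with_redirect, without_redirect)
```

Comparing the three runs shows the tradeoff the slide points at: redirecting finishes all the work at the cost of a longer turnaround, while not redirecting leaves jobs undone.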

Building a Computing Model and an Analysis Strategy (III)
- MAJOR Milestone: Obtain a first picture of a Model that Seems to Work
  - This may or may not involve changes in the computing resource requirements-estimates; or Collaboration policies and expectations
  - It is hard to estimate how long it will take to reach this milestone [most experiments until now have reached it after the start of data taking]
- Evolve the Model to
  - Distinguish what works and what does not
  - Incorporate evolving site hardware and network performance
  - Progressively incorporate new and “better” strategies, to improve throughput and/or turnarounds, or fix critical problems
  - Take into account experience with the actual software-system components as they develop
- In parallel with the Model evolution keep developing the overall data analysis + Grid + monitoring “system”; represent it in the simulation
  - And the associated strategies