POLITEHNICA University of Bucharest
California Institute of Technology
National Center for Information Technology
Ciprian Mihai Dobre
Corina Stratan
MONARC 2 - distributed systems simulation -
The Goals of the Project
• To perform realistic simulation and modelling of large scale distributed computing systems, customised for specific large scale HEP applications.
• To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.
• To narrow down a region in this parameter space in which viable models can be chosen by any of the future LHC-era experiments.
• To offer a dynamic and flexible simulation environment.
[Figure: online system of one of the four LHC detectors (CMS), with a multi-level trigger that filters out background and reduces data volume, followed by data processing (offline analysis, selection)]
Trigger levels: level 1 - special hardware; level 2 - embedded processors; level 3 - PCs
Rates labelled along the trigger chain: 40 MHz (40 TB/sec); 75 KHz (75 GB/sec); 5 KHz (5 GB/sec); 100 Hz (100-1000 MB/sec)
Raw recording rate: 0.1 - 1 GB/sec; 3 - 8 PetaBytes / year
LHC Computing: Different from Previous Experiment Generations
• Geographical dispersion: of people and resources
• Complexity: the detector and the LHC environment
• Scale: ~100 times more processing power; Petabytes per year of data
CMS: 1800 Physicists, 150 Institutes, 32 Countries
A very large scale distributed system that has to provide (near) real-time data access for all the participants.
Regional Center Hierarchy (Worldwide Data Grid)
Off-Line LHC Computing: Data Analysis
[Figure: tiered hierarchy of computing centers]
• Experiment and Online System
• Tier 0 + 1: Offline Farm, CERN Computer Center
• Tier 1: France Center, FNAL Center, Italy Center, UK Center
• Tier 2: Tier2 Centers
• Tier 3: Institutes (~0.25 TIPS), physics data cache
• Tier 4: Workstations
Link bandwidths labelled in the diagram: ~PBytes/sec; 100-1000 MBytes/sec; ~2.4 Gbits/sec; ~0.6 - 2.5 Gbits/sec; ~622 Mbits/sec; 100 - 1000 Mbits/sec
Bunch crossing per 25 nsecs. Event is ~1 MByte in size.
Physicists work on analysis “channels”.
Processing power: ~200,000 of today’s fastest PCs
Simulation Models
The simulation model:
• abstracts the components of the real system and their interactions
• must be equivalent to the simulated system
Simulation models:
• continuous time - the system is described by a set of differential equations
• discrete time - the state changes only at certain time moments
In MONARC: one of the discrete time models (Discrete Event Simulation - DES); the events represent important activities from the system, managed with the aid of an internal clock
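The discrete event idea can be illustrated with a very small event loop. This is only a sketch of the general DES technique, not the MONARC engine: the class `DesSketch`, its methods and the event names are hypothetical. Events are kept in a queue ordered by time, and the internal clock jumps from one event to the next.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Minimal discrete event simulation loop (illustrative; all names are assumptions). */
final class DesSketch {
    /** An event: something that happens at a given simulated time. */
    static final class Event {
        final double time;
        final String name;
        Event(double time, String name) { this.time = time; this.name = name; }
    }

    // Pending events, ordered by the simulated time at which they occur.
    private final PriorityQueue<Event> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Event e) -> e.time));
    private final List<String> log = new ArrayList<>();
    private double clock = 0.0;   // the internal simulation clock

    /** Schedules an event `delay` simulated seconds after the current clock value. */
    void schedule(double delay, String name) {
        queue.add(new Event(clock + delay, name));
    }

    /** Processes events in time order; the state changes only when an event occurs. */
    List<String> run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            clock = e.time;                        // jump straight to the next event
            log.add(clock + ": " + e.name);
            if (e.name.equals("job submitted")) {  // an event may schedule further events
                schedule(5.0, "job finished");
            }
        }
        return log;
    }

    double now() { return clock; }

    public static void main(String[] args) {
        DesSketch sim = new DesSketch();
        sim.schedule(10.0, "job submitted");
        sim.schedule(2.0, "transfer started");
        sim.run().forEach(System.out::println);
    }
}
```

Running `main` prints the three events in time order (2.0, 10.0, 15.0), although they were scheduled in a different order.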
A Global View for Modelling
[Figure: layers of the modelling framework: Simulation Engine, Basic Components, Specific Components, Computing Models]
Component labels: LAN, WAN, DB, CPU, Scheduler, Job, Catalog, Analysis, Distributed Scheduler, MetaData, Jobs
Also shown: MONITORING, REAL Systems, Testbeds
Regional Center Model
[Figure: a REGIONAL CENTER with a FARM of CPU units, each CPU running several AJob instances and having a LinkPort; DBServers with LinkPorts, a DB and an Index; Activity objects producing Jobs for the Job Scheduler; LAN and WAN connections]
The Simulation Engine
• Provides the multithreading mechanism for the simulation
• The entities with time dependent behavior are mapped on “active objects”
• In the simulation engine: management of active objects and events
• Thread reusability (thread pool)
[Figure: class diagram with Engine, Scheduler, Task, Event, EventQueue, WorkerThread, Pool, Activity, JobScheduler, Farm, CPUUnit, AJob, Job]
Multitasking Processing Model
• Concurrent running tasks share resources (CPU, memory, I/O)
• “Interrupt” driven scheme: for each new task, or when one task is finished, an interrupt is generated and all “processing times” are recomputed.
It provides:
• Handling of concurrent jobs with different priorities.
• An efficient mechanism to simulate multitask processing.
• An easy way to apply different load balancing schemes.
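A minimal sketch of such an interrupt driven scheme is given here for one shared CPU. It is an illustration under stated assumptions, not the MONARC code: the class `SharedCpuSketch`, the `Task` fields and the weighted sharing rule (each task receives a CPU share proportional to its priority weight) are hypothetical. At every arrival or completion the remaining work is updated and all finish times are recomputed.

```java
import java.util.ArrayList;
import java.util.List;

/** One CPU shared by concurrent tasks; times are recomputed at each "interrupt". */
final class SharedCpuSketch {
    static final class Task {
        final double arrival;        // simulated arrival time, in seconds
        final double weight;         // priority: a larger weight gets a larger CPU share
        double remaining;            // CPU-seconds of work still to do
        double finish = Double.NaN;  // filled in by simulate()

        Task(double arrival, double work, double weight) {
            this.arrival = arrival;
            this.remaining = work;
            this.weight = weight;
        }
    }

    /** Runs the tasks (assumed sorted by arrival time) and sets Task.finish for each. */
    static void simulate(List<Task> tasks) {
        List<Task> active = new ArrayList<>();
        double now = 0.0;
        int next = 0;  // index of the next task that has not arrived yet
        while (next < tasks.size() || !active.isEmpty()) {
            double totalWeight = 0.0;
            for (Task t : active) totalWeight += t.weight;

            // Recompute every "processing time" from the current shares.
            double nextFinish = Double.POSITIVE_INFINITY;
            for (Task t : active) {
                nextFinish = Math.min(nextFinish, now + t.remaining * totalWeight / t.weight);
            }
            double nextArrival = next < tasks.size()
                    ? tasks.get(next).arrival : Double.POSITIVE_INFINITY;
            double eventTime = Math.max(now, Math.min(nextArrival, nextFinish));

            // Between two interrupts each task progresses at its constant share.
            for (Task t : active) {
                t.remaining -= (eventTime - now) * t.weight / totalWeight;
            }
            now = eventTime;

            // The interrupt itself: retire finished tasks, admit newly arrived ones.
            for (int i = active.size() - 1; i >= 0; i--) {
                Task t = active.get(i);
                if (t.remaining <= 1e-9) {
                    t.finish = now;
                    active.remove(i);
                }
            }
            while (next < tasks.size() && tasks.get(next).arrival <= now) {
                active.add(tasks.get(next++));
            }
        }
    }

    public static void main(String[] args) {
        Task a = new Task(0.0, 10.0, 1.0);  // 10 CPU-seconds, low priority
        Task b = new Task(5.0, 2.0, 3.0);   // 2 CPU-seconds, high priority
        simulate(List.of(a, b));
        System.out.printf("a finishes at %.2f s, b finishes at %.2f s%n", a.finish, b.finish);
    }
}
```

In this example task `b` arrives while `a` is running, takes three quarters of the CPU and finishes first; `a` then gets the whole CPU back and completes at about 12 s, the time needed for the 12 CPU-seconds of total work.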
Engine tests
Processing a TOTAL of 100 000 simple jobs on 1, 10, 100, 1000, 2 000, 4 000, 10 000 CPUs (number of CPUs = number of parallel threads).
[Figure: Time [s] versus No of THREADS, logarithmic axes; series: 2X2.4 GHz, Linux; 2X450MHz Solaris; 2X3GHz, Windows]
more tests: http://monalisa.cacr.caltech.edu/MONARC/
Job Scheduling
• Dynamically loadable modules for each regional center
• Basic job scheduler: assigns the jobs to CPUs from the local farm
• More complex schedulers: allow job migration between regional centers
[Figure: Site A with a CPU FARM and its JobScheduler, a dynamically loadable module]
Centralized Scheduling
[Figure: Site A and Site B, each with a CPU FARM and a JobScheduler, together with a GLOBAL Job Scheduler]
Distributed Scheduling - market model -
[Figure: sites (Site A, Site B), each with a CPU FARM and a JobScheduler; labels between the JobSchedulers: Request, COST, DECISION]
Example: simple distributed scheduling
Very simple scheduling algorithm, based on searching the center with the minimum load
We simulated the activity of 4 regional centers
When all the centers are heavily loaded, the number of job transfers grows unnecessarily
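The minimum load rule can be sketched as below. The class and method names are hypothetical, and the `margin` parameter is an assumption added only for illustration (it is not described in the slides): it makes a job move only when the load difference is large enough, which is one possible way to limit the unnecessary transfers seen when all centers are heavily loaded.

```java
/** Minimum-load center selection (illustrative sketch; names are assumptions). */
final class MinLoadSchedulerSketch {
    /** Returns the index of the center with the minimum load (ties go to the lowest index). */
    static int leastLoaded(double[] loads) {
        int best = 0;
        for (int i = 1; i < loads.length; i++) {
            if (loads[i] < loads[best]) best = i;
        }
        return best;
    }

    /**
     * Chooses where a job submitted at center `local` should run.
     * With margin = 0 this is the plain minimum-load rule; a positive margin
     * (hypothetical extension) keeps the job local unless the gain is significant.
     */
    static int chooseCenter(double[] loads, int local, double margin) {
        int best = leastLoaded(loads);
        return (loads[local] - loads[best] > margin) ? best : local;
    }

    public static void main(String[] args) {
        double[] heavy = {0.95, 0.93, 0.97, 0.96};  // 4 regional centers, all heavily loaded
        System.out.println(chooseCenter(heavy, 0, 0.0));  // plain rule: job moves to center 1
        System.out.println(chooseCenter(heavy, 0, 0.1));  // with margin: job stays at center 0
    }
}
```

With the plain rule the job is transferred for a load difference of only 0.02, which is the kind of transfer that brings little benefit.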
Network Model
LAN/WAN Simulation Model
[Figure: simulated network components: Nodes attached to LANs, Links, ROUTERs and Internet Connections; Farms connected through LinkPorts to LANs and WANs; simulated local traffic and simulated inter-regional traffic]
• “Interrupt” driven simulation: for each new message an interrupt is created and, for all the active transfers, the speed and the estimated time to complete the transfer are recalculated.
• Continuous flow between events!
• An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
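The recalculation step performed at each network interrupt can be sketched as follows. This shows only that single step, under the assumption that the link bandwidth is split equally among the active transfers; the real sharing policy, and the names `TransferRecalcSketch`, `sharePerTransfer` and `estimateCompletion`, are not taken from MONARC.

```java
/** Recalculation of speed and estimated completion time at a network "interrupt". */
final class TransferRecalcSketch {
    /** Speed of each transfer when the link is shared equally (assumed policy). */
    static double sharePerTransfer(double linkMbps, int activeTransfers) {
        return linkMbps / activeTransfers;
    }

    /**
     * Estimated completion times, in simulated seconds, for all active transfers
     * after an interrupt at time `now`. `remainingMbit` holds the data each
     * transfer still has to send, in megabits.
     */
    static double[] estimateCompletion(double now, double linkMbps, double[] remainingMbit) {
        double speed = sharePerTransfer(linkMbps, remainingMbit.length);
        double[] eta = new double[remainingMbit.length];
        for (int i = 0; i < remainingMbit.length; i++) {
            eta[i] = now + remainingMbit[i] / speed;
        }
        return eta;
    }

    public static void main(String[] args) {
        // A 6220 Mbit transfer starts alone at t = 0 on a 622 Mbits/sec link.
        System.out.println(estimateCompletion(0.0, 622.0, new double[] {6220.0})[0]);
        // At t = 4 a new 1244 Mbit message arrives; the first transfer has 3732 Mbit left.
        double[] eta = estimateCompletion(4.0, 622.0, new double[] {3732.0, 1244.0});
        System.out.println(eta[0] + " " + eta[1]);
    }
}
```

In the example the first estimate is 10 s; after the second message arrives both transfers run at 311 Mbits/sec and the estimates become 16 s and 8 s. These values are estimates only: the completion of the second transfer is itself an event after which the remaining transfer is recalculated again.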
Network Model
The TCP/IP layers are closely followed
Layers: Network Access Layer, Internet Layer, Transport Layer, Application Layer
Simulation classes shown alongside the layers: Message; LinkPort, LAN, WAN; Protocol: TCPProtocol, UDPProtocol; NetworkJob
Data Model
[Figure: Client, Database Index, LinkPort, Databases made of DContainers, Database Server, Mass Storage]
Mapping: Task, Database Entity
Data Model
Generic Data Container: Size, Event Type, Event Range, Access Count
[Figure: container INSTANCEs stored as a FILE, in a Data Base, on a Custom Data Server or as a Network FILE; servers: FTP Server Node, DB Server, NFS Server; META DATA Catalog and Replication Catalog; Export / Import]
Data Model
[Figure: a Data Processing JOB issues a Data Request to the META DATA Catalog and Replication Catalog, selects from the options among several Data Containers, and obtains a List Of IO Transactions]
Activities: Arrival Patterns
A flexible mechanism to define the stochastic process of how users perform data processing tasks
Dynamic loading of “Activity” tasks, which are threaded objects and are controlled by the simulation scheduling mechanism
Physics Activities Injecting “Jobs”
Each “Activity” thread generates data processing jobs:
    for (int k = 0; k < jobs_per_group; k++) {
        Job job = new Job(this, Job.ANALYSIS, "TAG", 1, events_to_process);
        farm.addJob(job);   // submit the job
        sim_hold(1000);     // wait 1000 s
    }
[Figure: Activity objects injecting Jobs into the Regional Centre Farm]
These dynamic objects are used to model the users' behavior
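One common way to describe a stochastic arrival pattern is a Poisson process, where the gaps between job submissions are exponentially distributed. The sketch below generates such submission times; the choice of a Poisson process and the names `ArrivalPatternSketch` and `poissonArrivals` are assumptions made for illustration. In an “Activity” these gaps could stand in for a fixed wait between submissions.

```java
import java.util.Random;

/** Generates job submission times for a Poisson arrival process (illustrative sketch). */
final class ArrivalPatternSketch {
    /**
     * Returns the submission times of `count` jobs. The gaps between consecutive
     * jobs are exponentially distributed with mean `meanGapSeconds`.
     * The seed makes the sequence reproducible.
     */
    static double[] poissonArrivals(int count, double meanGapSeconds, long seed) {
        Random rng = new Random(seed);
        double[] times = new double[count];
        double t = 0.0;
        for (int k = 0; k < count; k++) {
            // Inverse-transform sampling: nextDouble() is in [0, 1),
            // so the argument of the logarithm is in (0, 1].
            t += -meanGapSeconds * Math.log(1.0 - rng.nextDouble());
            times[k] = t;
        }
        return times;
    }

    public static void main(String[] args) {
        for (double t : poissonArrivals(5, 1000.0, 42L)) {
            System.out.printf("submit job at t = %.1f s%n", t);
        }
    }
}
```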
Output of the simulation
[Figure: components of the Simulation Engine (Node, DB, Router, User C) deliver results through Output Listener Filters to clients: Log Files, EXCEL, GRAPHICS]
• Any component in the system can generate generic results objects.
• Any client can subscribe with a filter and will receive the results it is interested in.
• The structure is very similar to the one in MonALISA. We will soon integrate the output of the simulation framework into MonALISA.
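The subscribe-with-a-filter mechanism can be sketched as a small publish/subscribe class. This is an illustration only: `ResultBusSketch`, the `Result` fields and the method names are hypothetical and do not reproduce the MONARC or MonALISA interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Components publish generic results; clients receive only what their filter accepts. */
final class ResultBusSketch {
    /** A generic result object (field names are assumptions). */
    static final class Result {
        final String source;     // component that produced the value, e.g. "Router"
        final String parameter;  // name of the measured quantity
        final double time;       // simulated time of the measurement
        final double value;

        Result(String source, String parameter, double time, double value) {
            this.source = source;
            this.parameter = parameter;
            this.time = time;
            this.value = value;
        }
    }

    private static final class Subscription {
        final Predicate<Result> filter;
        final Consumer<Result> listener;

        Subscription(Predicate<Result> filter, Consumer<Result> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    /** A client registers a listener together with the filter it is interested in. */
    void subscribe(Predicate<Result> filter, Consumer<Result> listener) {
        subscriptions.add(new Subscription(filter, listener));
    }

    /** Any component can publish; each result goes to the listeners whose filter matches. */
    void publish(Result result) {
        for (Subscription s : subscriptions) {
            if (s.filter.test(result)) {
                s.listener.accept(result);
            }
        }
    }

    public static void main(String[] args) {
        ResultBusSketch bus = new ResultBusSketch();
        // A client interested only in results coming from the Router.
        bus.subscribe(r -> r.source.equals("Router"),
                      r -> System.out.println(r.time + " " + r.parameter + " = " + r.value));
        bus.publish(new Result("Node", "cpu_load", 10.0, 0.75));
        bus.publish(new Result("Router", "throughput_mbps", 10.0, 311.0));
    }
}
```

Only the second result reaches the client, because the first one does not pass its filter. A log file writer or a plotting client would be further listeners registered with their own filters.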
Conclusions
http://monalisa.cacr.caltech.edu/MONARC
Modelling and understanding current systems, their performance and limitations, is essential for the design of large scale distributed processing systems. This will require continuous iterations between modelling and monitoring.
Simulation and modelling tools must provide the functionality to help in designing complex systems and in evaluating different strategies and algorithms for the decision making units and the data flow management.
For future development: efficient distributed scheduling algorithms, data replication, more complex examples.