The ATLAS Cloud Model
Simone Campana
LCG sites and ATLAS sites
• LCG counts almost 200 sites.
  – Almost all of them support the ATLAS VO.
  – The ATLAS production software is installed at roughly 80 sites.
• Not all of those sites are ATLAS T1s or T2s.
  – Being an ATLAS T1 or T2 means providing a certain amount of resources:
    • Storage + CPU
    • Defined in the Memorandum of Understanding
• ATLAS counts 10 T1s:
  – CNAF, PIC, SARA, RAL, BNL, TRIUMF, SINICA, LYON, FZK, NorduGrid
  – And of course there is the CERN T0.
• Each T1 should support between 3 and 6 T2s.
• Other tiers are “Opportunistic Resources”.
The Cloud Model
• ATLAS sites have been divided into CLOUDS.
  – At the moment, consider only ATLAS T1s and T2s.
    • Opportunistic resources will be introduced later on.
• Currently there are therefore 9 clouds:
  – T0 (CERN), IT (Italy), ES (Spain), UK (United Kingdom), FR (France), CA (Canada), TW (Taiwan), DE (Germany), NL (Netherlands)
• Every TASK in the ATLAS Production System is assigned to a specific cloud:
  – Jobs run on one of the cloud’s CEs.
  – Input data are fetched from one of the cloud’s SEs.
  – Output data are stored on one of the cloud’s SEs.
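The cloud-to-T1 association can be sketched as a plain Python dictionary (Python, since the real topology ships as TiersOfAtlas.py). The pairing of T1s with cloud codes below is an assumption inferred from the site names on the slides; BNL and NorduGrid are T1s but do not appear in the 9 clouds listed, so they are omitted:

```python
# Hypothetical sketch of the cloud -> Tier-1 mapping implied by the slides.
# The authoritative topology lives in TiersOfAtlasCache.py, not here.
CLOUDS = {
    "T0": "CERN",    # the CERN T0
    "IT": "CNAF",
    "ES": "PIC",
    "UK": "RAL",
    "FR": "LYON",
    "CA": "TRIUMF",
    "TW": "SINICA",
    "DE": "FZK",
    "NL": "SARA",
}

def tier1_of(cloud):
    """Return the Tier-1 (or T0) serving a given cloud."""
    return CLOUDS[cloud]

print(tier1_of("IT"))  # CNAF
```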
IT Cloud
[Diagram: topology of the IT cloud. The T2 sites MILANO, ROMA1, NAPOLI and LNF each host their own SE and CE, and connect to the T1 at CNAF, which hosts an SE, a CE, the local LFC, the FTS server and the VOBOX.]
Job Submission
• ATLAS production jobs can be of different types:
  – Mostly EVGEN, SIMUL+DIGIT, RECO.
• Each part of the chain produces inputs for the subsequent one:
  – Outputs of the EVGEN are inputs of the SIMUL, etc.
  – EVGENs generally have no input.
• Each task is assigned to a cloud according to the location of its input:
  – If the EVGEN ran on the UK cloud, the corresponding SIMUL will most likely run on the same cloud.
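The assignment rule can be illustrated with a minimal sketch; the catalogue dictionary and function names below are invented for illustration and are not part of the real Production System:

```python
# Sketch of the task-assignment rule: a task goes to the cloud that
# already holds its input; EVGEN, having no input, is unconstrained.
input_location = {"evgen.001": "UK"}  # dataset -> cloud currently holding it

def assign_cloud(task_type, input_dataset=None, default="T0"):
    """Pick a cloud for a task based on where its input lives."""
    if task_type == "EVGEN" or input_dataset is None:
        return default                    # no input, no constraint
    return input_location[input_dataset]  # follow the input data

print(assign_cloud("SIMUL", "evgen.001"))  # UK
print(assign_cloud("EVGEN"))               # T0
```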
Data Management at runtime
• Every file of the cloud is registered in the File Catalog (LFC) of the T1.
  – The job locates the input file using the LFC at the T1.
  – The job copies the input file locally onto the WN from an SE of the cloud.
  – The job stores the output on one SE of the cloud.
  – The job registers the output in the LFC of the cloud.
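The four runtime steps can be sketched as a toy, self-contained simulation; the LFC and SEs are modelled as plain dicts, and all names (`run_job`, the SRM URLs, the file names) are invented for this illustration, not the real ATLAS client API:

```python
# Toy model of the runtime data flow: locate input in the T1 LFC,
# copy it to the worker node, store the output on a cloud SE,
# and register the output back in the LFC.
lfc = {"evgen.001.root": "srm://se.milano.example/atlas/evgen.001.root"}
worker_node = {}  # local scratch space on the WN
storage = {}      # a storage element of the cloud

def run_job(input_lfn, output_lfn):
    replica = lfc[input_lfn]          # 1. locate input via the T1 LFC
    worker_node[input_lfn] = replica  # 2. copy input locally to the WN
    surl = "srm://se.cnaf.example/atlas/" + output_lfn
    storage[output_lfn] = surl        # 3. store the output on a cloud SE
    lfc[output_lfn] = surl            # 4. register the output in the LFC
    return surl

run_job("evgen.001.root", "simul.001.root")
```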
Asynchronous Data Management
• DQ2 is installed on the VOBOX of every T1.
• DQ2 allows one to:
  – Move files from a T2 of the cloud to the T1 of the cloud (and vice-versa)
    • using the T1 FTS
  – Move files to the T1 from another T1
    • using the T1 FTS
  – Move files to the T1 from any T2 (in other clouds)
    • using the T1 FTS
  – Move files to a T2 of the cloud from any tier (of the cloud)
    • using the T1 FTS
  – Move files from the T0 to the T1
    • using the T0 FTS
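The pattern in the list above reduces to one rule, sketched below; the function and argument names are invented, and the reading that "the T1 FTS" always means the destination cloud's T1 is an assumption:

```python
# Sketch of the DQ2 transfer rule: every movement is driven by the
# destination cloud's T1 FTS, except exports from the T0, which use
# the T0 FTS at CERN.
def fts_for_transfer(src_tier, dst_cloud_t1):
    """Return which FTS server drives a transfer, per the rules above."""
    if src_tier == "T0":
        return "T0-FTS"            # T0 -> T1 export, handled at CERN
    return dst_cloud_t1 + "-FTS"   # all other cases use the T1 FTS

print(fts_for_transfer("T2", "CNAF"))  # CNAF-FTS
print(fts_for_transfer("T0", "CNAF"))  # T0-FTS
```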
FTS channels at CNAF
• Intra-cloud channels:
  – CNAF-MILANO, MILANO-CNAF, CNAF-ROMA, ROMA-CNAF, CNAF-LNF, LNF-CNAF, CNAF-NAPOLI, NAPOLI-CNAF
• Inter-cloud and catch-all channels:
  – LYON-CNAF, RAL-CNAF, SINICA-ROMA, (otherT1)-CNAF
  – * - CNAF, * - MILANO, * - ROMA, * - NAPOLI, * - LNF
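The channel list above can be modelled as a lookup that prefers a dedicated channel and falls back to the catch-all `* - SITE` channels; this matching logic is a sketch of how such a table is typically consulted, not the actual FTS implementation:

```python
# Sketch: resolving which FTS channel serves a transfer at CNAF.
CHANNELS = {
    ("CNAF", "MILANO"), ("MILANO", "CNAF"),
    ("CNAF", "ROMA"), ("ROMA", "CNAF"),
    ("CNAF", "LNF"), ("LNF", "CNAF"),
    ("CNAF", "NAPOLI"), ("NAPOLI", "CNAF"),
    ("LYON", "CNAF"), ("RAL", "CNAF"),
    ("*", "CNAF"), ("*", "MILANO"), ("*", "ROMA"),
    ("*", "NAPOLI"), ("*", "LNF"),
}

def match_channel(src, dst):
    """Return the channel serving src -> dst, preferring a dedicated one."""
    if (src, dst) in CHANNELS:
        return src + "-" + dst
    if ("*", dst) in CHANNELS:
        return "*-" + dst  # catch-all channel for that destination
    return None

print(match_channel("LYON", "CNAF"))  # LYON-CNAF (dedicated)
print(match_channel("PIC", "CNAF"))   # *-CNAF (catch-all)
```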
Usage of DQ2
• Before assigning a job to the cloud:
  – The input file might need to be replicated.
  – The cloud containing the input file might be too busy.
  – This is NOT an operator issue.
• When a job is assigned to the cloud:
  – The output dataset is subscribed to the T1.
  – Files are migrated to the T1 as soon as they appear.
    • They will end up in the DISK-only area of the T1 SE.
  – Currently, if this subscription fails, a manual operation is needed:
    • unsubscribe and re-subscribe again.
  – This is an issue for the Data Management operator.
TiersOfAtlas
• The definition of the ATLAS clouds comes in the TiersOfAtlas.py file.
  – It is under $DQ2DIR/common/.
• TiersOfAtlas.py contains the API to retrieve cloud information:
  – the name of the LFC, the names of the SEs, …
• The actual topology (database) is in TiersOfAtlasCache.py.
  – Also under $DQ2DIR/common.
  – The API automatically checks for updates of the cache and downloads the latest version.
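The split between API and cache can be illustrated with a minimal sketch; the dictionary layout, the `get_lfc` function and the LFC endpoint below are all invented for illustration and do not reproduce the real TiersOfAtlas.py interface:

```python
# Illustrative stand-in for TiersOfAtlasCache.py: a topology database
# that an API layer (TiersOfAtlas.py) queries on behalf of clients.
TIERS_OF_ATLAS_CACHE = {
    "IT": {
        "lfc": "lfc://lfc.cnaf.example/",  # hypothetical endpoint
        "sites": ["CNAF", "MILANO", "ROMA1", "NAPOLI", "LNF"],
    },
}

def get_lfc(cloud):
    """Return the LFC endpoint of a cloud from the cached topology."""
    return TIERS_OF_ATLAS_CACHE[cloud]["lfc"]

def get_sites(cloud):
    """Return the sites belonging to a cloud."""
    return TIERS_OF_ATLAS_CACHE[cloud]["sites"]

print(get_lfc("IT"))    # lfc://lfc.cnaf.example/
print(get_sites("IT"))  # ['CNAF', 'MILANO', 'ROMA1', 'NAPOLI', 'LNF']
```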
TiersOfAtlas wrapper
• ToA is not a client tool.
  – If you want to know the IT LFC, you can grep for it in the cache or write a small Python script.
  – A more user-friendly client is being written.
    • It will be there very soon.
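The "small Python script" might look like the sketch below; the exact key names inside TiersOfAtlasCache.py are an assumption, so the example runs on a sample string of that assumed shape (in practice you would read the file from $DQ2DIR/common/TiersOfAtlasCache.py):

```python
import re

def find_lfc(cache_text, cloud="IT"):
    """Grep the ToA cache text for a cloud's LFC entry (assumed format)."""
    # Look for a block like: 'IT': { ... 'lfc': 'lfc://...' ... }
    block = re.search(cloud + r"'\s*:\s*\{(.*?)\}", cache_text, re.S)
    if not block:
        return None
    m = re.search(r"'lfc'\s*:\s*'([^']+)'", block.group(1))
    return m.group(1) if m else None

# Sample text standing in for the real cache file contents.
sample = "'IT': {'lfc': 'lfc://lfc.example.it/', 'sites': ['CNAF']}"
print(find_lfc(sample))  # lfc://lfc.example.it/
```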