20.10.2011 1 Mihnea Dulea, IFIN-HH Efficient Handling and Processing of PetaByte-Scale Data for the Grid Centers within the FR Cloud 1ST JOINT SYMPOSIUM CEA-IFA HaPPSDaG - PROJECT PRESENTATION - - FIRST YEAR PROGRESS REPORT - M. Dulea National Institute for Nuclear Physics and Engineering 'Horia Hulubei' (IFIN-HH)

Page 1: Mihnea Dulea, IFIN-HH

20.10.2011 1Mihnea Dulea, IFIN-HH

Efficient Handling and Processing of PetaByte-Scale Data for the Grid Centers

within the FR Cloud

1ST JOINT SYMPOSIUM CEA-IFA

HaPPSDaG

- PROJECT PRESENTATION -

- FIRST YEAR PROGRESS REPORT -

M. Dulea

National Institute for Nuclear Physics and Engineering 'Horia Hulubei' (IFIN-HH)

Page 2: Mihnea Dulea, IFIN-HH

OVERVIEW

Computing support for LHC

Project topics

Project objectives and work planning

Framework agreements

General information

Project teams and infrastructure

First year results

Page 3: Mihnea Dulea, IFIN-HH

COMPUTING SUPPORT for LHC - LCG

LHC COMPUTING GRID

LCG is a wide distributed array of computing resources that provides the computing support required for the storage, processing, simulation and analysis of the data gathered by the four major experiments performed at LHC.

It consists of more than 140 computing centres and federations of centres from 35 countries.

The resource centres are classified according to their size and functionality as Tier-0 (CC @ CERN), Tier-1 (11 centres), and Tier-2.

The centres are interconnected through a high-speed network (GEANT2 in EU).

Current and 2012-2014 activity related to LHC.

Page 4: Mihnea Dulea, IFIN-HH

COMPUTING SUPPORT - FR

ATLAS FRENCH CLOUD

Grid sites:

CC-IN2P3 (Tier-1)

Tier-2 centres: ... (many)

GRIF

Grille de Recherche d'Ile de France: computing grid in the Paris region, a joint initiative of CEA/IRFU and labs of CNRS/IN2P3 (6 sites)

The sites are interconnected through dedicated 10 Gbps links; connected to the FR NREN:

RENATER: Réseau national de télécommunications pour la technologie, l'enseignement et la recherche

FR Cloud includes foreign grid centres from China, Japan, Romania

Page 5: Mihnea Dulea, IFIN-HH

COMPUTING SUPPORT - RO

ROMANIAN TIER-2 FEDERATION RO-LCG

Grid sites:

- IFIN-HH, 5 Grid sites (resource centres)

- ISS - Inst. for Space Sciences (2 sites)

- UPB - Univ. 'Politehnica' of Bucharest

- ITIM - NIRD in Molecular & Isotopic Technologies - Cluj

- UAIC, Alex. Ioan Cuza University - Iasi

The sites are connected to the 10 Gbps backbone of the RO NREN, the Romanian Educational and Research Network RoEduNet

4 grid sites currently support the ATLAS VO: RO-07-NIPNE, RO-02-NIPNE (IFIN-HH); RO-14-ITIM (Cluj), RO-16-UAIC (Iasi)

Page 6: Mihnea Dulea, IFIN-HH

PROJECT TOPIC

Computing support for LHC experiments = provision of grid resources + services

The overall support of LCG deployment and operation is provided from other funds (e.g. CONDEGRID project in RO).

HAPPSDAG addresses specific ATLAS issues in order to optimize resource usage.

Page 7: Mihnea Dulea, IFIN-HH

ATLAS ISSUES

Generic requirements regarding

- data transfer from Tier-1 to the associated Tier-2 sites (CC-IN2P3 => RO-LCG)

- transfer of large files from SE to WN for each analysis job; consider many simultaneous jobs

- transfer of log and results files from WN to SE; immediate transfer of log file to UI

RO specific needs at the beginning of the project

- analysis of the causes of the lower performance of RO-LCG sites before Oct. 2010

- elaborate and test technical solutions for performance improvement

- ensure better communication and coordination between the RO sites and the FR-cloud partners

- general measures for improving Tier1 - Tier2 interaction

- elaborate general guidelines regarding the improvement in efficiency of the grid centers which are associated to ATLAS clouds

Figure: Grid cluster; transfer paths from/to the Storage Element (SE)

Page 8: Mihnea Dulea, IFIN-HH

PROJECT OBJECTIVES

Strategic objective: provide means for improving the processing and handling of large data sets at the Tier2 centers which participate in the computing support of the ATLAS experiment at the LHC. (RO - case study)

Specific objectives and partner contributions:

Improve communication and coordination between GRIF/IN2P3 and RO sites (RO/FR)

Testing & improving quality of the FR - RO data link for large dataset transfers (RO/FR)

Implementation of specific measures for increasing ATLAS job load and storage performance on sites (RO)

Improving large dataset transfer between FR - RO and data analysis (RO/FR)

Contributing to grid monitoring and technical support within FR-cloud (RO)

Training regarding grid monitoring and support (FR => RO)

Dissemination (RO/FR)

Page 9: Mihnea Dulea, IFIN-HH

PLANNING of WORK

Stage 1 (01.10.2010 - 10.12.2010)

Analysis of Tier1-Tier2 communication

Stage 2 (01.01.2011 - 30.09.2011)

Studies and software tools for monitoring and operation of the FR Cloud - RO grid connection and job loading. Testing of data handling and processing.

Stage 3 (01.10.2011 - 30.09.2012)

Methods and procedures for improving the performance of the RO sites within the FR Cloud

Page 10: Mihnea Dulea, IFIN-HH

FRAMEWORK AGREEMENTS

General Cooperation Agreement for Scientific Research between CEA and IFA, signed in December 2009

- Field of cooperation: Technologies for Information and Health

- Topic proposed for 2010: Grid Technologies

Joint Call for proposals of joint R&D projects (May 2010)

- IFIN-HH and IRFU submitted a proposal for a Joint Research and Development Project

Cooperation Agreement in the Field of Scientific Research (AS) between CEA and IFIN-HH (01.10.2010)

- General Coordinators: Gerard Cognet (FR), Ioan Ursu (RO)

- leading and coordinating the cooperation activities

Project Agreement (CEA, IFIN-HH)

Page 11: Mihnea Dulea, IFIN-HH

GENERAL INFORMATION

RO Contract n° C1-06/2010, between IFA and IFIN-HH

Start date: 01/10/2010

Duration: 24 months

Funding of the RO part of the project: 400 000 lei (~ 94.000 €)

Funding of the FR part of the project: 133 000 €

BUDGET                                                2010                2011                2012
                                                RO (lei) CEA (Eur)  RO (lei) CEA (Eur)  RO (lei) CEA (Eur)
Manpower                                          25.333      6000   120.133     48000    82.000     22000
Travels                                            8.000      4000     3.200     14000     8.000     14000
Others (Romanian Engineer staying at Saclay)           -      5000         -     10000         -     10000
Others (French guests staying in Romania)              0         -    10.000         -    10.000         -
Others (equipment)                                     0         -    40.000         -    40.000         -
Others (indirect costs)                            6.667         -    26.667         -    20.000         -
Total:                                            40.000    15.000   200.000    72.000   160.000    46.000
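The budget lines can be cross-checked against the stated totals with a few lines of arithmetic. The sketch below copies the figures from the budget above. The assignment of the "Others" rows to the RO or CEA columns is an assumption, chosen because it is the one that reproduces the yearly totals. Row labels are shortened for readability.

```python
# Budget figures per year as (RO in lei, CEA in EUR).
# Assumption: the "Others" rows belong to the column that makes the totals match.
budget = {
    "Manpower": {2010: (25_333, 6_000), 2011: (120_133, 48_000), 2012: (82_000, 22_000)},
    "Travels": {2010: (8_000, 4_000), 2011: (3_200, 14_000), 2012: (8_000, 14_000)},
    "Romanian engineer at Saclay": {2010: (0, 5_000), 2011: (0, 10_000), 2012: (0, 10_000)},
    "French guests in Romania": {2010: (0, 0), 2011: (10_000, 0), 2012: (10_000, 0)},
    "Equipment": {2010: (0, 0), 2011: (40_000, 0), 2012: (40_000, 0)},
    "Indirect costs": {2010: (6_667, 0), 2011: (26_667, 0), 2012: (20_000, 0)},
}


def yearly_totals(table):
    """Sum the RO and CEA contributions of every budget line, year by year."""
    totals = {}
    for per_year in table.values():
        for year, (ro, cea) in per_year.items():
            ro_sum, cea_sum = totals.get(year, (0, 0))
            totals[year] = (ro_sum + ro, cea_sum + cea)
    return totals


totals = yearly_totals(budget)
ro_total = sum(ro for ro, _ in totals.values())     # whole-project RO funding, lei
cea_total = sum(cea for _, cea in totals.values())  # whole-project CEA funding, EUR
```

Under this column assignment the three yearly totals and the overall funding of 400 000 lei and 133 000 € are all reproduced.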

Page 12: Mihnea Dulea, IFIN-HH

PROJECT TEAMS

Project coordinators: Jean-Pierre Meyer (FR), Mihnea Dulea (RO)

Technical correspondents: Pierrick Micout (FR), Gabriel Stoicea (RO)

FR team (CEA/IRFU)

Eric LANÇON

Pierrick MICOUT

Christine LEROY

Frédéric SCHAER

Zoulikha GEORGETTE

Adelino GOMEZ

RO team (IFIN-HH)

Serban Constantinescu

Mihai Ciubancan

Ionut Traian Vasile

Camelia Mihaela Visan

Page 13: Mihnea Dulea, IFIN-HH

INFRASTRUCTURE @ CTI/DPETI

1200 (grid) + 960 (hpc) cores, 270 TB

Centre for Informational Technologies (CTI) - IFIN-HH

Page 14: Mihnea Dulea, IFIN-HH

ANALYSIS of NETWORK INFRASTRUCTURE

Objective: identify the weak points of the FR-RO data connection and adopt measures for improving the transfer capacity for large datasets.

Network structure: complex, various owners and administrators => more difficult to act

Activities (RO+FR)

- Testing connectivity & transport capacity with various tools

- Finding routing paths and points of data traffic delay

- Comparing performances of the RO-CERN link with those of RO-IN2P3

Section       Centres                   Administrator  Owner        Location
IFIN-HH LAN   RO-02-NIPNE, RO-07-NIPNE  CTI/DPETI      IFIN-HH      Magurele
IFIN - UPB    UPB                       ICOMM          IFIN-HH      UPB
RoEduNet      RO-14-ITIM, RO-16-UAIC    AARNIEC        MECTS        Romania
GEANT2        In 34 EU states           DANTE          EU NRENs     EU
RENATER       GRIF, IN2P3               GIP RENATER    GIP RENATER  France

Conclusions: a) performance degradation at RoEduNet / GEANT2 interface

b) bottlenecks on some of the RoEduNet routers
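The activity "finding routing paths and points of data traffic delay" can be illustrated with a small sketch: given per-hop round-trip times such as a traceroute-style tool reports, look for the hop where the delay rises most. The hop names and RTT values below are hypothetical and are not measurements from the project.

```python
from typing import List, Tuple


def largest_delay_jump(hops: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return (hop name, RTT increase in ms) for the largest rise between consecutive hops."""
    if len(hops) < 2:
        raise ValueError("need at least two hops")
    jumps = [
        (later_rtt - earlier_rtt, later_name)
        for (_, earlier_rtt), (later_name, later_rtt) in zip(hops, hops[1:])
    ]
    jump, name = max(jumps)
    return name, jump


# Hypothetical path: (hop label, round-trip time in ms)
sample_path = [
    ("site-lan", 0.4),
    ("nren-router-a", 1.1),
    ("nren-router-b", 19.6),
    ("geant-pop", 21.0),
    ("renater-pop", 33.0),
]
worst_hop, jump_ms = largest_delay_jump(sample_path)
```

A large jump is only a hint: it can reflect geographic distance as well as congestion, so a suspect hop would still need to be confirmed with bandwidth tests and with the administrators of that network section.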

Page 15: Mihnea Dulea, IFIN-HH

IMPROVING POINT-TO-POINT TRAFFIC PERFORMANCES

Requires close collaboration with network administrators along the RO-FR path

Example: following bandwidth capacity and traffic analysis, a RoEduNet router responsible for a bottleneck was identified. AARNIEC's intervention raised the available bandwidth to 700 Mbps (fig. below).

Permanent monitoring required

Page 16: Mihnea Dulea, IFIN-HH

MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 1

Development of software tools for monitoring SE traffic (in/out): daemons running on the storage servers send data that is added to a database, and a web interface displays it.

Tools developed in IFIN-HH; useful for FR partners too for monitoring RO sites.

Traffic from/to WNs and from/to external network

Max at 5 Gbps Max at > 3 Gbps
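The monitoring chain described on this slide (daemon on a storage server, database, display) might look roughly like the sketch below. It is not the IFIN-HH tool: the table name, host name and interface name are hypothetical, the counters are read from text in the Linux `/proc/net/dev` format, and SQLite stands in for whatever database the real tools use.

```python
import sqlite3


def parse_counters(proc_net_dev: str, iface: str):
    """Extract (rx_bytes, tx_bytes) for one interface from /proc/net/dev text."""
    for line in proc_net_dev.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def rate_mbps(old_bytes: int, new_bytes: int, seconds: float) -> float:
    """Average throughput in Mbps between two byte-counter readings."""
    return (new_bytes - old_bytes) * 8 / seconds / 1e6


def store_sample(db, host, ts, in_mbps, out_mbps):
    """Append one traffic sample; a web page could later query this table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS se_traffic "
        "(host TEXT, ts INTEGER, in_mbps REAL, out_mbps REAL)"
    )
    db.execute("INSERT INTO se_traffic VALUES (?, ?, ?, ?)", (host, ts, in_mbps, out_mbps))
    db.commit()


# Two hypothetical snapshots taken 10 seconds apart.
HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
)
snap_t0 = HEADER + "  eth0: 1000000000 0 0 0 0 0 0 0 2000000000 0 0 0 0 0 0 0\n"
snap_t10 = HEADER + "  eth0: 2250000000 0 0 0 0 0 0 0 2625000000 0 0 0 0 0 0 0\n"

rx0, tx0 = parse_counters(snap_t0, "eth0")
rx1, tx1 = parse_counters(snap_t10, "eth0")
in_mbps = rate_mbps(rx0, rx1, 10.0)
out_mbps = rate_mbps(tx0, tx1, 10.0)

db = sqlite3.connect(":memory:")
store_sample(db, "se01.example.org", 10, in_mbps, out_mbps)
rows = db.execute("SELECT host, ts, in_mbps, out_mbps FROM se_traffic").fetchall()
```

In a real daemon the snapshots would be read from `/proc/net/dev` in a timed loop and sent to a central database rather than an in-memory one.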

Page 17: Mihnea Dulea, IFIN-HH

MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 2

Traffic on gateway (in/out); SE external throughput

Monitoring groups of running or pending jobs

Page 18: Mihnea Dulea, IFIN-HH

MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 3

Accounting of running or pending jobs on CE or CREAM-CE

Page 19: Mihnea Dulea, IFIN-HH

IMPROVEMENT of SITE MONITORING and TECHNICAL SUPPORT

Implementation of IFIN-HH's own SAM (Service Availability Monitoring) system, which uses the IFIN-HH grid infrastructure and a new monitoring VO, ifops. Results are published using Nagios.

Early notification of technical staff leads to improvement of availability of grid services

Monitoring of CREAM-CE, tbit03.nipne.ro
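To give an idea of how a result reaches Nagios, here is a minimal sketch of a check that follows the standard Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with one status line on standard output). It is a deliberately simplified stand-in: real SAM probes test grid services by running jobs and commands under the monitoring VO, whereas this sketch only times a TCP connection. The thresholds and function names are hypothetical.

```python
import socket
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(connect_seconds, warn=1.0, crit=5.0):
    """Map a measured connect time (None means unreachable) to a Nagios status."""
    if connect_seconds is None:
        return CRITICAL, "service unreachable"
    if connect_seconds >= crit:
        return CRITICAL, f"connect time {connect_seconds:.2f}s"
    if connect_seconds >= warn:
        return WARNING, f"connect time {connect_seconds:.2f}s"
    return OK, f"connect time {connect_seconds:.2f}s"


def measure_connect(host, port, timeout=10.0):
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def check(host, port):
    """Return (exit code, status line) in the form a Nagios plugin would report."""
    code, text = classify(measure_connect(host, port))
    return code, f"{LABELS[code]} - {host}:{port} {text}"
```

A wrapper script would call `check(host, port)` for the service of interest, print the status line and exit with the returned code; Nagios then notifies the technical staff according to its own configuration.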

Page 20: Mihnea Dulea, IFIN-HH

IMPROVEMENT and TESTS of SE-WN THROUGHPUT

Adding more resources (WNs) doesn't always mean better results; scalability is required.

Improvement of file transfer speed from SE to WN, required by analysis jobs (4-6 files of 2-4 GB)

Replacing the transfer to disk servers through the Network File System (NFS) protocol with new DPM (Disk Pool Manager) disk storage servers.

Higher transfer speed => no job exceeds the time limit => no cancellation

Tests of the new configuration

Time representation of the transfer speed (in Mbps) for 70 quasi-simultaneous jobs
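A rough back-of-the-envelope calculation shows why SE-WN throughput decides whether jobs finish within their time limit. The sketch below takes the figures from the slides (70 quasi-simultaneous jobs, 4-6 files per job, 2-4 GB per file) and assumes, purely for illustration, that the file size is per file, that 1 GB = 8 Gb, and that the whole 5 Gbps peak quoted for the monitoring plots is available as aggregate SE-WN bandwidth. The job time limit itself is not given in the slides.

```python
def staging_volume_gb(jobs: int, files_per_job: int, file_gb: float) -> float:
    """Total data volume (GB) that the SE must deliver to the worker nodes."""
    return jobs * files_per_job * file_gb


def staging_time_s(volume_gb: float, aggregate_gbps: float) -> float:
    """Ideal transfer time in seconds over a shared link, ignoring protocol overhead."""
    return volume_gb * 8 / aggregate_gbps


low_gb = staging_volume_gb(70, 4, 2.0)    # smallest case: 4 files of 2 GB per job
high_gb = staging_volume_gb(70, 6, 4.0)   # largest case: 6 files of 4 GB per job
t_low = staging_time_s(low_gb, 5.0)       # seconds at an assumed 5 Gbps aggregate
t_high = staging_time_s(high_gb, 5.0)
```

Under these assumptions the 70 jobs need between 560 GB and 1680 GB, which is about 15 to 45 minutes of ideal transfer time at 5 Gbps; at a fraction of that bandwidth the staging phase alone could consume a large part of a job's allowed run time.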

Page 21: Mihnea Dulea, IFIN-HH

GLOBAL IMPROVEMENT of EFFICIENCY

Mean efficiency of ATLAS job execution in 2011: 91%

Monthly number of ATLAS jobs and number of ATLAS events processed in RO-LCG

Page 22: Mihnea Dulea, IFIN-HH

TRAINING REGARDING MONITORING AND TECHNICAL SUPPORT

20.06.11 - 04.07.11: training stay of C. Visan at CEA/IRFU, in preparation for later participation in monitoring and support activities for FR Cloud sites.

Topics:

- CEA/IRFU monitoring methods at site, VO, project levels; EGI/WLCG and LHC monitoring (Christine Leroy, Pierrick Micout)

- grid site usage (Georgette Zoulikha)

- NAGIOS installing/configuration on virtual machines (Frederic Schaer)

- job submission through Pathena (PanDA Athena), at LAL-Orsay (Laurent Duflot)

- CACTI site monitoring (Victor Mendoza, Université Pierre et Marie Curie (UPMC))

- instructions for site and job monitoring in ADCoS (ATLAS Distributed Computing Operations Shift) and for the support team of the FR Cloud (Squad) (Sabine Crepe)

Page 23: Mihnea Dulea, IFIN-HH

MOBILITY

Kick-off meeting (15-16.11.2010, Saclay)

Participation at the RO-LCG 2010 Conference, Bucharest (Christine Leroy, Sabine Crepe - IN2P3)

Participation of Gabriel Stoicea in the spring meeting of LCG-France (30-31.05.2011)

Training - monitoring and support (20.06.11 - 04.07.11, Saclay), C.M. Visan

Page 24: Mihnea Dulea, IFIN-HH

BENEFITS

CEA/IRFU

The results of the project contribute to global improvement of FR Cloud efficiency

Elaboration, in collaboration, of general guidelines for interaction between grid centres in ATLAS clouds, and

Using FR-RO interaction as a representative case study for sharing best practices with smaller sites

IFIN-HH

General efficiency improvement of the activity of the RO sites

Better integration and visibility in the framework of the computing support for the ATLAS collaboration

High-level training of RO technical staff

Page 25: Mihnea Dulea, IFIN-HH

PROSPECTS

Further development of methods and procedures for improving the performance of the RO sites within the FR Cloud

General guidelines regarding the improvement in efficiency of the grid centers which are associated to ATLAS clouds

HAPPSDAG workshop and technical meeting in Bucharest (28-30.11.2011)

Participation of IFIN-HH in site and job monitoring in ADC (ATLAS Distributed Computing) shifts or in the monitoring team of the FR Cloud.

Dissemination of results

Page 26: Mihnea Dulea, IFIN-HH

THANK YOU FOR YOUR ATTENTION !

Questions?