GridPP & The Grid
Who we are & what it is
Tony Doyle
Web: information sharing
• Invented at CERN by Tim Berners-Lee
[Chart: growth in the number of Internet hosts (millions) by year]
• Agreed protocols: HTTP, HTML, URLs
• Anyone can access information and post their own
• Quickly crossed over into public use
@Home Projects (e.g. SETI@Home)
• Use home PCs to run numerous calculations with dozens of variables
• Distributed computing projects, not a grid
• Other @home projects:
– BBC Climate Change Experiment
– FightAIDS@home
Peer To Peer Networks
• No centralised database of files
• Legal problems with sharing copyrighted material
• Security problems
Grid: Resource Sharing
• Share more than information: data, computing power, applications
[Diagram: on a single computer, the operating system sits between your programs (Word/Excel, email/web, games, your program) and the disks, CPU etc.; on the Grid, middleware plays the same role, connecting your program through a User Interface machine and a Resource Broker to CPU clusters and disk servers.]
• Middleware handles everything
The Grid
Analogy with the electricity power grid:
– Power stations ↔ computing and data centres
– Distribution infrastructure ↔ fibre optics of the Internet
– A 'standard interface' for the user
The CERN LHC
4 Large Experiments
The world’s most powerful particle accelerator - 2007
The Experiments:
• ALICE – heavy-ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision
• LHCb – to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year
• ATLAS – general purpose: origin of mass, supersymmetry; 2,000 scientists from 34 countries
• CMS – general purpose; 1,800 scientists from over 150 institutes
“One Grid to Rule Them All”?
Why do particle physicists need the Grid?
Example from LHC: starting from this event…
…we are looking for this “signature”
Selectivity: 1 in 10¹³
Like looking for 1 person in a thousand world populations
Or for a needle in 20 million haystacks
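The population analogy can be sanity-checked with a rough worked figure (assuming the mid-2000s world population of ~6.5 billion, which is not stated on the slide):

    10¹³ ÷ (6.5 × 10⁹ people) ≈ 1,500

so selecting 1 event in 10¹³ is indeed of the same order as finding one person among a thousand-odd world populations.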
Why do particle physicists need the Grid?
• One year’s data from LHC would fill a stack of CDs 20 km high – taller than Concorde’s cruising altitude (15 km) or Mont Blanc (4.8 km)
• 100 million electronic channels
• 800 million proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes = 14 million CDs)
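The CD figures are easy to cross-check (a rough sketch, assuming ~0.7 GB per CD and ~1.2 mm per disc, neither of which is stated on the slide):

    10 PB ≈ 10⁷ GB;  10⁷ GB ÷ 0.7 GB/CD ≈ 1.4 × 10⁷ CDs
    1.4 × 10⁷ CDs × 1.2 mm ≈ 17 km

which is consistent with the ~20 km stack quoted above once cases and spacing are allowed for.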
Who else can use a Grid?
• Astronomers
• Healthcare professionals – scanning (optical, X-ray) and remote consultancy
• Bioinformatics
• Digital curation – to create digital libraries and museums; almost anything can be digitised
Who are GridPP?
• 19 UK universities plus CCLRC (RAL & Daresbury), funded by PPARC
• GridPP1 (2001-2004): "From Web to Grid"
• GridPP2 (2004-2007): "From Prototype to Production"
• Developed a working, highly functional Grid
What Have We Done So Far
• Simulated 46 million molecules for medical research in 5 weeks, which would have taken over 80 years on a single PC
• Reached transfer speeds of 1 Gigabyte per second in high-speed networking tests from CERN – a DVD every 5 seconds (see the arithmetic after this list)
• BaBar experiment has simulated 500 million particle physics collisions on the UK Grid
• UK’s #1 producer of data for LHCb, ATLAS and CMS
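The DVD comparison is a one-line check (assuming a single-layer DVD of 4.7 GB, a figure not given on the slide):

    4.7 GB ÷ 1 GB/s ≈ 5 seconds per DVD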
Worldwide LHC Computing Grid
• GridPP is part of EGEE and LCG (currently the largest Grid in the world)
EGEE stats:
182 Sites
42 Countries
38,201 CPUs
9,145 TBytes Storage
Tier Structure
• Tier 0: CERN computer centre – the online system and offline farm, fed directly by the detector
• Tier 1: national centres – RAL (UK), Italy, USA, France, Germany
• Tier 2: regional groups – ScotGrid, NorthGrid, SouthGrid, London
• Tier 3: institutes – e.g. Glasgow, Edinburgh, Durham
UK Tier-1/A Centre
• High quality data services
• National and international role
• UK focus for international Grid development
• 1,000 dual-CPU nodes
• 200 TB disk
• 220 TB tape (capacity 1 PB)
• Grid Operations Centre
UK Tier-2 Centres
• ScotGrid: Durham, Edinburgh, Glasgow
• NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
• London: Brunel, Imperial, QMUL, RHUL, UCL
What are the Grid challenges?
The Grid must:
• share data between thousands of scientists with multiple interests
• link major and minor computer centres
• ensure all data is accessible anywhere, anytime
• grow rapidly, yet remain reliable for more than a decade
• cope with the different management policies of different centres
• ensure data security
• be up and running routinely by 2007
Other Grids
• UK National Grid Service – the UK’s core production computational and data Grid
• EGEE (Europe) – Enabling Grids for E-sciencE
• NorduGrid (Europe) – a Grid research and development collaboration
• Open Science Grid (USA) – science applications from HEP to biochemistry
The Future
• Grow the LHC Grid
• Spread beyond science – healthcare, commercial uses, government, games
• Will it become part of everyday life?
Further Info
http://www.gridpp.ac.uk
Backups
“UK contributes to EGEE's battle with malaria”
WISDOM (Wide In Silico Docking On Malaria) – the first biomedical data challenge for drug discovery, which ran on the EGEE grid production service from 11 July 2005 until 19 August 2005.
• BioMed successes per day: 1,107 (success rate 77%)
• GridPP resources in the UK contributed ~100,000 kSI2k-hours from 9 sites
[Charts: number of biomedical jobs processed by country; normalised CPU hours contributed to the biomedical VO for UK sites, July-August 2005]
Is GridPP a Grid?
Ian Foster’s three-part checklist:
1. Coordinates resources that are not subject to centralized control…
2. …using standard, open, general-purpose protocols and interfaces…
3. …to deliver nontrivial qualities of service.
1. YES – this is why development and maintenance of LCG is important.
2. YES – VDT (Globus/Condor-G) + EGEE (gLite) approximately meet this requirement.
3. YES – demonstrated by the LHC experiments’ data challenges over the summer of 2004.
http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
http://agenda.cern.ch/fullAgenda.php?ida=a042133
Application Development
ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (Fermilab), QCDGrid, PhenoGrid
Middleware Development
Work areas: Configuration Management, Storage Interfaces, Network Monitoring, Security, Information Services, Grid Data Management
15 Baseline Services for a functional Grid:
Storage Element; Basic File Transfer; Reliable File Transfer; Catalogue Services; Data Management Tools; Compute Element; Workload Management; VO Agents; VO Membership Services; Database Services; POSIX-like I/O; Application Software Installation Tools; Job Monitoring; Reliable Messaging; Information System
gLite Middleware Stack (www.glite.org)
We rely upon gLite components. This middleware builds upon VDT (Globus and Condor) and meets the requirements of all the basic scientific use cases:
1. Purple (amber) areas of the stack diagram are (almost) agreed as part of the shared generic middleware stack by each of the application areas.
2. Red areas are where generic middleware competes with application-specific software.
2005 Metrics and Quality Assurance

Metric                  Current status   Q2 2006 target
Number of users         ~1,000           ≥ 3,000
Number of sites         120              50
Number of CPUs          ~12,000          9,500 at month 15
Number of disciplines   6                ≥ 5
Multinational           24 countries     ≥ 15 countries
LCG Service Challenges
[Timeline 2005-2008: SC2 → SC3 → SC4 → initial LHC service → full physics run, with cosmics, first beams and first physics along the way]
• Jun 05 – Technical Design Report
• Sep 05 – SC3 service phase
• May 06 – SC4 service phase
• Sep 06 – initial LHC service in stable operation
• Apr 07 – LHC service commissioned

• SC2 – reliable data transfer (disk-network-disk): 5 Tier-1s, aggregate 500 MB/sec sustained at CERN
• SC3 – reliable base service: most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/sec, including mass storage (~25% of the nominal final throughput for the proton period)
• SC4 – all Tier-1s, major Tier-2s: capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput
• LHC Service in Operation – September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
Status?: Exec2 Summary
• 2005 was the first full year of a Production Grid: the UK Tier-1 was the largest CPU provider on the LCG and by the end of the year the Tier-2s provided twice the CPU of the Tier-1.
• The Production Grid is considered to be functional and hence the focus is now on improving performance of the system, especially w.r.t. data storage and management.
• The GridPP2 Project is now approaching halfway and has met 40% of its original targets with 91% of the metrics within specification.
Grid Overview
Aim: by 2008 (a full year’s data taking):
– CPU ~100 MSI2k (100,000 CPUs)
– Storage ~80 PB
– Involving >100 institutes worldwide
– Built on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
1. Prototype went live in September 2003 in 12 countries
2. Extensively tested by the LHC experiments in September 2004
Some of the challenges for 2006
• File transfers
– Good initial progress, but some way still to go with testing: stressing reliability and performance
– Can only be done with the participation of the experiments
– Distribution to other sites being planned
• Distributed VO services
– Plan agreed: the Tier-1 will sign off, then VO boxes may be deployed by Tier-2s
– But pilot services still to deploy: ALICE, ATLAS, CMS, LHCb
• End-to-end testing of the T0-T1-T2 chain
– MC production, reconstruction, distribution
• Full Tier-1 workload testing
– Recording, reprocessing, ESD distribution, analysis, Tier-2 support
• Understanding the “Analysis Facility”
– Batch analysis at T1 and T2
– Interactive analysis
• Startup scenarios
– Schedule is known at high level and defined for the Service Challenges – testing times ahead (in many ways)
Data Processing
• LEVEL-1 trigger: hardwired processors (ASIC, FPGA), pipelined and massively parallel – online
• High-level triggers: farms of processors
• Reconstruction & analysis: Tier-0/1/2 centres – offline
[Timeline: processing stretches from 10⁻⁹ s online to beyond 10³ s offline (25 ns, 3 µs, ms, sec, hour, year) – 9 orders of magnitude – at data rates from gigabits to petabits]
Getting Started
1. Get a digital certificate (http://ca.grid-support.ac.uk/)
– Authentication: who you are
2. Join a Virtual Organisation (VO) (http://lcg-registrar.cern.ch/)
– Authorisation: what you are allowed to do
– For LHC, join LCG and choose a VO
3. Get access to a local User Interface machine (UI) and copy your files and certificate there
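In practice these three steps end with creating a short-lived proxy on the UI. A minimal sketch, assuming an LCG User Interface with the standard Globus and VOMS clients installed (the VO name "atlas" is only an example, and exact commands and flags varied between middleware releases):

    # with usercert.pem and userkey.pem installed in ~/.globus on the UI:
    grid-proxy-init                # authentication: short-lived proxy from your certificate
    voms-proxy-init -voms atlas    # authorisation: proxy carrying your VO attributes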
Job Preparation
Prepare a file of Job Description Language (JDL):

############# athena.jdl #################
# Script to run
Executable = "athena.sh";
StdOutput = "athena.out";
StdError = "athena.err";
# Input files: the script, job options and my C++ code
InputSandbox = {"athena.sh", "MyJobOptions.py", "MyAlg.cxx", "MyAlg.h",
                "MyAlg_entries.cxx", "MyAlg_load.cxx", "login_requirements",
                "requirements", "Makefile"};
# Output files
OutputSandbox = {"athena.out", "athena.err", "ntuple.root", "histo.root",
                 "CLIDDBout.txt"};
# Choose the ATLAS version
Requirements = Member("VO-atlas-release-10.0.4",
                      other.GlueHostApplicationSoftwareRunTimeEnvironment);
##########################################
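Submission then follows a submit/status/get-output cycle from the UI. A sketch using the LCG-2-era EDG commands (later gLite releases renamed these, e.g. glite-job-submit; <jobID> stands for the https:// identifier the submit command prints):

    edg-job-submit athena.jdl       # returns the job identifier
    edg-job-status <jobID>          # poll until the job reaches Done
    edg-job-get-output <jobID>      # retrieve the files listed in OutputSandbox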
Management: Mapping Grid Structures
[Diagram: GridPP management boards mapped onto four layers – I. Experiment Layer, II. Application Middleware, III. Grid Middleware, IV. Facilities and Fabrics. The User Board covers requirements, application development and user feedback; the Deployment Board covers Tier-1/Tier-2, testbeds and rollout, plus service specification & provision; the PMB oversees the middleware areas: Metadata, Workload, Network, Security, Information & Monitoring, Storage.]
GridPP Status?
GridPP status (as of last night):
• 14 sites
• 2,898 CPUs
• 124 TBytes of storage