INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Challenges of Analysis for Grid Computing Charles Loomis (LAL-Orsay) University College London November 25, 2005

Challenges of Analysis for Grid Computing



Page 1: Challenges of Analysis for Grid Computing

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Challenges of Analysis for Grid Computing

Charles Loomis (LAL-Orsay)

University College London

November 25, 2005

Page 2: Challenges of Analysis for Grid Computing


Contents

• Introduction
  – What is grid computing?
  – Why is it useful for the LHC?

• LCG/EGEE production service
  – Middleware services
  – Resources available
  – Current usage

• Supporting analysis on the grid
  – Development needed to meet expectations
  – Use of grid in other application domains

• Summary

• Opinions are those of the author and may not reflect those of the LCG or EGEE projects!

Page 3: Challenges of Analysis for Grid Computing


What is the Grid?

• “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
  (The Grid, I. Foster and C. Kesselman, 1998)

• Characteristics:
  – Critical part of the grid is the “middleware”.
  – Transparent access to all available resources.
  – Secure access across administrative boundaries.
  – Enables sharing of resources.

Page 4: Challenges of Analysis for Grid Computing


Why the Grid?

• User
  – Reduced (or no) porting to take advantage of remote resources.
  – More available resources, less time waiting for answers.

• Experiment
  – No reinventing the wheel: reuse of high-level grid services.
  – Means of coordinating global computing resources.

• Institute
  – More efficient use of hardware.
  – Reduced outlay for hardware through sharing.

Page 5: Challenges of Analysis for Grid Computing


What the Grid is Not!

• Unlimited, free resources
  – Sharing is expected to make more resources available at lower cost, but… sharing is a two-way street.
  – Users (or their institutes) must still provide resources equivalent to their average consumption.

• The Borg
  – Making resources available in the grid is always voluntary.
  – Administrators can set policies on who can access those resources, when, with what priority, etc.

• Magic
  – Cannot divine the needs of your applications.
  – Provides a mechanism for creating generally useful services, but users must still write application-level code or a layer to bind to grid services.

Page 6: Challenges of Analysis for Grid Computing


LHC and Grid Computing

• The computing needs of the LHC and goals of grid computing are a good match.

• Users and resources are globally distributed.

• Scale of storage and computing resources requires federations of diverse resources.
  – 43 PB of mass storage, 37 PB of disk storage
  – 105k SI2000 of computing

• Needs correspond well to base-level grid services.
  – Batch-like access to computing resources.
  – Storage of large data sets.
  – Metadata management for finding data.

Page 7: Challenges of Analysis for Grid Computing


LCG

• LHC Computing Grid:
  – Prepare, deploy, and operate the computing environment to allow the physicists to analyze the data from LHC detectors.

• Requires:
  – Storage and management of large amounts of data.
  – Easy access to data and associated metadata.
  – Access to local and remote computing resources.
  – Stable, reliable system for long periods of time:
    § Large productions of simulation.
    § Chaotic access for data analysis.

• Goals are similar to those of grid computing.

Page 8: Challenges of Analysis for Grid Computing


EGEE

• Enabling Grids for E-sciencE:
  – Provide and manage a European grid infrastructure to support researchers from many disciplines.

• LCG and EGEE have similar aims:
  – LCG: worldwide collaboration; one field.
    § Lifetime: ~20 years.
  – EGEE: European grid; many fields.
    § Lifetime: 2+2 years.
  – EGEE-II: Proposed project to maintain infrastructure.
    § Lifetime: 2 years.

• Division of Labor:
  – LCG: Provides and operates infrastructure.
  – EGEE: Re-engineers grid software.

Page 9: Challenges of Analysis for Grid Computing


Many Other Projects!

• Interoperability between middleware and infrastructures is a real concern.

Page 10: Challenges of Analysis for Grid Computing


Job Submission

[Diagram: job submission workflow. Components: User Interface, Resource Broker, Information System, Replica Catalog, and a Storage Element and Computing Element at each of Site 1 and Site 2. Numbered arrows: 1. submit, 2. query, 3. query, 4. submit, 5. retrieve, 6. retrieve. An additional arrow is labelled “publish status”.]
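
The submit and query steps in the job submission diagram can be read as a matchmaking loop: the broker consults the information system and the replica catalog, then picks a site. The following is a minimal sketch in Python of that idea under simplifying assumptions. The names `Site` and `match_sites`, the record layout, and the ranking rule (most local input data first, then most free CPUs) are hypothetical illustrations, not the actual LCG/EGEE Resource Broker interface or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record for one site as seen in the information system."""
    name: str
    free_cpus: int
    accepts_vos: frozenset


def match_sites(job_vo, input_files, info_system, replica_catalog):
    """Return candidate site names for a job, best first.

    info_system:     list of Site records (result of one query step).
    replica_catalog: dict mapping a logical file name to the set of site
                     names holding a replica (result of the other query step).
    """
    candidates = []
    for site in info_system:
        # Skip sites that do not serve this VO or have no free capacity.
        if job_vo not in site.accepts_vos or site.free_cpus == 0:
            continue
        # Count how many input files already have a replica at this site.
        local = sum(1 for f in input_files
                    if site.name in replica_catalog.get(f, set()))
        candidates.append((local, site.free_cpus, site.name))
    # Assumed ranking: more local data first, then more free CPUs.
    candidates.sort(reverse=True)
    return [name for _, _, name in candidates]


# Illustrative data only; site names and file names are made up.
info = [
    Site("site1", free_cpus=10, accepts_vos=frozenset({"atlas"})),
    Site("site2", free_cpus=50, accepts_vos=frozenset({"atlas", "biomed"})),
    Site("site3", free_cpus=0, accepts_vos=frozenset({"atlas"})),
]
catalog = {"lfn:run1.root": {"site1"}, "lfn:run2.root": {"site1", "site2"}}
ranking = match_sites("atlas", ["lfn:run1.root", "lfn:run2.root"], info, catalog)
print(ranking)
```

In this sketch the job would be submitted to the first site in `ranking`. A real broker would weigh many more attributes published by the sites.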

Page 11: Challenges of Analysis for Grid Computing


Security

• Public Key Infrastructure
  – Uses Grid Security Infrastructure (GSI) from Globus.

• Authentication (i.e. Who are you?)
  – Certificate Authorities (CA)
    § More than 30 CAs.
    § Covers Europe, North America, and Asia.
  – Principals: Hosts, People, Services.
  – Single sign-on:
    § User generates time-limited proxy.
    § Proxy used to delegate authority.

• Authorization (i.e. What can you do?)
  – Done by Virtual Organization (VO).
  – Resources query VO membership server for membership list.
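
The split between authentication (a time-limited proxy) and authorization (a VO membership list) can be sketched as follows. This is a toy model only: it performs no cryptography, and the `Proxy` class, the helper names, the 12-hour default lifetime, and the example subject names are all assumptions made for illustration, not the GSI implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a time-limited proxy credential."""
    subject: str       # distinguished name of the user
    expires: datetime  # moment after which the proxy is no longer accepted


def make_proxy(subject, now, lifetime_hours=12):
    """Single sign-on step: derive a short-lived proxy (lifetime is assumed)."""
    return Proxy(subject=subject, expires=now + timedelta(hours=lifetime_hours))


def is_authorized(proxy, vo_members, now):
    """Accept a request only if the proxy is still valid (authentication)
    and its subject appears in the VO membership list (authorization)."""
    if now >= proxy.expires:
        return False
    return proxy.subject in vo_members


# Illustrative values only.
start = datetime(2005, 11, 25, 9, 0)
members = {"/O=Example/CN=Some User"}
proxy = make_proxy("/O=Example/CN=Some User", start)

ok_now = is_authorized(proxy, members, start + timedelta(hours=1))
ok_later = is_authorized(proxy, members, start + timedelta(hours=13))
ok_stranger = is_authorized(make_proxy("/O=Example/CN=Other", start), members, start)
print(ok_now, ok_later, ok_stranger)
```

The short lifetime limits the damage if a delegated proxy leaks, at the cost of long-running jobs needing a renewed credential.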

Page 12: Challenges of Analysis for Grid Computing


Information System

• Information System is Backbone of Grid:
  – Used as a service index.
  – Transports status information to broker.

• MDS
  – LDAP-based system provided by Globus.
  – Augmented by plain-vanilla LDAP for performance (BDII).
  – Hierarchy of all grid information.

• R-GMA
  – Consumer/Producer model.
  – Uses relational DB behind.
  – Uses same information providers as MDS.

Page 13: Challenges of Analysis for Grid Computing


Data Management

• Data Management
  – Storage Services
    § GridFTP (gsiftp) servers being phased out.
    § Transition to SRM-based services.
  – Transport protocols:
    § gsiftp (remote, local access)
    § rfio, posix (local access)
    § http, https (limited support)

• VO Replica Catalog
  – Locations of replicated files.
  – RB uses these catalogs to find viable sites for jobs.

• VO Metadata Catalog
  – Information about data files on grid.
  – Accessed directly by end-users.

Page 14: Challenges of Analysis for Grid Computing


LCG/EGEE Production Service

> 200 sites, > 20 kCPU, > 13 PB

http://goc03.grid-support.ac.uk/googlemaps/lcg.html

Page 15: Challenges of Analysis for Grid Computing


ATLAS Data Challenge

• ATLAS data challenge (Rome, June 2005)
  – 200 CPU-years used
  – 380k jobs in total
  – 1.4M data files, 45 TB
  – 10 people running production

• Total success rate: 52%

https://edms.cern.ch/document/641261/18

Page 16: Challenges of Analysis for Grid Computing


WISDOM Data Challenge

• WISDOM: Wide In Silico Docking on Malaria
  – 67 CPU-years in 37 days
  – 73k jobs in total
  – 947 GB of data
  – 5 people running production

• Total success rate: 47%

• Without license failures: 65%

http://wisdom.eu-egee.fr/
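
The headline figures from the ATLAS and WISDOM data challenges can be turned into per-job and per-day averages with simple arithmetic. The sketch below assumes that the quoted CPU-years cover all submitted jobs and uses a 365.25-day year. The slides do not state either point, so the results are rough orders of magnitude only. The function names are illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumed calendar-year convention


def hours_per_job(cpu_years, n_jobs):
    """Average CPU-hours consumed per job."""
    return cpu_years * HOURS_PER_YEAR / n_jobs


def average_busy_cpus(cpu_years, days):
    """Average number of CPUs kept busy over the whole campaign."""
    return cpu_years * 365.25 / days


# Figures quoted on the ATLAS and WISDOM slides.
atlas_hours = hours_per_job(200, 380_000)
wisdom_hours = hours_per_job(67, 73_000)
wisdom_cpus = average_busy_cpus(67, 37)

print(f"ATLAS:  {atlas_hours:.1f} CPU-hours per job")
print(f"WISDOM: {wisdom_hours:.1f} CPU-hours per job")
print(f"WISDOM: {wisdom_cpus:.0f} CPUs busy on average")
```

Under these assumptions both challenges consist of jobs lasting a few hours each, and WISDOM kept several hundred CPUs busy continuously for the 37 days.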

Page 17: Challenges of Analysis for Grid Computing


Better Reliability

• Success rate ~60% is not adequate.
  – Painful, but workable for large productions.
  – Too frustrating for analysis.

• Certification
  – Avoid landing on a “bad” site, but reduces available resources.
  – Must make software easier to install and configure.
  – Current ad hoc solution for Site Functionality Tests needs to be generalized and integrated with the matchmaking.
  – Examples:
    § SFT (and other batteries of tests)
    § Application software validation
    § Site security validation
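
Why a success rate of about 60% is workable for productions but frustrating for analysis can be made concrete with a small calculation. The sketch below assumes that every job fails independently with the same probability, which is a simplification: real failures tend to cluster on "bad" sites. The function names are illustrative.

```python
def expected_attempts(p):
    """Mean number of submissions until one success (geometric distribution)."""
    return 1.0 / p


def prob_all_succeed(p, n_jobs):
    """Probability that n_jobs independent jobs all succeed on the first try."""
    return p ** n_jobs


def prob_success_within(p, max_attempts):
    """Probability that a single job succeeds within max_attempts tries."""
    return 1.0 - (1.0 - p) ** max_attempts


p = 0.6  # success rate quoted on the slide
print(f"mean submissions per successful job: {expected_attempts(p):.2f}")
print(f"10-job analysis with no failure:     {prob_all_succeed(p, 10):.4f}")
print(f"one job succeeding within 3 tries:   {prob_success_within(p, 3):.3f}")
```

A production system can resubmit automatically and absorb the extra attempts. An analyst waiting on a handful of jobs almost always sees at least one failure, since under these assumptions ten jobs all succeed first time well under 1% of the time.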

Page 18: Challenges of Analysis for Grid Computing


Chaotic Access

• Service challenges and large-scale productions stress the grid, but in a very organized manner.

• The large-scale analysis which will appear with real LHC data will be much more chaotic.

• Need to test how services will respond to this:
  – Batch systems with thousands of different users.
  – Storage systems caching large numbers of different files.
  – Metadata catalogs with large numbers of varied requests.
  – etc.

Page 19: Challenges of Analysis for Grid Computing


Accessible Grid Software

• Grid clients required for all common platforms:
  – People are more efficient working in their usual environment.
  – Normal test progression is efficient; don’t interfere with this.

• Lightweight services for the laptop/workstation:
  – Changing analysis software or scripts to work in different environments is error-prone and frustrating.
  – Allow users to see one environment by running lightweight services on their laptop.
  – Ideally these would be visible in the grid, so that the user only needs to indicate that jobs need more or different resources.

Page 20: Challenges of Analysis for Grid Computing


Access Control Lists

• Large experiments are always a balance between collaboration and competition.

• Analysis tends to be competitive:
  – Need to use common resources,
  – But keep certain things private.
  – Fine-grained Access Control Lists (ACLs) will need to be supported by nearly all services. E.g.
    § Analysis jobs: who can kill them, reschedule them, …?
    § Analysis software: who can read the code?
    § Produced data: who can read, delete, list, … the data?
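
A fine-grained ACL of the kind described here can be pictured as a per-resource table from actions to the principals allowed to perform them. The following minimal sketch assumes a hypothetical `ACL` class, made-up user and group names, and a rule that the owner may always act on their own resource. None of this reflects a specific grid service.

```python
from dataclasses import dataclass, field


@dataclass
class ACL:
    """Hypothetical per-resource ACL: action name -> allowed users or groups."""
    owner: str
    entries: dict = field(default_factory=dict)

    def grant(self, action, principal):
        """Allow a user or group to perform one action on this resource."""
        self.entries.setdefault(action, set()).add(principal)

    def allows(self, user, groups, action):
        """Check whether a user (with group memberships) may perform an action."""
        if user == self.owner:
            return True  # assumption: owners keep full control
        allowed = self.entries.get(action, set())
        return user in allowed or bool(allowed & set(groups))


# Produced data: the analysis group may read and list, but only the owner deletes.
data_acl = ACL(owner="alice")
data_acl.grant("read", "group:higgs")
data_acl.grant("list", "group:higgs")

can_read = data_acl.allows("bob", ["group:higgs"], "read")
can_delete = data_acl.allows("bob", ["group:higgs"], "delete")
owner_delete = data_acl.allows("alice", [], "delete")
print(can_read, can_delete, owner_delete)
```

The same structure would apply to jobs (actions such as kill or reschedule) and to software (read), which is why nearly every service would need to understand it.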

Page 21: Challenges of Analysis for Grid Computing


Priorities

• A fair amount of excess capacity on the production service means most jobs are not significantly delayed.

• With large-scale analysis and production in parallel, this will change.

• Priorities will be needed:
  – For computational, storage, and network resources.
  – Must seamlessly incorporate policies from:
    § User: e.g. mix of analysis jobs and “service” jobs
    § Experiment: e.g. critical realignment jobs before analysis jobs
    § Sites: e.g. local users run with higher priority
  – Must resolve conflicts between policies.
    § E.g. high-priority access to CPU, but low-priority access to storage.
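
One way to picture combining user, experiment, and site policies is sketched below. The two rules used are assumptions chosen for illustration, not a description of how LCG/EGEE schedules jobs. First, a job that needs several resources is limited by its lowest priority among them. Second, jobs are ordered lexicographically, with site policy taking precedence over experiment policy, and experiment policy over user preference. All names and numbers are made up.

```python
def effective_priority(per_resource):
    """Assumed conflict rule: a job needing every listed resource is held
    back by the resource on which it has the lowest priority."""
    return min(per_resource.values())


def order_jobs(jobs):
    """Sort jobs highest priority first, comparing site policy, then
    experiment policy, then user preference (assumed precedence)."""
    return sorted(jobs,
                  key=lambda j: (j["site"], j["experiment"], j["user"]),
                  reverse=True)


# Larger numbers mean higher priority; values are illustrative.
jobs = [
    {"name": "analysis", "site": 1, "experiment": 5, "user": 7},
    {"name": "realign", "site": 1, "experiment": 9, "user": 5},
    {"name": "local", "site": 2, "experiment": 1, "user": 1},
]
queue = [j["name"] for j in order_jobs(jobs)]
limited = effective_priority({"cpu": 9, "storage": 2})
print(queue, limited)
```

Here the site's local user runs first, the experiment's critical realignment job precedes ordinary analysis, and a job with high CPU priority but low storage priority is treated as low priority overall.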

Page 22: Challenges of Analysis for Grid Computing


Database Issues

• Users will need to store information about their analyses in databases.
  – Location of produced data files.
  – Metadata concerning those files.

• Common services:
  – Privacy and namespace issues must be resolved.

• Private services:
  – Federation issues must be resolved.

Page 23: Challenges of Analysis for Grid Computing


Communication

• Effective communication is vital for analysis.

• The grid should incorporate communication tools:
  – e-mail and mailing lists
  – chat
  – phone
  – video

• And facilitate their use. For example:
  – “single sign-on” for all services
  – automatic management of lists with VO authorization groups
  – management of MCU for video

Page 24: Challenges of Analysis for Grid Computing


Other Applications

• Biomedical applications
  – Public database usage
  – Large resource needs
  – Privacy concerns
  – Quasi-realtime response

• Earth sciences
  – Widely distributed data
  – “Complex” metadata searches
  – Commercial software
  – Quasi-realtime response

• Astrophysics
  – Sharing data between VOs

• Computational Chemistry
  – Large, parallel algorithms

Page 25: Challenges of Analysis for Grid Computing


Summary

• Grid technology fits well with the needs and constraints of the high-energy physics community.

• LCG/EGEE production service
  – Large number of globally-distributed resources available.
  – Successfully used by many experiments for large productions.
  – Will need to grow by 5 times to meet needs of LHC.

• Supporting analysis is challenging for the grid:
  – Reliability must increase significantly.
  – Better availability of the software on different platforms.
  – Finer-grained control over access to and use of resources.
  – Incorporation of new services into the grid.