Grid computing e-Science 1

Grid computing and e-Science

Lecturer: Phạm Trần Vũ, PhD

Presenters: Phan Quang Thiện, Trần Phước Hiệp, Nguyễn Minh Nhật

Outline
• What is e-Science?
• New modes of scientific inquiry
• Fault diagnosis and prognostic system
• Grid service for diagnostic problem
• Distributed Aircraft Maintenance Environment (DAME) project
• Conclusion

“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.”

John Taylor, Director General of Research Councils, Office of Science and Technology

The purpose of the UK e-Science initiative is to allow scientists to do ‘faster, better or different’ research.

“At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services”.

Dr. Arden L. Bement, Jr., Director of the National Science Foundation

Cyberinfrastructure includes not only computers but also data storage resources and specialized facilities.

The long-term goal is to develop the middleware services that allow scientists to routinely build the infrastructure for their ‘Virtual Organisations’.

New modes of scientific inquiry
• Data-intensive science
• Simulation-based science
• Remote access to experimental apparatus

Worldwide, scientists and engineers are producing, accessing, analyzing, integrating and storing terabytes of digital data daily through experimentation, observation and simulation

This vast amount of data needs to be preprocessed and distributed for further analysis.

[Figure: annual LHC data storage of 12-14 petabytes/year. A CD stack holding one year of LHC data would be about 20 km tall (50 CD-ROMs = 35 GB = 6 cm), shown against a balloon at 30 km, Concorde at 15 km, and Mt. Blanc at 4.8 km.]
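The CD-stack comparison can be checked with a little arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and takes the figure's ratio of 50 CD-ROMs = 35 GB = 6 cm at face value; the function name is illustrative only.

```python
def cd_stack_height_km(petabytes, gb_per_stack=35.0, cm_per_stack=6.0):
    """Height in km of the CD stack needed to hold the given data volume.

    Assumes decimal units (1 PB = 1,000,000 GB) and the figure's ratio:
    one 6 cm stack of 50 CD-ROMs holds 35 GB.
    """
    gigabytes = petabytes * 1_000_000
    stacks = gigabytes / gb_per_stack
    return stacks * cm_per_stack / 100_000  # 100,000 cm per km


for volume_pb in (12, 14):
    print(f"{volume_pb} PB/year -> {cd_stack_height_km(volume_pb):.1f} km of CDs")
```

Under these assumptions the 12-14 PB range works out to roughly 20.6 to 24 km, which is consistent with the figure's "~20 km".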

Each of the four LHC experiments will generate several petabytes of experimental data per year

In 2003, the Japanese Earth Simulator was running numerical simulations of Earth’s climate at a sustained rate of 40 teraflops.

The U.S. Encyclopedia of Life (EOL) project: http://www.eol.org/

The UK Comb-e-Chem project: the goal of this project is to “synthesize” large numbers of new compounds by high-throughput combinatorial methods and then map their structure and properties.

Structure + Properties → Knowledge + Prediction

The advance of technology is also producing revolutionary new experimental apparatus.

Remote access allows participants to design, execute, and monitor experiments.

Sharing engineering research equipment, data resources, and leading-edge computing resources.

Remote access to perform teleobservation and teleoperation of experiments.

The convergence of information, grid, and networking technologies with contemporary communications now enables science and engineering communities to pursue their research and learning goals in real-time and without regard to geography.

The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, and instruments.

Virtual organization (VO): a set of individuals and/or institutions defined by such sharing rules.

In other words, VOs are dynamic federations of heterogeneous organizational entities sharing data, metadata, processing, and security infrastructure.

• If you need huge Computing Power and/or Data Storage

• If you do not have a supercomputer in your institution

• If you have access to a “reasonable” network connection

Grid (Distributed Computing) could be a good solution

[Figure: a scientist surrounded by separate resources. Labels: HPC, Analysis, Storage, Experiment, Computing.]

[Figure: middleware connecting scientists to infrastructure. Labels: Middleware, Experiment, Computing, Storage, Analysis, Scientist, Infrastructure, Scientists.]

[Figure: the Grid, annotated “use Web 2.0 here”.]

[Figure: The social process of science. Labels: scientists; graduate students; undergraduate students; experimentation; data, metadata, provenance, workflows, ontologies; local web repositories; digital libraries; virtual learning environment; technical reports; reprints; preprints and metadata; peer-reviewed journal and conference papers; certified experimental results and analyses.]

An e-Science Grid Framework

[Figure: a four-layer framework.
Layer 4: Portal & Application
Layer 3: Application Toolkit
Layer 2: Core Grid Engine
Layer 1: Infrastructure
Other labels in the figure: Globus; Parallel Molecular Modeling; Scientific Informatics; Mathematical & Theoretical Simulations; Simulations of Materials; Compute-Intensive; Data-Intensive; Visualization & Collaboration; Data Management; Collaborative Resource Management; Resource Monitoring; Security; Iterative Solver; Authentication & authorization; Grid Information Service; Uniform Resource Access; Brokering; Co-scheduling; Security services; Uniform Data Access; Short-Area-Network based PC Cluster; Fast-Ethernet based PC Cluster; Existing Server/Super Computer; Data Storage; Special Instrument.]

Scientific workflows:
• Capture individual data transformation and analysis steps
• Break large monolithic applications down into smaller jobs
• Smaller jobs can be independent or connected by control-flow or data-flow dependencies
• Are usually expressed as a directed acyclic graph (DAG) of tasks
• Allow scientists to modularize their applications
• Scale up execution over several computational resources

Workflows orchestrate processes on the Grid

Workflows are a processing model that incorporates tasks, data, and rules.

Workflow management systems execute tasks on the Grid using data once each task’s dependencies are satisfied, based on rules.
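The dependency rule can be sketched in a few lines. This is a minimal, single-machine illustration using Python's standard `graphlib` module (Python 3.9+), not a Grid workflow manager; the task names and the dependency edges are assumptions made up for the example.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, actions):
    """Run every task once all of its predecessors have finished.

    dependencies maps each task name to the set of tasks it depends on.
    actions maps each task name to a zero-argument callable.
    Returns the order in which the tasks were executed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    executed = []
    while sorter.is_active():
        # get_ready() yields only tasks whose dependencies are all done;
        # a real system could dispatch these to different sites in parallel.
        for task in sorted(sorter.get_ready()):
            actions[task]()
            executed.append(task)
            sorter.done(task)
    return executed


# Hypothetical five-task workflow: task2 and task3 both need task1,
# task4 needs both of them, and task5 needs task4.
dependencies = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
    "task4": {"task2", "task3"},
    "task5": {"task4"},
}
actions = {name: (lambda name=name: print("running", name)) for name in dependencies}
order = run_workflow(dependencies, actions)
```

Because the graph must be acyclic, a valid execution order always exists; tasks that become ready together (here task2 and task3) are the ones a workflow system could run on different resources at the same time.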

[Figure: example workflow graph with five tasks, Task 1 to Task 5.]

Execution Environment
• Cyberinfrastructure: local machine, cluster, PBS (Condor) pool, Grid
• A decision system that develops strategies for reliable and efficient execution in a variety of environments
• Reliable and scalable execution of dependent tasks
• Reliable, scalable execution of independent tasks (locally or across the network), with priorities and scheduling

• Globus and Condor services for job scheduling
• Globus services for data transfer and cataloging
• Information services: information about data location and information about the execution sites

The Grid Problem

1. Everyday researchers doing everyday research, BUT heroic Grid infrastructure is not being adopted
2. A data-centric perspective, like researchers, BUT Grid gives APIs to computation, not data
3. Collaborative and participatory, BUT Grid has a deeply rooted service-provider mindset
6. Better, not perfect, BUT Grid aims to provide a well-engineered perfect solution
7. Giving autonomy to researchers, BUT Grid imposes institutional control (at this time)
8. About pervasive computing, BUT Grid is about portals, not the next generation of users

Summary
• e-Science is about doing new science
• Grid is just one part of the solution
• Users are not just consumers of infrastructure. Empower them.
• Think Web 2.0 on top of Grid and other services
• Workflows make e-Science easier, and Web 2.0 makes workflows easier.

Diagnosis and prognostic system

• Computer-based fault diagnosis and prognosis (DP)
• Arises in many domains: medicine, engineering, transport, and aerospace

Operational Scenario

[Figure: operational scenario. Labels: Engine flight data; Airline office; Maintenance Centre; Diagnostics Centre; London Airport; New York Airport; European data center; American data center; Grid.]

Diagnosis and prognostic (DP) System

• Data-centric
• Requires complex interactions among agents
• Distributed
• Needs to provide supporting and qualifying evidence for the DP offered
• Safety- and business-critical, with high dependability requirements

Data Centricity

• Integrating data from several different systems for root-cause determination
• Requires vast data repositories
• The types of data can also be highly diverse
• Not only sensor data but also non-declarative knowledge
• The interpretation of the knowledge can vary among the entities

Data Centricity

Grid computing:
• Knowledge and semantics (chapter 23)
• Solutions for the management and archiving of large data repositories
• Remote collection and distribution of data
• Coherent integration of information from diverse databases (chapter 22)

Multiple stakeholders

Involves a number of stakeholders:
• The system owner
• Experts
• The commercial service provider
• ….

Grid computing: interaction of diverse parts is inherent within the Grid computing model

Distribution

• Data storage, data mining, and fault diagnosis may take place at different locations
• Across diverse IT systems
• The system can also be highly dynamic, involving a number of disparate entities (virtual, changing often)

Distribution

Grid computing: the standardization of communication and application protocols in the Grid paradigm

Grid portal: supports effective interactions with users

Data Provenance

• Transparency and trust in results
• Steps taken to arrive at a decision

Grid computing:
• Develop open data communication protocols
• Meta-labeling schemes
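One way to picture how the steps behind a decision could be labeled is a log in which every step carries a label derived from its own content and from the previous step. The sketch below is a hypothetical illustration only: the class, field names, and hash-based labels are assumptions for the example, not a scheme described in the slides or used by any Grid toolkit.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Hypothetical record of the steps taken to arrive at a decision."""
    steps: list = field(default_factory=list)

    def record(self, step_name, inputs, output):
        """Append one step and return its label (a SHA-256 hex digest)."""
        entry = {
            "step": step_name,
            "inputs": inputs,    # must be JSON-serializable
            "output": output,
            "previous": self.steps[-1]["label"] if self.steps else None,
        }
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["label"] = hashlib.sha256(payload).hexdigest()
        self.steps.append(entry)
        return entry["label"]

    def verify(self):
        """Return True if no recorded step has been altered or reordered."""
        previous = None
        for entry in self.steps:
            body = {k: v for k, v in entry.items() if k != "label"}
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if body["previous"] != previous:
                return False
            if hashlib.sha256(payload).hexdigest() != entry["label"]:
                return False
            previous = entry["label"]
        return True


# Made-up example steps; the names are placeholders.
log = ProvenanceLog()
log.record("load", {"source": "engine_flight_data"}, "raw_signal")
log.record("detect", {"signal": "raw_signal"}, "anomaly_flagged")
print(log.verify())
```

Chaining each label to the previous one means that editing an earlier step invalidates every later check, which is one simple way to support trust in a chain of reasoning.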

Dependability

• Guaranteed service availability
• Data security
• System security

Dependability

Grid computing:
• Offers a security model to secure distributed computing (chapter 21)
• Addresses data access and data confidentiality
• The concept of guaranteed service and quality-of-service (chapter 18)

The aero-engine DP problem

• Modern aero-engines must operate with extremely high reliability
• They combine advanced mechanical engineering systems with electronic control systems
• Using engine sensors
• Prognostic applications

DAME project

[Figure: DAME operational scenario. Labels: Engine flight data; Airline office; Maintenance Centre; Diagnostics Centre; London Airport; New York Airport; European data center; American data center; Grid.]

DAME project

Principal challenges:
• Vast data repositories
• Advanced pattern-matching and data-mining methods with suitable response times
• Collaboration among a number of diverse actors

DAME service

[Figure: DAME services on the Grid. Labels: QUOTE; AURA-G; Data-Mining; Decision Support; Case Based Reasoning; Modelling/Simulation; Grid Services Management; DAME Diagnostics Portal; The Grid; Novel Data; Raw Engine Data; Vibration; Shaft Speed; Fuel Flow; Service Data; Parts Data; Operational Data.]

Core services and tools

• Engine data service
• Data storage and mining service
• Engine modeling service
• Case-based reasoning support
• Maintenance interface service

Engine data service

• Controls the interaction between the QUOTE system and its communication with the ground station
• Establishes the link to the Grid data repositories
• Many replicas of this service exist, and they are highly transient

Data storage and mining service

• Consists of the AURA pattern-matching engine system
• Uses specialized methods to rapidly search both raw and archived engine data
• Resembles a data-mining service
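To make the idea of searching engine data for a pattern concrete, here is a deliberately simple sketch: a brute-force sliding-window search using squared Euclidean distance. It is not the AURA method, which the slides do not describe; the function name and the sample values are assumptions for illustration.

```python
def best_match(signal, pattern):
    """Return (offset, distance) of the window in signal closest to pattern.

    Distance is the sum of squared differences; 0 means an exact match.
    """
    if not pattern or len(pattern) > len(signal):
        raise ValueError("pattern must be non-empty and no longer than signal")
    best_offset, best_distance = 0, float("inf")
    for offset in range(len(signal) - len(pattern) + 1):
        window = signal[offset:offset + len(pattern)]
        distance = sum((s - p) ** 2 for s, p in zip(window, pattern))
        if distance < best_distance:
            best_offset, best_distance = offset, distance
    return best_offset, best_distance


# Made-up vibration trace and fault signature.
trace = [0.0, 0.1, 0.0, 1.0, 5.0, 1.0, 0.1, 0.0]
signature = [1.0, 5.0, 1.0]
offset, distance = best_match(trace, signature)
print(offset, distance)
```

A scan like this costs time proportional to the signal length times the pattern length, which hints at why vast repositories call for specialized methods with suitable response times.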

Engine modeling service

• Infers the current state of the engine
• Performs model-based data fusion

Case-based reasoning support

• Uses case-based reasoning to improve the knowledge base
• Captures fault DP methods in a procedural way
• Manages workflows associated with DP operations
• Builds and maintains the DAME knowledge base
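The retrieve-and-retain cycle at the heart of case-based reasoning can be sketched as follows. This is a generic nearest-case illustration, not the DAME implementation; the class names, feature names, and diagnoses are hypothetical, and the features are assumed to be pre-scaled to comparable ranges.

```python
from dataclasses import dataclass


@dataclass
class Case:
    features: dict   # hypothetical, pre-scaled sensor summaries
    diagnosis: str


class CaseBase:
    """A tiny knowledge base of past cases."""

    def __init__(self):
        self.cases = []

    def retain(self, features, diagnosis):
        """Store a solved case so that later queries can reuse it."""
        self.cases.append(Case(dict(features), diagnosis))

    def retrieve(self, features):
        """Return the stored case closest to the query, or None if empty."""
        if not self.cases:
            return None

        def distance(case):
            keys = set(case.features) | set(features)
            return sum(
                (case.features.get(k, 0.0) - features.get(k, 0.0)) ** 2
                for k in keys
            )

        return min(self.cases, key=distance)


# Made-up cases and query.
base = CaseBase()
base.retain({"vibration": 0.9, "shaft_speed": 0.5, "fuel_flow": 0.4}, "bearing wear")
base.retain({"vibration": 0.2, "shaft_speed": 0.5, "fuel_flow": 0.9}, "fuel system fault")
match = base.retrieve({"vibration": 0.8, "shaft_speed": 0.5, "fuel_flow": 0.5})
print(match.diagnosis)
```

Each confirmed diagnosis passed back through `retain` grows the case base, which is the sense in which case-based reasoning improves the knowledge base over time.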

Maintenance interface service

• Organizes all interactions with the stakeholders involved in taking remedial actions
• Captures information that helps validate or refine the output from the preceding DP processes

Conclusion
• An ambitious vision for the future of science and engineering
• The realization of this vision will require long-term investments of financial resources
• We should not underestimate the difficulty of the technical challenges that must be overcome before the vision is realized
• The realization of these goals is extremely important for the future of science and engineering

Q & A

Thank you!

References
• I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 2004.
• Cyberinfrastructure Vision for 21st Century Discovery (NSF)
• National e-Science Centre: http://www.nesc.ac.uk/action/esi/
• DAME homepage: http://www.cs.york.ac.uk/dame/
