Grid computing and e-Science
Lecturer: PhD. Phạm Trần Vũ
Presenters: Phan Quang Thiện, Trần Phước Hiệp, Nguyễn Minh Nhật



Outline
• What is e-Science?
• New modes of scientific inquiry
• Fault diagnosis and prognostic systems
• Grid services for the diagnostic problem
• The Distributed Aircraft Maintenance Environment (DAME) project
• Conclusion


“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.”

John Taylor, Director General of Research Councils, Office of Science and Technology

The purpose of the UK e-Science initiative is to allow scientists to do ‘faster, better or different’ research.


“At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services”.

Dr. Arden L. Bement, Jr., Director of the National Science Foundation

Cyberinfrastructure includes not only computers but also data storage resources and specialized facilities.

The long-term goal is to develop middleware services that allow scientists to routinely build the infrastructure for their ‘Virtual Organisations’.


New modes of scientific inquiry:
• Data-intensive science
• Simulation-based science
• Remote access to experimental apparatus


Worldwide, scientists and engineers are producing, accessing, analyzing, integrating and storing terabytes of digital data daily through experimentation, observation and simulation.

These vast amounts of data need to be preprocessed and distributed for further analysis.


[Figure: LHC annual data storage of 12-14 petabytes/year. A stack of CD-ROMs holding one year of LHC data would be about 20 km high, compared with a balloon (30 km), Concorde (15 km) and Mt. Blanc (4.8 km); a stack of 50 CD-ROMs holds 35 GB and is 6 cm high.]

Each of the four LHC experiments will generate several petabytes of experimental data per year.
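The CD-stack comparison in the figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 PB = 1,000,000 GB) and takes the figure's own ratio of 50 CD-ROMs = 35 GB = 6 cm; the function name `cd_stack_km` is illustrative.

```python
# Check the CD-stack comparison, assuming decimal units (1 PB = 1,000,000 GB)
# and the figure's ratio: 50 CD-ROMs hold 35 GB and stand 6 cm high.
GB_PER_CD = 35 / 50        # 0.7 GB per CD-ROM
CM_PER_CD = 6 / 50         # 0.12 cm (1.2 mm) per CD-ROM
GB_PER_PB = 1_000_000
CM_PER_KM = 100_000


def cd_stack_km(petabytes: float) -> float:
    """Height, in kilometres, of a CD-ROM stack holding `petabytes` of data."""
    cds_needed = petabytes * GB_PER_PB / GB_PER_CD
    return cds_needed * CM_PER_CD / CM_PER_KM


for pb in (12, 14):
    print(f"{pb} PB/year -> {cd_stack_km(pb):.1f} km of CDs")
```

At the lower bound of 12 PB/year the stack is roughly 20.6 km, consistent with the figure's "~20 km", between Concorde's 15 km and the balloon's 30 km.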


In 2003, the Japanese Earth Simulator was running numerical simulations of Earth’s climate at a sustained rate of 40 teraflop/s.

The U.S. Encyclopedia of Life (EOL) project. http://www.eol.org/

The UK Comb-e-Chem project: the goal of this project is to “synthesize” large numbers of new compounds by high-throughput combinatorial methods and then map their structure and properties.

[Diagram: Structure + Properties; Knowledge + Prediction]


The advance of technology is also producing revolutionary new experimental apparatus.

These instruments allow remote participants to design, execute, and monitor experiments.


Sharing engineering research equipment, data resources, and leading-edge computing resources.

Remote access to perform teleobservation and teleoperation of experiments.


The convergence of information, grid, and networking technologies with contemporary communications now enables science and engineering communities to pursue their research and learning goals in real-time and without regard to geography.

The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, and instruments.

Virtual organization (VO): a set of individuals and/or institutions defined by such sharing rules. In other words, VOs are dynamic federations of heterogeneous organizational entities sharing data, metadata, processing, and security infrastructure.
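To make the definition concrete, here is a minimal Python sketch of a VO as a set of member institutions plus the sharing rules that define it. The `VirtualOrganization` class, its methods and the institution and resource names are hypothetical illustrations, not part of any Grid middleware; the sketch only shows that membership and access rules can change dynamically.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Toy VO model: member institutions plus the sharing rules that define it.

    `rules` maps a (hypothetical) resource name to the set of member
    institutions allowed to use it.
    """
    name: str
    members: set[str] = field(default_factory=set)
    rules: dict[str, set[str]] = field(default_factory=dict)

    def join(self, institution: str) -> None:
        self.members.add(institution)

    def leave(self, institution: str) -> None:
        # VOs are dynamic: a departing member loses access to every shared resource.
        self.members.discard(institution)
        for allowed in self.rules.values():
            allowed.discard(institution)

    def share(self, resource: str, *institutions: str) -> None:
        """Add a sharing rule; only current members can be granted access."""
        unknown = set(institutions) - self.members
        if unknown:
            raise ValueError(f"not VO members: {sorted(unknown)}")
        self.rules.setdefault(resource, set()).update(institutions)

    def can_access(self, institution: str, resource: str) -> bool:
        return institution in self.members and institution in self.rules.get(resource, set())


vo = VirtualOrganization("climate-vo")
for institution in ("UniA", "LabB", "CentreC"):
    vo.join(institution)
vo.share("cluster-1", "UniA", "LabB")
print(vo.can_access("LabB", "cluster-1"))   # True
vo.leave("LabB")                             # the federation changes
print(vo.can_access("LabB", "cluster-1"))   # False
```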


• If you need huge computing power and/or data storage
• If you do not have a supercomputer at your institution
• If you have access to a “reasonable” network connection

then the Grid (distributed computing) could be a good solution.


[Diagram: a scientist working with separate HPC, storage, analysis, experiment and computing resources]


[Diagram: a scientist reaching experiment, computing, storage and analysis resources through a common middleware layer]


[Diagram: Infrastructure; Scientists]


[Diagram: the Grid, annotated “use Web 2.0 here”]


[Figure: The social process of science. Labels: scientists; experimentation; local Web repositories; digital libraries; graduate students; undergraduate students; virtual learning environment; technical reports; reprints; peer-reviewed journal & conference papers; preprints & metadata; certified experimental results & analyses; data, metadata, provenance, workflows, ontologies.]


An e-Science Grid Framework

Layer 4: Portal & Application: parallel molecular modeling, scientific informatics, mathematical & theoretical simulations, simulations of materials, …
Layer 3: Application Toolkit: compute-intensive, data-intensive, data management, visualization & collaboration, collaborative resource management, resource monitoring, security, iterative solver, …
Layer 2: Core Grid Engine (Globus): authentication & authorization, Grid information service, uniform resource access, brokering, co-scheduling, security services, uniform data access, …
Layer 1: Infrastructure: short-area-network-based PC cluster, Fast-Ethernet-based PC cluster, existing server/supercomputer, data storage, special instrument


Workflows capture individual data transformation and analysis steps.

Large monolithic applications are broken down into smaller jobs.

Smaller jobs can be independent or connected by control-flow/data-flow dependencies.

A workflow is usually expressed as a directed acyclic graph (DAG) of tasks.

Workflows allow scientists to modularize their applications.

They enable scaled-up execution over several computational resources.
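A minimal sketch of the DAG idea using Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The five task names and their dependencies are hypothetical. Grouping ready tasks into stages shows which jobs are independent of each other and could be spread over several computational resources at once.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_data": set(),
    "calibrate": {"fetch_data"},
    "filter_noise": {"fetch_data"},
    "analyse": {"calibrate", "filter_noise"},
    "report": {"analyse"},
}


def execution_stages(dag):
    """Group tasks into stages; tasks in the same stage have no mutual
    dependencies and could run on different Grid resources in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()                        # raises graphlib.CycleError if not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)                 # mark the stage finished, unlocking successors
    return stages


for i, stage in enumerate(execution_stages(workflow), start=1):
    print(f"stage {i}: {', '.join(stage)}")
```

Here `calibrate` and `filter_noise` land in the same stage because both depend only on `fetch_data`.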


Workflows orchestrate processes on the Grid.

Workflows are a processing model that incorporates tasks, data, and rules.

Workflow management systems execute tasks on the Grid using data once each task’s dependencies are satisfied, based on rules.

[Diagram: an example workflow of five tasks, Task 1 to Task 5]
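The rule "run a task once its dependencies are satisfied" can be sketched as a small data-driven loop: a task becomes runnable as soon as every data item it consumes exists. This is a hypothetical illustration, not the API of Condor, Globus or any workflow system; the `Task` class, the data names and the dependencies between Task 1 to Task 5 are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    inputs: tuple[str, ...]       # data items the task consumes
    outputs: tuple[str, ...]      # data items the task produces
    run: Callable[[dict], dict]   # maps input values to output values


def execute(tasks, initial_data):
    """Repeatedly run every task whose inputs are all available (the rule),
    until all tasks have run; returns the final data and the run order."""
    data = dict(initial_data)
    pending = list(tasks)
    order = []
    while pending:
        runnable = [t for t in pending if all(i in data for i in t.inputs)]
        if not runnable:
            missing = {t.name: [i for i in t.inputs if i not in data] for t in pending}
            raise RuntimeError(f"unsatisfiable dependencies: {missing}")
        for task in runnable:     # tasks ready in the same round could run in parallel
            results = task.run({i: data[i] for i in task.inputs})
            data.update({o: results[o] for o in task.outputs})
            order.append(task.name)
            pending.remove(task)
    return data, order


# Hypothetical five-task workflow; list order deliberately differs from run order.
tasks = [
    Task("Task 5", ("c", "d"), ("result",), lambda d: {"result": d["c"] + d["d"]}),
    Task("Task 1", ("raw",), ("a",), lambda d: {"a": d["raw"] * 2}),
    Task("Task 2", ("a",), ("b",), lambda d: {"b": d["a"] + 1}),
    Task("Task 3", ("a",), ("c",), lambda d: {"c": d["a"] - 1}),
    Task("Task 4", ("b",), ("d",), lambda d: {"d": d["b"] * 10}),
]
data, order = execute(tasks, {"raw": 3})
print(order)           # ['Task 1', 'Task 2', 'Task 3', 'Task 4', 'Task 5']
print(data["result"])  # 75
```

Unlike a precomputed topological order, this loop decides what to run from the data actually present, which is closer to the "tasks, data, and rules" model on the slide.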


Cyberinfrastructure: Local machine, cluster, PBS (Condor) pool, Grid

A decision system that develops strategies for reliable and efficient execution in a variety of environments.

Reliable and scalable execution of dependent tasks

Reliable, scalable execution of independent tasks (locally, across the network), priorities, scheduling


Execution Environment

Globus and Condor services for job scheduling

Globus services for data transfer and cataloging

Information services:
• information about data location
• information about the execution sites
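The information services listed above can be sketched as two lookup tables: a replica catalog mapping logical file names to the sites that hold copies, and a snapshot of free capacity at each execution site. Everything here is hypothetical, including the site names, the catalog contents and the selection policy (prefer the site that already holds most of the input data, then the one with more free slots); real Globus catalog and information services have their own interfaces.

```python
# Hypothetical replica catalog: logical file name -> sites holding a physical copy.
replica_catalog = {
    "engine_run_042.dat": {"european-dc", "american-dc"},
    "engine_model.cfg": {"european-dc"},
}

# Hypothetical information-service snapshot: free job slots per execution site.
site_status = {"european-dc": 4, "american-dc": 16, "campus-cluster": 64}


def choose_site(inputs, catalog, status):
    """Pick an execution site for a job that reads `inputs`.

    Sites with no free slots are skipped. Among the rest, prefer the site that
    already holds the most input files (less data transfer), then the one with
    more free slots. Returns the site and the files that must be transferred to it.
    """
    candidates = [site for site, free in status.items() if free > 0]
    if not candidates:
        raise RuntimeError("no execution site has free capacity")

    def score(site):
        local_files = sum(site in catalog.get(f, set()) for f in inputs)
        return (local_files, status[site])

    best = max(candidates, key=score)
    to_transfer = [f for f in inputs if best not in catalog.get(f, set())]
    return best, to_transfer


print(choose_site(["engine_run_042.dat", "engine_model.cfg"], replica_catalog, site_status))
print(choose_site(["engine_run_042.dat"], replica_catalog, site_status))
```

Moving computation to where the data already lives is one common policy; others weigh transfer cost against queue length.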


The Grid Problem

1. Everyday researchers doing everyday research, BUT heroic Grid infrastructure is not being adopted
2. A data-centric perspective, like researchers, BUT the Grid gives APIs to computation, not data
3. Collaborative and participatory, BUT the Grid has a deeply rooted service-provider mindset
6. Better, not perfect, BUT the Grid aims to provide a well-engineered, perfect solution
7. Giving autonomy to researchers, BUT the Grid imposes institutional control (at this time)
8. About pervasive computing, BUT the Grid is about portals, not the next generation of users


Summary
• e-Science is about doing new science.
• The Grid is just one part of the solution.
• Users are not just consumers of infrastructure. Empower them.
• Think Web 2.0 on top of the Grid and other services.
• Workflows make e-Science easier, and Web 2.0 makes workflows easier.


Diagnosis and prognostic systems

Computer-based fault diagnosis and prognosis (DP)

DP problems arise in many domains: medicine, engineering, transport, and aerospace.


Operational Scenario

[Diagram: engine flight data from London Airport and New York Airport; European and American data centers; the Grid; Diagnostics Centre; airline office; maintenance centre]


Diagnosis and prognostic (DP) systems
• Data-centric
• Require complex interactions among agents
• Distributed
• Need to provide supporting and qualifying evidence for the DP offered
• Safety- and business-critical, with high dependability requirements


Data Centricity

Integrating data from several different systems for root-cause determination:
• Requires vast data repositories
• The types of data can be highly diverse: not only sensor data but also non-declarative knowledge
• The interpretation of the knowledge can vary among the entities


Data Centricity

Grid computing:
• Knowledge and semantics (chapter 23)
• Solutions for the management and archiving of large data repositories
• Remote collection and distribution of data
• Coherent integration of information from diverse databases (chapter 22)


Multiple stakeholders

DP systems involve a number of stakeholders:
• The system owner
• Experts
• The commercial service provider
• …

Grid computing: interaction of diverse parts is inherent within the Grid computing model.


Distribution

Data storage, data mining, and fault diagnosis may take place at different locations, across diverse IT systems.

The system can also be highly dynamic, involving a number of disparate entities that may be virtual and change often.


Distribution

Grid computing: the standardization of communication and application protocols in the Grid paradigm

Grid portal: supports effective interactions with users


Data Provenance

Transparency of, and trust in, results; the steps taken to arrive at a decision

Grid computing: open data communication protocols and meta-labeling schemes
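One way to keep the steps behind a DP decision transparent is an append-only provenance log in which each entry carries meta-labels and a hash of the previous entry, so a later edit to the recorded path is detectable. This is a minimal sketch of that idea using SHA-256 from Python's standard library; the entry fields, labels and file names are hypothetical, and the hash chain is a swapped-in technique rather than a protocol the slide specifies.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry):
    """SHA-256 over every field of an entry except its own hash."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def add_step(log, step, inputs, output, labels):
    """Append one DP step, chained to the previous entry by its hash."""
    entry = {
        "step": step,
        "inputs": inputs,
        "output": output,
        "labels": labels,  # meta-labels such as data source, tool or operator
        "time": datetime.now(timezone.utc).isoformat(),
        "prev": log[-1]["hash"] if log else None,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log):
    """True if every entry is intact and correctly chained to its predecessor."""
    prev = None
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(entry):
            return False
        prev = entry["hash"]
    return True


log = []
add_step(log, "ingest", ["flight_0815.raw"], "vibration_series", {"source": "QUOTE"})
add_step(log, "pattern_match", ["vibration_series"], "3 similar past events", {"tool": "search"})
add_step(log, "diagnosis", ["3 similar past events"], "bearing wear suspected", {"by": "engineer"})
print(verify(log))                  # True
log[1]["output"] = "no match"       # tamper with the recorded decision path
print(verify(log))                  # False
```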


Dependability

• Guaranteed service availability
• Data security
• System security


Dependability

Grid computing:
• Offers a security model for secure distributed computing (chapter 21)
• Addresses data access and data confidentiality
• Provides the concept of guaranteed service and quality of service (chapter 18)


The aero-engine DP problem

Modern aero-engines must operate with extremely high reliability.

They combine advanced mechanical engineering systems with electronic control systems.

Engine sensor data are used for prognostic applications.


DAME project

[Diagram: engine flight data from London Airport and New York Airport; European and American data centers; the Grid; Diagnostics Centre; airline office; maintenance centre]


DAME project

Principal challenges:
• Vast data repositories
• Advanced pattern-matching and data-mining methods with suitable response times
• Collaboration among a number of diverse actors


DAME service

[Diagram: QUOTE; raw engine data (vibration, shaft speed, fuel flow); novel data; service data; parts data; operational data; data mining; decision support; case-based reasoning; modelling/simulation; Grid services management; DAME diagnostics portal; AURA-G; the Grid]


Core services and tools

• Engine data service
• Data storage and mining service
• Engine modeling service
• Case-based reasoning support
• Maintenance interface service


Engine data service

Controls the interaction between the QUOTE system and its communication with the ground station

Establishes the link to the Grid data repositories

There are many replicas of this service, and they are highly transient


Data storage and mining service

Consists of the AURA pattern-matching engine system

Uses specialized methods to rapidly search both raw and archived engine data

Resembles a data-mining service
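The slide does not detail AURA's specialized search methods. As a baseline for what such a service has to do, here is a plain brute-force nearest-neighbour scan that slides a query pattern over archived engine signals and ranks windows by Euclidean distance. It is a swapped-in illustration, not AURA's algorithm, and the record names and values are hypothetical.

```python
import math


def best_matches(query, archive, k=3):
    """Slide `query` along every archived signal and return the k closest
    windows as (distance, record_id, offset), nearest first."""
    n = len(query)
    hits = []
    for record_id, signal in archive.items():
        for offset in range(len(signal) - n + 1):
            window = signal[offset:offset + n]
            hits.append((math.dist(query, window), record_id, offset))
    hits.sort()
    return hits[:k]


# Hypothetical archived vibration traces (arbitrary units).
archive = {
    "flight_A": [0.1, 0.2, 0.1, 0.9, 1.8, 0.9, 0.2],
    "flight_B": [0.2, 0.2, 0.3, 0.2, 0.1, 0.2, 0.3],
}
query = [1.0, 1.9, 1.0]  # a new vibration event to look up
for distance, record_id, offset in best_matches(query, archive, k=2):
    print(f"{record_id} at sample {offset}: distance {distance:.3f}")
```

This scan costs time proportional to the total archive length for every query, which is exactly why fast, specialized search methods matter at the data volumes DAME targets.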


Engine modeling service

Infers the current state of the engine

Performs model-based data fusion
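The slide does not say which fusion method the engine modeling service uses. As a minimal, hypothetical illustration of model-based data fusion, the sketch below combines a model prediction and a sensor reading of the same quantity by inverse-variance weighting, which for a single scalar gives the same result as one Kalman filter measurement update. The values and variances are hypothetical.

```python
def fuse(estimates):
    """Fuse independent (value, variance) estimates of one quantity by
    inverse-variance weighting; returns (fused_value, fused_variance)."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical example: one engine temperature, as predicted by a performance
# model and as measured by a sensor (arbitrary units; the model is trusted more).
model_prediction = (650.0, 25.0)
sensor_reading = (660.0, 100.0)
value, variance = fuse([model_prediction, sensor_reading])
print(f"fused estimate {value:.1f} (variance {variance:.1f})")
```

The fused value leans toward the lower-variance model prediction, and the fused variance is smaller than either input's.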


Case-based reasoning support

Uses case-based reasoning to improve the knowledge base

Captures fault DP methods in a procedural way

Manages workflows associated with DP operations

Builds and maintains the DAME knowledge base
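Case-based reasoning is commonly described as a retrieve, reuse, revise, retain cycle. A minimal sketch of the retrieve and retain steps over a hypothetical case base follows; the `Case` and `CaseBase` classes, the feature names, the diagnoses and the use of Euclidean distance as the similarity measure are assumptions, not DAME's actual knowledge-base design.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    symptoms: dict[str, float]   # e.g. normalised engine features
    diagnosis: str


def distance(a, b):
    """Euclidean distance between two feature dicts (missing features count as 0)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class CaseBase:
    def __init__(self):
        self.cases: list[Case] = []

    def retrieve(self, symptoms):
        """Retrieve: the stored case most similar to the new symptoms (or None)."""
        return min(self.cases, key=lambda c: distance(c.symptoms, symptoms), default=None)

    def retain(self, symptoms, confirmed_diagnosis):
        """Retain: store a confirmed outcome so the knowledge base grows."""
        self.cases.append(Case(dict(symptoms), confirmed_diagnosis))


kb = CaseBase()
kb.retain({"vibration": 0.9, "fuel_flow": 0.2}, "bearing wear")
kb.retain({"vibration": 0.1, "fuel_flow": 0.8}, "fuel nozzle blockage")

new_event = {"vibration": 0.8, "fuel_flow": 0.3}
suggestion = kb.retrieve(new_event)      # reuse: propose the nearest past diagnosis
print(suggestion.diagnosis)              # bearing wear
# Revise happens outside the code (e.g. engineers confirm the fault); then retain.
kb.retain(new_event, suggestion.diagnosis)
```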


Maintenance interface service

Organizes all interaction with the stakeholders involved in taking remedial actions

Captures information that helps validate or refine the output from the preceding DP processes


Conclusion

An ambitious vision for the future of science and engineering

Realizing this vision will require long-term investment of financial resources.

We should not underestimate the difficulty of the technical challenges that must be overcome before the vision is realized.

The realization of these goals is extremely important for the future of science and engineering.


Q & A

Thank you!


References

I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 2004.

NSF, Cyberinfrastructure Vision for 21st Century Discovery.

National e-Science Centre: http://www.nesc.ac.uk/action/esi/

DAME homepage: http://www.cs.york.ac.uk/dame/