Grid computing e-Science 1
Grid computing and e-Science
Lecturer: PhD. Phạm Trần Vũ
Presenters: Phan Quang Thiện, Trần Phước Hiệp, Nguyễn Minh Nhật
Outline
• What is e-Science?
• New modes of scientific inquiry
• Fault diagnosis and prognostic system
• Grid service for diagnostic problem
• Distributed Aircraft Maintenance Environment (DAME) project
• Conclusion
“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.”
John Taylor, Director General of Research Councils, Office of Science and Technology
The purpose of the UK e-Science initiative is to allow scientists to do ‘faster, better or different’ research.
“At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services”.
Dr. Arden L. Bement, Jr., Director of the National Science Foundation
Includes not only computers but also data storage resources and specialized facilities
The long-term goal is to develop the middleware services that allow scientists to routinely build the infrastructure for their ‘Virtual Organisations’.
• Data-intensive science
• Simulation-based science
• Remote access to experimental apparatus
Worldwide, scientists and engineers are producing, accessing, analyzing, integrating and storing terabytes of digital data daily through experimentation, observation and simulation
This vast amount of data needs to be preprocessed and distributed for further analysis.
[Figure: annual LHC data storage of 12-14 petabytes/year. One year of LHC data on CDs (50 CD-ROMs = 35 GB = 6 cm) would make a stack about 20 km high, compared with Mt. Blanc (4.8 km), Concorde (15 km), and a balloon (30 km).]
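The CD-stack comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide (50 CD-ROMs = 35 GB = 6 cm, 12-14 petabytes per year); the decimal conversion of 1 PB = 1,000,000 GB and the function name are assumptions made for illustration.

```python
# Figures taken from the slide: 50 CD-ROMs hold 35 GB and stack 6 cm high.
CD_CAPACITY_GB = 35 / 50
CD_THICKNESS_CM = 6 / 50
GB_PER_PB = 1_000_000   # assumption: decimal units
CM_PER_KM = 100_000


def cd_stack_height_km(petabytes: float) -> float:
    """Height in km of a CD stack holding the given amount of data."""
    discs = petabytes * GB_PER_PB / CD_CAPACITY_GB
    return discs * CD_THICKNESS_CM / CM_PER_KM


for pb in (12, 14):
    print(f"{pb} PB/year -> {cd_stack_height_km(pb):.1f} km of CDs")
```

Under these assumptions the stack comes out between roughly 20 and 24 km, which agrees with the "~20 km" shown in the figure.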
Each of the four LHC experiments will generate several petabytes of experimental data per year
In 2003, the Japanese Earth Simulator was running numerical simulations of Earth’s climate at a sustained rate of 40 teraflops.
The U.S. Encyclopedia of Life (EOL) project. http://www.eol.org/
The UK Comb-e-Chem project: the goal of this project is to “synthesize” large numbers of new compounds by high-throughput combinatorial methods and then map their structure and properties.
Structure + Properties → Knowledge + Prediction
The advance of technology is also producing revolutionary new experimental apparatus.
Allow remote participants to design, execute, and monitor experiments.
Sharing engineering research equipment, data resources, and leading edge computing resources.
Remote access to perform teleobservation and teleoperation of experiments.
The convergence of information, grid, and networking technologies with contemporary communications now enables science and engineering communities to pursue their research and learning goals in real-time and without regard to geography.
The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, and instruments.
Virtual organization (VO): a set of individuals and/or institutions defined by such sharing rules.
In other words, VOs are dynamic federations of heterogeneous organizational entities sharing data, metadata, processing, and security infrastructure.
• If you need huge Computing Power and/or Data Storage
• If you do not have a supercomputer in your institution
• If you have access to a “reasonable” network connection
Grid (Distributed Computing) could be a good solution
[Figure: a scientist and separate Experiment, Computing, HPC, Storage, and Analysis resources.]
[Figure: a scientist reaching Experiment, Computing, Storage, and Analysis resources through a Middleware layer.]
[Figure labels: Scientists, Infrastructure, Grid. Annotation: "use Web 2.0 here".]
The social process of science
[Figure: diagram connecting scientists and experimentation with Data, Metadata, Provenance, Workflows, and Ontologies; Certified Experimental Results & Analyses; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Technical Reports; Reprints; Local Web; Repositories; Digital Libraries; Virtual Learning Environment; Graduate Students; and Undergraduate Students.]
An e-Science Grid Framework
• Layer 4: Portal & Application
  - Scientific Informatics, Mathematical & Theoretical Simulations, Simulations of Materials, Parallel Molecular Modeling, ...
• Layer 3: Application Toolkit
  - Compute-Intensive, Data-Intensive, Visualization & Collaboration
  - Data Management, Collaborative Resource Management, Resource Monitoring, Security, Iterative Solver
• Layer 2: Core Grid Engine (Globus)
  - Authentication & authorization, Grid Information Service, Uniform Resource Access, Brokering, Co-scheduling, Security services, Uniform Data Access, ...
• Layer 1: Infrastructure
  - Short-Area-Network based PC Cluster, Fast-Ethernet based PC Cluster, Existing Server/Super Computer, Data Storage, Special Instrument
Capture individual data transformation and analysis steps
Large monolithic applications broken down to smaller jobs
Smaller jobs can be independent or connected by some control flow/ data flow dependencies
Usually expressed as a Directed Acyclic Graph of tasks
Allows the scientists to modularize their application
Scaled up execution over several computational resources
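As a minimal sketch of the idea above, the following breaks a hypothetical application into five jobs, expresses their dependencies as a directed acyclic graph, and groups them into stages whose members could run in parallel on different resources. The task names are invented for illustration; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), not any Grid-specific tool.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "analyze": {"simulate_a", "simulate_b"},
    "report": {"analyze"},
}


def execution_stages(dag):
    """Group tasks into stages; tasks in the same stage are independent."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(workflow)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Here the two simulation jobs land in the same stage, which is where scaled-up execution over several computational resources would pay off.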
Workflows orchestrate processes on the Grid
Workflows are a processing model that incorporates tasks, data, and rules.
Workflow management systems execute tasks on the Grid using data once the task’s dependencies are satisfied based on rules.
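The interplay of tasks, data, and rules can be sketched with a toy workflow manager. Everything here is a hypothetical illustration rather than the behaviour of any real Grid workflow system: the `Task` class, the `run_workflow` function, and the single rule used (a task is ready once all of its input data items exist) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    inputs: List[str]            # names of the data items the task needs
    output: str                  # name of the data item the task produces
    run: Callable[..., object]   # called with the input values, in order


def run_workflow(tasks: List[Task], data: Dict[str, object]) -> List[str]:
    """Run every task once its inputs exist; return the execution order."""
    pending = list(tasks)
    order = []
    while pending:
        # Assumed rule: a task is ready when all of its inputs are available.
        ready = [t for t in pending if all(key in data for key in t.inputs)]
        if not ready:
            names = ", ".join(t.name for t in pending)
            raise RuntimeError(f"unsatisfiable dependencies: {names}")
        for task in ready:
            data[task.output] = task.run(*(data[key] for key in task.inputs))
            order.append(task.name)
            pending.remove(task)
    return order


# Tasks are deliberately listed out of order; the rule sorts them out.
tasks = [
    Task("mean", ["clean"], "mean", lambda xs: sum(xs) / len(xs)),
    Task("clean", ["raw"], "clean", lambda xs: [x for x in xs if x is not None]),
    Task("report", ["clean", "mean"], "report",
         lambda xs, m: f"{len(xs)} samples, mean {m:.2f}"),
]
data = {"raw": [1.0, None, 2.0, 6.0]}
order = run_workflow(tasks, data)
print(order)
print(data["report"])
```

A real workflow management system would dispatch ready tasks to remote resources instead of calling them in-process, but the readiness check is the same idea.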
[Figure: example workflow graph with Task 1 through Task 5.]
Execution environment
• Cyberinfrastructure: local machine, cluster, PBS (Condor) pool, Grid
• A decision system that develops strategies for reliable and efficient execution in a variety of environments
• Reliable and scalable execution of dependent tasks
• Reliable, scalable execution of independent tasks (locally, across the network), priorities, scheduling
• Globus and Condor services for job scheduling
• Globus services for data transfer and cataloging
• Information services:
  - information about data location
  - information about the execution sites
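To illustrate how the two kinds of information could be combined, the sketch below picks an execution site for a job from a data-location catalog and a site catalog. It is a hypothetical, in-memory stand-in: the catalog contents, the site names, and the `choose_site` policy (prefer the site that already holds the most input files, then the one with the most free slots) are assumptions, and no Globus or Condor API is used.

```python
from typing import Dict, List, Set

# Hypothetical catalogs; a real deployment would query grid information services.
replica_catalog: Dict[str, Set[str]] = {   # data item -> sites holding a copy
    "run42.dat": {"site_a", "site_b"},
    "calib.db": {"site_b"},
}
site_catalog: Dict[str, int] = {           # execution site -> free job slots
    "site_a": 8,
    "site_b": 2,
    "site_c": 16,
}


def choose_site(inputs: List[str]) -> str:
    """Pick a site with free slots that already holds the most input data."""
    def score(site: str):
        local = sum(site in replica_catalog.get(item, set()) for item in inputs)
        return (local, site_catalog[site])  # ties broken by free slots

    candidates = [site for site, free in site_catalog.items() if free > 0]
    if not candidates:
        raise RuntimeError("no execution site has free slots")
    return max(candidates, key=score)


print(choose_site(["run42.dat", "calib.db"]))
```

Placing the job where the data already lives avoids a transfer; other policies (load first, cost first) would only change the `score` function.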
The Grid Problem
1. Everyday researchers doing everyday research, BUT heroic Grid infrastructure is not being adopted
2. A data-centric perspective, like researchers, BUT Grid gives APIs to computation, not data
3. Collaborative and participatory, BUT Grid has a deeply rooted service-provider mindset
6. Better, not perfect, BUT Grid aims to provide a well-engineered perfect solution
7. Giving autonomy to researchers, BUT Grid imposes institutional control (at this time)
8. About pervasive computing, BUT Grid is about portals, not the next generation of users
Summary
• e-Science is about doing new science
• Grid is just one part of the solution
• Users are not just consumers of infrastructure. Empower them.
• Think Web 2.0 on top of Grid and other services
• Workflows make e-Science easier, and Web 2 makes workflows easier.
Diagnosis and prognostic system
• Computer-based fault diagnosis and prognostics (DP)
• Arises in many domains: medicine, engineering, transport, and aerospace
Operational Scenario
[Figure: engine flight data moving among London Airport, New York Airport, the airline office, the Maintenance Centre, the European and American data centers, and the Diagnostics Centre over the Grid.]
Diagnosis and prognostic (DP) System
• Data-centric
• Requires complex interactions among agents
• Distributed
• Needs to provide supporting and qualifying evidence for the DP offered
• Safety- and business-critical, with high dependability requirements
Data Centricity
• Integrating data from several different systems for root cause determination
• Requires vast data repositories
• The types of data can also be highly diverse
• Not only sensor data but also non-declarative knowledge
• The interpretation of the knowledge can vary among the entities
Data Centricity
Grid computing:
• Knowledge and semantics (chapter 23)
• Solutions for the management and archiving of large data repositories
• Remote collection and distribution of data
• Coherent integration of information from diverse databases (chapter 22)
Multiple stakeholders
• Involves a number of stakeholders:
  - The system owner
  - Experts
  - The commercial service provider
  - ...
Grid computing:
• Interaction of diverse parts is inherent within the Grid computing model
Distribution
Data storage, data mining, and fault diagnosis may take place at different locations
Across diverse IT systems
The system can also be highly dynamic, involving a number of disparate entities (virtual, changing often)
Distribution
Grid computing:
• The standardization of communication and application protocols in the Grid paradigm
• Grid portal: supports effective interactions with users
Data Provenance
• Transparency and trust in results
• Steps to arrive at a decision
Grid computing:
• Develop open data communication protocols
• Meta-labeling schemes
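One way to make the steps behind a decision transparent is to label each processing step with metadata and chain the labels together. The sketch below is an assumption-laden illustration, not a scheme described in the slides: the record fields, the step names, and the use of SHA-256 hash chaining to make the trail tamper-evident are all choices made for this example.

```python
import hashlib
import json
from typing import Dict, List


def add_step(trail: List[Dict], operation: str, inputs: List[str], output: str) -> Dict:
    """Append one processing step, linked to the previous step by its digest."""
    record = {
        "operation": operation,
        "inputs": inputs,
        "output": output,
        "previous": trail[-1]["digest"] if trail else None,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["digest"] = hashlib.sha256(payload).hexdigest()
    trail.append(record)
    return record


def verify(trail: List[Dict]) -> bool:
    """Recompute every digest and check that the chain is unbroken."""
    previous = None
    for record in trail:
        body = {key: value for key, value in record.items() if key != "digest"}
        if body["previous"] != previous:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != record["digest"]:
            return False
        previous = record["digest"]
    return True


# Hypothetical diagnosis pipeline recorded step by step.
trail: List[Dict] = []
add_step(trail, "extract_features", ["vibration.raw"], "features.v1")
add_step(trail, "pattern_match", ["features.v1"], "candidate_faults")
add_step(trail, "diagnose", ["candidate_faults"], "diagnosis")
print(verify(trail))
```

Reading the trail from the last record back to the first reconstructs how the diagnosis was reached; editing any earlier record makes `verify` fail.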
Dependability
• Guaranteed service availability
• Data security
• System security
Dependability
Grid computing:
• Offers a security model to secure distributed computing (chapter 21)
• Addresses data access and data confidentiality
• The concept of guaranteed service and quality-of-service (chapter 18)
The aero-engine DP problem
• Modern aero-engines must operate with extremely high reliability
• They combine advanced mechanical engineering systems with electronic control systems
• Using engine sensors
• Prognostic applications
DAME project
[Figure: engine flight data moving among London Airport, New York Airport, the airline office, the Maintenance Centre, the European and American data centers, and the Diagnostics Centre over the Grid.]
DAME project
Principal challenges:
• Vast data repositories
• Advanced pattern-matching and data-mining methods with suitable response times
• Collaboration among a number of diverse actors
DAME service
[Figure: DAME services on the Grid. Labels: QUOTE, Data-Mining, Decision Support, Case-Based Reasoning, Novel Data, Raw Engine Data (Vibration, Shaft Speed, Fuel Flow), Service Data, Parts Data, Operational Data, DAME Diagnostics Portal, Grid Services Management, Modelling/Simulation, AURA-G, ...]
Core services and tools
• Engine data service
• Data storage and mining service
• Engine modeling service
• Case-based reasoning support
• Maintenance interface service
Engine data service
• Controls the interaction between the QUOTE system and its communication with the ground station
• Establishes the link to the Grid data repositories
• Many replicas of this service: highly transient
Data storage and mining service
• Consists of the AURA pattern-matching engine system
• Uses specialized methods to rapidly search both raw and archived engine data
• Resembles a data-mining service
Engine modeling service
Infer the current state of the engine
Perform model-based data fusion
Case-based reasoning support
• Use case-based reasoning to improve the knowledge base
• Capture fault DP methods in a procedural way
• Manage workflows associated with DP operations
• Build and maintain the DAME knowledge base
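The retrieve-and-retain cycle at the heart of case-based reasoning can be sketched in a few lines. This is a generic illustration, not the DAME implementation: the case base, the normalized feature values, the diagnoses, and the choice of Euclidean distance as the similarity measure are all assumptions. The feature names reuse the engine signals shown in the DAME service figure.

```python
import math
from typing import Dict, List, Tuple

Case = Tuple[Dict[str, float], str]  # (normalized features, recorded diagnosis)

# Hypothetical knowledge base of past cases.
case_base: List[Case] = [
    ({"vibration": 0.9, "shaft_speed": 0.5, "fuel_flow": 0.4}, "bearing wear"),
    ({"vibration": 0.2, "shaft_speed": 0.5, "fuel_flow": 0.9}, "fuel nozzle fault"),
    ({"vibration": 0.1, "shaft_speed": 0.5, "fuel_flow": 0.4}, "no fault found"),
]


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over the union of feature names (missing = 0.0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def retrieve(query: Dict[str, float]) -> Case:
    """Return the stored case most similar to the query."""
    return min(case_base, key=lambda case: distance(case[0], query))


def retain(features: Dict[str, float], diagnosis: str) -> None:
    """Add a confirmed case so that later queries can reuse it."""
    case_base.append((dict(features), diagnosis))


query = {"vibration": 0.8, "shaft_speed": 0.5, "fuel_flow": 0.5}
features, diagnosis = retrieve(query)
print(diagnosis)
retain(query, diagnosis)  # once confirmed, the new case improves the knowledge base
```

Retaining confirmed cases is what lets the knowledge base improve over time, as the first bullet describes.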
Maintenance interface service
Organize all interactions with the stakeholders involved in taking remedial actions
Capture information that helps validate or refine the output from the preceding DP processes
Conclusion
• Ambitious vision for the future of science and engineering
• The realization of this vision will require long-term investments of financial resources
• The difficulty of the technical challenges in realizing the vision should not be underestimated
• The realization of these goals is extremely important for the future of science and engineering
Q & A
Thank you!
References
• I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 1999.
• Cyberinfrastructure Vision for 21st Century Discovery (NSF)
• National e-Science Centre: http://www.nesc.ac.uk/action/esi/
• DAME homepage: http://www.cs.york.ac.uk/dame/