The JuxMem-Gfarm Collaboration
Enhancing the JuxMem Grid Data Sharing Service with Persistent Storage
Using the Gfarm Global File System
Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb
INRIA/IRISA, Rennes, France
Osamu Tatebe
University of Tsukuba, Japan
Context: Grid Computing
Target architecture: cluster federations (e.g. Grid’5000)
Focus: large-scale data sharing
[Figure: example application, satellite design coupling several simulation codes: solid mechanics, thermodynamics, optics, dynamics]
Current Approaches for Data Management on Grids
- Use of data catalogs: Globus GridFTP, Replica Location Service, etc.
- Logistical networking of data: IBP, buffers available across the Internet
- Unified access to data: SRB, from file systems to tapes and databases
- Limitations:
  - No transparency => increased complexity at large scale
  - No consistency guarantees for replicated data
Towards Transparent Access to Data
- Desirable features:
  - Uniform access to distributed data via global identifiers
  - Transparent data localization and transfer
  - Consistency models and protocols for replicated data
- Examples of systems taking this approach:
  - On clusters:
    - Memory level: DSM systems (Ivy, TreadMarks, etc.)
    - File level: NFS-like systems
  - On grids:
    - Memory level: data sharing services (JuxMem, INRIA Rennes, France)
    - File level: global file systems (Gfarm, AIST/University of Tsukuba, Japan)
Idea: Collaborative Research on Memory- and File-Level Data Sharing
- Study possible interactions between:
  - The JuxMem grid data sharing service
  - The Gfarm global file system
- Goals:
  - Enhance global data sharing functionality
  - Improve performance and reliability
  - Build a memory hierarchy for global data sharing by combining the memory level and the file system level
- Approach: enhance JuxMem with persistent storage using Gfarm
- Support: the DISCUSS Sakura collaboration (2006-2007)
JuxMem: a Grid Data-Sharing Service
- Generic grid data-sharing service:
  - Grid scale: 10^3 to 10^4 nodes
  - Transparent data localization
  - Data consistency
  - Fault tolerance
- JuxMem ~= DSM + P2P
- Implementation:
  - Multiple replication strategies
  - Configurable consistency protocols
  - Based on JXTA 2.0 (http://www.jxta.org/)
- Integrated into 2 grid programming models:
  - GridRPC (DIET, ENS Lyon)
  - Component models (CCM & CCA)
[Figure: JuxMem overlay: a JuxMem group containing cluster groups A, B, and C, with a data group D spanning providers across clusters]
http://juxmem.gforge.inria.fr
JuxMem’s Data Group: a Fault-Tolerant, Self-Organizing Group
[Figure: a client accesses a data group D, organized as a Global Data Group (GDG) spanning several Local Data Groups (LDGs)]
- Data availability despite failures is ensured through replication and fault-tolerant building blocks
- Hierarchical self-organizing groups:
  - Cluster level: Local Data Group (LDG)
  - Grid level: Global Data Group (GDG)
[Figure: stack of fault-tolerant building blocks: failure detectors, consensus, atomic multicast, and group membership, with an adaptation layer supporting the self-organizing group]
JuxMem: Memory Model and API
- Memory model (currently): entry consistency
  - Explicit association of data to locks
  - Multiple Reader Single Writer (MRSW)
  - Explicit lock acquire/release before/after access: juxmem_acquire, juxmem_acquire_read, juxmem_release
- API:
  - Allocate memory for JuxMem data: ptr = juxmem_malloc (size, #clusters, #replicas per cluster, &ID, ...)
  - Map existing JuxMem data to local memory: ptr = juxmem_mmap (ID); juxmem_unmap (ptr)
  - Synchronization before/after data access: juxmem_acquire (ptr), juxmem_acquire_read (ptr), juxmem_release (ptr)
  - Read and write data: direct access through pointers! int n = *ptr; *ptr = ...
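Putting the API together, a minimal usage sketch in C could look as follows. This is an illustration only: the header name and the exact prototypes are assumptions reconstructed from the calls named above, and the trailing juxmem_malloc arguments elided ("...") on the slide are simply omitted.

    #include "juxmem.h"                /* assumed header name */

    void writer(void)
    {
        char *id = NULL;
        /* Allocate a replicated int: 2 clusters, 3 replicas per cluster;
           further juxmem_malloc arguments are elided on the slide. */
        int *ptr = juxmem_malloc(sizeof(int), 2, 3, &id);

        juxmem_acquire(ptr);           /* exclusive write lock (MRSW) */
        *ptr = 42;                     /* direct access through the pointer */
        juxmem_release(ptr);           /* replicas updated under entry consistency */
    }

    void reader(const char *id)
    {
        int *ptr = juxmem_mmap(id);    /* attach existing data by global ID */
        juxmem_acquire_read(ptr);      /* shared read lock */
        int n = *ptr;
        juxmem_release(ptr);
        juxmem_unmap(ptr);
        (void)n;                       /* silence unused-variable warning */
    }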
Gfarm: a Global File System [CCGrid 2002]
- Commodity-based distributed file system that federates the storage of each site
- It can be mounted from all cluster nodes and clients
- It provides scalable I/O performance with respect to the number of parallel processes and users
- It supports fault tolerance and avoids access concentration by automatic replica selection
[Figure: the Gfarm file system presents a global namespace rooted at /gfarm (directories such as ggf, jp, aist, gtrc); files are mapped onto compute & file-system nodes, replicas can be created per file, and the namespace is also exported through GridFTP, samba, and NFS servers]
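As a point of reference, accessing such a namespace from the command line could look like the short session below; gfls, gfreg, and gfexport are standard Gfarm utilities, but the paths and usage shown here are illustrative assumptions, so the Gfarm manual pages are authoritative.

    gfls                                  # list the Gfarm home directory
    gfreg input.dat gfarm:input.dat       # register a local file into Gfarm
    gfexport gfarm:input.dat > local.dat  # read a Gfarm file back to local disk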
Gfarm: a Global File System (2)
- Files can be shared among all nodes and clients
- Physically, a file may be replicated and stored on any file system node
- Applications can access a file regardless of its location
- File system nodes can be distributed
[Figure: a Gfarm metadata server maps the /gfarm namespace to file replicas (files A, B, C) stored on compute & file-system nodes distributed across Japan and the US; client PCs and laptops access them, also via GridFTP, samba, and NFS servers]
Our Goal: Build a Memory Hierarchy for Global Data Sharing
- Approach:
  - Applications use JuxMem's API (memory-level sharing)
  - Applications DO NOT use Gfarm directly
  - JuxMem uses Gfarm to enhance data persistence
- Without Gfarm, JuxMem already tolerates some crashes of memory providers thanks to the self-organizing groups
- With Gfarm, persistence is further enhanced thanks to secondary storage
- How does it work?
  - Basic principle: on each lock release, data can be flushed to Gfarm (see the sketch below)
  - The flush frequency can be tuned to trade efficiency against fault tolerance
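A minimal C sketch of this principle follows; every name in it (the release hook, the flush helper, the period variable) is hypothetical, since the slides state only the policy, not the implementation.

    #include <stddef.h>

    /* Hypothetical helper that writes the data bytes to a Gfarm file
       named after the JuxMem data ID. */
    void flush_to_gfarm(const char *data_id, const void *data, size_t size);

    static unsigned release_count = 0;
    static unsigned flush_period  = 4;  /* tunable: 1 = flush on every release
                                           (safest); larger = faster, riskier */

    /* Hypothetical provider-side hook invoked on each lock release:
       flush to Gfarm only every flush_period-th release. */
    void on_lock_release(const char *data_id, const void *data, size_t size)
    {
        if (++release_count % flush_period == 0)
            flush_to_gfarm(data_id, data, size);
    }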
Step 1: A Single Flush by One Provider
[Figure: clusters #1 and #2; the JuxMem Global Data Group (GDG) spans providers in both clusters, and the GDG leader writes the data to one Gfarm gfsd node]
- One particular JuxMem provider (the GDG leader) flushes the data to Gfarm
- Other Gfarm copies can then be created using Gfarm's gfrep command, as in the sketch below
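For example, assuming the flushed data is stored under a Gfarm path named after its JuxMem ID (an illustrative convention, not one fixed by the slides), replication might be triggered like this; see gfrep(1) for the exact option syntax:

    gfrep -N 2 gfarm:juxmem/data_D   # ensure 2 replicas of the flushed file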
Step 2: Parallel Flush by LDG Leaders
[Figure: clusters #1 and #2, each hosting a JuxMem Local Data Group (LDG #1, LDG #2); each LDG leader writes the data to a Gfarm gfsd node in its own cluster]
- One particular JuxMem provider in each cluster (the LDG leader) flushes the data to Gfarm (parallel copy creation, one copy per cluster)
- The copies are registered as the same Gfarm file
- Other Gfarm copies can then be created using Gfarm's gfrep command
Step 3: Parallel Flush by All Providers
[Figure: clusters #1 and #2; every JuxMem provider in the GDG writes the data to a co-located Gfarm gfsd node]
- All JuxMem providers in each cluster (not only the LDG leaders) flush the data to Gfarm
- All copies are registered as the same Gfarm file
- Useful to create multiple copies of the Gfarm file per cluster
- No further replication using gfrep is needed
Deployment issues
- Application deployment on large-scale infrastructures:
  - Reserve resources
  - Configure the nodes
  - Manage dependencies between processes
  - Start processes
  - Monitor and clean up the nodes
- Mixed deployment of Gfarm and JuxMem:
  - Manage dependencies between processes of both applications
  - Make the JuxMem provider able to act as a Gfarm client
- Approach: use a generic deployment tool, ADAGE (INRIA, Rennes, France)
  - Design specific plugins for Gfarm and JuxMem (a hypothetical description sketch follows)
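To make the plugin idea concrete, below is a purely hypothetical sketch of what a Gfarm-specific application description handled by such a plugin might contain; the element names are invented for illustration and do not reflect ADAGE's actual description language.

    <!-- Hypothetical Gfarm description for an ADAGE plugin; the schema
         shown here is illustrative, not ADAGE's real format. -->
    <gfarm-application>
      <metadata-server count="1"/>
      <agent per-cluster="1"/>
      <gfsd count="8"/>               <!-- file system daemons -->
      <client count="2"/>             <!-- e.g. JuxMem providers acting as
                                           Gfarm clients -->
    </gfarm-application>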
ADAGE: Automatic Deployment of Applications in a Grid Environment
- Developed by the PARIS research group (IRISA/INRIA, Rennes)
- Deploys the same application on any kind of resources, from clusters to grids
- Supports multi-middleware applications: MPI + CORBA + JXTA + Gfarm...
- Network topology description:
  - Latency and bandwidth hierarchy
  - NAT, non-IP networks
  - Firewalls, asymmetric links
- Planners as plugins: round-robin & random
- Preliminary support for dynamic applications
- Some successes:
  - 29,000 JXTA peers on ~400 nodes
  - 4,003 components on 974 processors on 7 sites
[Figure: ADAGE workflow: the Gfarm and JuxMem application descriptions are translated into a generic application description; together with the resource description and control parameters, it drives deployment planning, deployment plan execution, and application configuration]
Roadmap overview
- Design of the common architecture (done)
  - Discussions on possible interactions between JuxMem and Gfarm:
    - May 2006, Singapore (CCGRID 2006)
    - June 2006, Paris (HPDC 2006 and the NEGST workshop)
  - October 2006: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    - First deployment tests of Gfarm on Grid'5000
    - Overall Gfarm/JuxMem design
  - December 2006: Osamu Tatebe visited the JuxMem team
    - Refinement of the Gfarm/JuxMem design
- Implementation of JuxMem on top of Gfarm (partially done)
  - April 2007: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    - One JuxMem provider (the GDG leader) flushes data to Gfarm after each critical section (step 1 done)
    - Work started on large-scale deployment of Gfarm using ADAGE
- Future work: parallel flush of JuxMem data to Gfarm (steps 2 and 3)
  - Work in progress (Master's thesis of Majd Ghareeb at INRIA Rennes)
Roadmap: deployment
- Design the Gfarm plugin for ADAGE (April 2007 - done!)
  - Propose a specific application description language for Gfarm
  - Translate the specific description into a generic description
  - Start processes with respect to their dependencies
  - Transfer the Gfarm configuration files:
    - From the metadata server to the agents
    - From the agents to their gfsd daemons and clients
- Deployment of JuxMem on top of Gfarm (May 2007 - first prototype running on Grid'5000!)
  - A simple configuration (one reader, one writer, one provider = one Gfarm client)
  - ADAGE deploys Gfarm, then JuxMem (separate deployments)
  - Limitation: the user still needs to indicate:
    - The Gfarm client hostname
    - The location of the Gfarm configuration file
- Future work: design a meta-plugin for ADAGE that automatically deploys a mixed description of a Gfarm+JuxMem configuration