The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
Caitlin Minteer & Kelly Clynes
The Data Grid
Large dataset size
Geographic distribution of users and resources
Computationally intensive analysis
No other architecture exists that allows us to apply technologies in large-scale application domains
The Data Grid
Data grid applications must frequently operate in wide-area, multi-institutional, diverse environments
Design Architecture for The Data Grid
Mechanism Neutrality
Designed to be as independent as possible of low-level mechanisms
Defines interfaces that encapsulate the peculiarities of specific storage systems
Design Architecture for The Data Grid
Policy Neutrality
Structured so that design decisions with significant performance implications are exposed to the user
Design Architecture for The Data Grid
Compatibility with Grid Infrastructure
Takes advantage of fundamental Grid infrastructure
Compatible with lower-level Grid mechanisms
Design Architecture for The Data Grid
Uniformity of Information Infrastructure
The same data model and interface are used to access the grid's metadata
Design Architecture for The Data Grid
These four principles lead to the development of a layered architecture.
Lower layers provide high-performance access to an orthogonal set of basic mechanisms.
In data grids, the focus on simple, policy-independent mechanisms will encourage and enable wide use without limiting the range of applications that can be implemented.
Core Grid Data Services
Two fundamental services are required in a data grid architecture: data access and metadata access
Data Access
Provides mechanisms for accessing, managing, and initiating third party transfers of data stored in storage systems
Metadata Access
Provides mechanisms for accessing and managing information about data stored in storage systems
Data Abstraction: Storage System
The basic data grid component is the storage system, which provides functions for creating, destroying, reading, writing, and manipulating file instances
File instances are the basic unit of information in a storage system
A storage system can be implemented by any storage technology that supports the required access functions
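The storage system abstraction can be illustrated with a minimal Python sketch. The names `StorageSystem` and `InMemoryStorage` and the method signatures are assumptions made for illustration, not an interface defined by the slides; the point is only that any backing technology qualifies as long as it supplies the required access functions.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Hypothetical interface: the access functions a storage system must support."""

    @abstractmethod
    def create(self, name: str) -> None: ...

    @abstractmethod
    def destroy(self, name: str) -> None: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryStorage(StorageSystem):
    """One possible backing technology: a plain dictionary of file instances."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def create(self, name: str) -> None:
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = b""

    def destroy(self, name: str) -> None:
        del self._files[name]  # raises KeyError if the file instance is unknown

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        if name not in self._files:
            raise FileNotFoundError(name)
        self._files[name] = data


store = InMemoryStorage()
store.create("run42.dat")
store.write("run42.dat", b"detector events")
print(store.read("run42.dat"))
```

A tape archive or a parallel file system could sit behind the same four methods, which is what lets higher layers stay independent of the storage technology.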
Data Access:
Storage system access functions must be integrated with the security environment of each site to which remote access is required
Applications should be able to provide storage systems with hints concerning access patterns, network performance, etc., that the storage system can use to optimize performance
Data movement functions must be able to detect and report errors
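As a rough sketch of the last two points, a data movement function might accept application hints and return a report that carries any detected errors. Everything here is hypothetical: `AccessHints`, `TransferReport`, and `transfer` are illustrative names, dictionaries stand in for storage systems, and the checksum comparison is just one possible way to detect a corrupted copy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AccessHints:
    """Hypothetical hints an application might pass along with a request."""
    sequential: bool = True      # expected access pattern
    block_size: int = 65536      # preferred transfer unit in bytes (must be > 0)


@dataclass
class TransferReport:
    """Outcome of one data movement, including any detected errors."""
    ok: bool
    bytes_moved: int
    errors: list[str] = field(default_factory=list)


def transfer(source: dict[str, bytes], dest: dict[str, bytes],
             name: str, hints: AccessHints) -> TransferReport:
    """Copy one file instance between two stores and report what happened."""
    if name not in source:
        return TransferReport(False, 0, [f"missing source file: {name}"])
    data = source[name]
    # Move the data in hint-sized blocks, as a real mover might size its buffers.
    chunks = [data[i:i + hints.block_size]
              for i in range(0, len(data), hints.block_size)]
    dest[name] = b"".join(chunks)
    # Detect corruption by comparing checksums of the source and the copy.
    if hashlib.sha256(dest[name]).digest() != hashlib.sha256(data).digest():
        return TransferReport(False, len(dest[name]), ["checksum mismatch"])
    return TransferReport(True, len(data))


site_a = {"run42.dat": b"detector events"}
site_b: dict[str, bytes] = {}
print(transfer(site_a, site_b, "run42.dat", AccessHints(block_size=4)))
```

Returning a report rather than deciding how to recover keeps the mechanism policy-neutral: the caller chooses whether to retry, pick another replica, or give up.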
Metadata
Concerned with the management of the data grid itself
Information about file instances, the contents of file instances, and the various storage systems contained in the grid
The metadata service provides a way to publish and access this metadata
Application Metadata
Describes the contents and structure of the data
Content represented by the file
Circumstances under which the data was obtained
Other information useful to applications that process the data
Replica Metadata
Used to manage replication of data objects
Includes information for mapping file instances to particular storage system locations
System Configuration Metadata
Describes the fabric of the grid itself, i.e., network connectivity and details about storage systems, such as capacity and usage policy
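The three kinds of metadata could be modeled as simple records. The sketch below is illustrative only; the class and field names are assumptions chosen to mirror the slides, not a schema from the original work.

```python
from dataclasses import dataclass


@dataclass
class ApplicationMetadata:
    description: str        # content represented by the file
    acquired_under: str     # circumstances under which the data was obtained


@dataclass
class ReplicaMetadata:
    file_instance: str
    locations: list[str]    # storage system locations that hold a copy


@dataclass
class SystemConfigMetadata:
    storage_system: str
    capacity_bytes: int
    usage_policy: str


# Hypothetical example values.
app = ApplicationMetadata("collision events", "test beam, run 42")
rep = ReplicaMetadata("run42.dat", ["site-a", "site-b"])
cfg = SystemConfigMetadata("site-a", 10**12, "read-only for external users")
print(app, rep, cfg, sep="\n")
```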
Additional Requirements
Service must operate efficiently in a distributed environment
Scalable
Robust
Allows sites to assert local control over information
Hierarchical Distributed System
Because of these requirements, the metadata service must be a hierarchical, distributed system
Achieves scalability
Avoids single points of failure
Facilitates local control over data
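One way to picture a hierarchical metadata service is as a tree of nodes, each holding its own entries and delegating sub-namespaces to children. The sketch below is a single-process toy under that assumption; `MetadataNode` and its methods are hypothetical, and in a real deployment each node would presumably be a separate server run by the site that owns the data.

```python
class MetadataNode:
    """One node in a hypothetical hierarchical metadata service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: dict[str, dict] = {}             # metadata held locally
        self.children: dict[str, "MetadataNode"] = {}  # delegated sub-namespaces

    def add_child(self, child: "MetadataNode") -> "MetadataNode":
        self.children[child.name] = child
        return child

    def publish(self, path: list[str], key: str, value: dict) -> None:
        """Store metadata at the node named by path, which must already exist."""
        self._resolve(path).entries[key] = value

    def lookup(self, path: list[str], key: str) -> dict:
        return self._resolve(path).entries[key]

    def _resolve(self, path: list[str]) -> "MetadataNode":
        node = self
        for part in path:
            node = node.children[part]  # KeyError if no such sub-namespace
        return node


root = MetadataNode("grid")
site_a = root.add_child(MetadataNode("site-a"))
root.publish(["site-a"], "run42.dat", {"size_bytes": 1024})
print(root.lookup(["site-a"], "run42.dat"))
```

Because the entry lives in the `site-a` node rather than at the root, the site keeps control of its own information, and a failure elsewhere in the tree does not affect lookups that stay within that site.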
Higher-Level Data Grid Components
Two representative types of components: replica management and replica selection
Replica Management
Replica Manager
Creates copies of file instances, or replicas, within specified storage systems
A replica offers better performance or availability for accesses to or from a particular location
Maintains a repository or catalog
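A minimal sketch of a replica manager, assuming dictionaries stand in for storage systems and the catalog is a plain mapping from file name to the list of systems that hold a copy. `ReplicaManager` and its methods are hypothetical names used only for illustration.

```python
class ReplicaManager:
    """Hypothetical replica manager that copies files and tracks their locations."""

    def __init__(self, storage_systems: dict[str, dict[str, bytes]]) -> None:
        self.storage_systems = storage_systems
        self.catalog: dict[str, list[str]] = {}  # file name -> storage system names

    def create_replica(self, name: str, source: str, target: str) -> None:
        """Copy a file instance from source to target and record both locations."""
        data = self.storage_systems[source][name]
        self.storage_systems[target][name] = data
        locations = self.catalog.setdefault(name, [])
        for system in (source, target):
            if system not in locations:
                locations.append(system)

    def locations(self, name: str) -> list[str]:
        """Return the storage systems known to hold the named file."""
        return list(self.catalog.get(name, []))


systems = {"site-a": {"run42.dat": b"detector events"}, "site-b": {}}
manager = ReplicaManager(systems)
manager.create_replica("run42.dat", "site-a", "site-b")
print(manager.locations("run42.dat"))
```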
Replica Selection and Data Filtering
Replica selection is a high-level service provided in the data grid
Chooses a replica that optimizes a desired performance criterion:
Speed
Cost
Security
Replicas may be local or accessed remotely
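Replica selection can be sketched as filtering candidates on security and then minimizing a score built from speed and cost. The `Replica` fields, the `select_replica` function, and the scoring formula (estimated transfer time plus a weighted cost) are assumptions chosen for illustration; the slides do not prescribe a particular selection policy.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    location: str
    bandwidth_mbps: float   # estimated transfer speed to the requester
    cost: float             # access charge, in arbitrary units
    secure: bool            # whether the site meets the requester's security needs


def select_replica(replicas: list[Replica], size_mb: float,
                   require_secure: bool = False,
                   cost_weight: float = 0.0) -> Replica:
    """Pick the replica with the lowest combined transfer time and weighted cost."""
    candidates = [r for r in replicas if r.secure or not require_secure]
    if not candidates:
        raise LookupError("no replica satisfies the security requirement")

    def score(r: Replica) -> float:
        transfer_seconds = size_mb * 8 / r.bandwidth_mbps
        return transfer_seconds + cost_weight * r.cost

    return min(candidates, key=score)


replicas = [
    Replica("local-disk", bandwidth_mbps=1000.0, cost=5.0, secure=False),
    Replica("remote-archive", bandwidth_mbps=100.0, cost=0.0, secure=True),
]
print(select_replica(replicas, size_mb=100).location)
print(select_replica(replicas, size_mb=100, require_secure=True).location)
```

With these example numbers the fast local copy wins by default, while requiring security or weighting cost heavily shifts the choice to the remote replica.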
Summary
Architecture of the Data Grid: Mechanism Neutrality, Policy Neutrality, Compatibility with Grid Infrastructure, Uniformity of Information Infrastructure
Data Services: Data Access, Metadata Access
Replica Management