Changes
- Increasing processing power
- Increasing storage capability and smarter devices
- Falling cost
- Popularity of computing and communication devices: desktop, laptop, PDA, cell phone
- Ubiquitous connection: high-speed network, wireless network
- Emerging applications: video conferencing, collaborative scientific computing, digital libraries
- Mobile users and increasing user demands
- Computing environment: solitary, fixed-location event -> widely distributed, highly interactive, mobile activity
Challenges for data management
- Large-scale system: effective management
  - clients, devices, amount of data
- Diversity: flexibility/adaptive performance
  - distributed and heterogeneous storage devices
  - various applications that demand different kinds of support
  - clients with different abilities (mobile users)
  - different networks
- Multiple administrative domains: security
  - data belongs to different administrative domains
  - clients belong to different administrative domains
  - data privacy and anonymity
Goal: efficient, secure, and effective location- and context-aware data access from anywhere at any time
Towards Grid Computing
A novel paradigm that enables the sharing, selection, & aggregation of geographically distributed resources anywhere & anytime:
Computers – PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDA, etc.
Software – e.g., ASPs renting expensive special purpose applications on demand.
Catalogued data and databases – e.g. transparent access to human genome database.
Special devices/instruments – e.g., radio telescope – SETI@Home searching for life in galaxy.
People/collaborators.
Depending on their availability, capability, cost, and user QoS requirements for solving large-scale problems/applications.
Thus enabling the creation of “virtual enterprises”
Plethora of Challenges
Security
Resource Allocation & Scheduling
Data locality
Network Management
System Management
Resource Discovery
Uniform Access
Computational Economy
Application Construction
Technology Needs: Present, Future
Distributed Supercomputing:
– Computational science.
High-Capacity/Throughput Computing:
– Large-scale simulation/chip design & parameter studies.
Content Sharing (free or paid):
– Sharing digital contents among peers.
Remote software access/renting services:
– Application service providers (ASPs) & Web services.
Data-intensive computing:
– Drug design, particle physics, stock prediction...
On-demand, real-time computing:
– Medical instrumentation & mission-critical applications.
Collaborative Computing:
– Collaborative design, data exploration, education.
Service Oriented Computing (SOC):
– Towards economic-based utility computing: new paradigm, new applications, new industries, and new business.
Datanomic Computing
System behavior driven by characteristics of the data
• System automatically optimizes itself to complement ever-changing data requirements
– Allocate resources according to increase in demand for the data
– Transform data formats to support different applications
• Seamless data access from anywhere at any time
– Location- and context-aware access to data
– Consistent view of each user’s data
• Data access independent of platforms, operating systems, and data formats
• Potential platform for cyberinfrastructure
– High-performance computing, large data stores, better bandwidth for communication
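The idea of allocating resources as demand for the data grows can be illustrated with a small sketch. The slides do not give an allocation policy; the function below assumes a hypothetical rule (one replica per fixed amount of request load, clamped to a range), and all names and default values are illustrative only.

```python
import math


def target_replicas(requests_per_min, capacity_per_replica,
                    min_replicas=1, max_replicas=8):
    """Return how many replicas an object should have for the current load.

    Assumed policy (not from the slides): each replica can serve
    `capacity_per_replica` requests per minute, and the count is clamped
    to the range [min_replicas, max_replicas].
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_min / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

A system following this rule would re-evaluate the target periodically and create or retire replicas when the result changes.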
Objectives
• Develop self-optimizing global infrastructure
• Exploit active objects and intelligent storage devices
– Objects can be uniquely identified
– Objects can automatically migrate, replicate, or transcode to satisfy varied user demands
• Store, search, and manage large amounts of data efficiently
• Adaptive performance
– Objects dynamically adapt to the level of available service
– Means to handle intermittent connectivity
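One way to picture an object adapting to the level of available service is to let it choose among transcoded variants of itself according to the bandwidth of the current link. This is a sketch under stated assumptions: the `Variant` type, the bandwidth thresholds, and the variant names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    min_kbps: int  # lowest bandwidth at which this variant is usable


def choose_variant(variants, available_kbps):
    """Return the richest variant the link can carry, or None if none fits.

    Returning None models the intermittent-connectivity case, where the
    caller would fall back to a cached copy or retry later.
    """
    usable = [v for v in variants if v.min_kbps <= available_kbps]
    if not usable:
        return None
    return max(usable, key=lambda v: v.min_kbps)


# Hypothetical variants of one multimedia object.
VARIANTS = [
    Variant("text-summary", 1),
    Variant("low-res-video", 300),
    Variant("full-res-video", 4000),
]
```

With these example thresholds, a 500 kbps link would be served the low-resolution video, and a disconnected client (0 kbps) would get no variant at all.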
Objectives (Cont.)
• Support data-intensive applications for
– Data mining
– Multimedia (MPEG-21)
• Ensure end-to-end (E2E) security, strong authentication, anonymity, and confidentiality over IP networks
Datanomic Computing: Hybrid Architecture
[Figure: four regions, each with its own IP network connecting a desktop, a laptop, an app server, and a regional manager; the regions are joined by an IP network]
Hybrid Architecture: P2P Interaction
• Network is partitioned into an arbitrary number of variable-sized regions
• Region-to-region interaction: P2P
– Distributed Indexing with Hashing Architecture (DIHA) for inter-region communication
– Focus on enterprise environment => assume a greater level of trust between regions and greater homogeneity in regional interaction
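The slides name DIHA but do not describe its internals. As a rough illustration of how hashing can map an object ID to the region responsible for its index entry, the sketch below uses generic consistent hashing, which is a swapped-in, well-known technique and not necessarily what DIHA does. The class name, region names, and the virtual-node count are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class RegionRing:
    """Consistent-hash ring that assigns object IDs to regions."""

    def __init__(self, regions, vnodes=64):
        if not regions:
            raise ValueError("at least one region is required")
        # Each region gets several points on the ring to even out the load.
        self._points = sorted(
            (_hash(f"{region}#{i}"), region)
            for region in regions
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def lookup(self, object_id: str) -> str:
        """Return the region whose ring point follows the object's hash."""
        idx = bisect.bisect_right(self._keys, _hash(object_id))
        return self._points[idx % len(self._points)][1]
```

A property worth noting is that when a region leaves the ring, only the objects that region was responsible for are reassigned; every other object keeps its region.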
Hybrid Architecture: Regional Organization
[Figure: one region, with a desktop, a laptop, an app server, a regional manager, and an intelligent OSD connected by an IP network]
• Partition of regions: based on physical or logical affinity
• Single regional manager
• clients
• Intelligent object-based storage devices
Hybrid Architecture: Regional component (1)
• Regional Manager
– Object metadata management
– Security-related issues within/outside the region
– Naming service
– Object replication, migration, and consistency
– Client and OSD device management (including mobile clients and devices)
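To make the regional manager's naming and metadata roles concrete, here is a minimal in-memory sketch. It assumes, hypothetically, that the manager keeps a name-to-ID table and a per-object set of OSDs holding replicas; the class and method names are illustrative, and security and consistency handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMeta:
    object_id: str
    replicas: set = field(default_factory=set)  # IDs of OSDs holding a copy


class RegionalManager:
    """Toy naming service and metadata store for one region."""

    def __init__(self, region):
        self.region = region
        self._names = {}  # human-readable name -> object ID
        self._meta = {}   # object ID -> ObjectMeta

    def register(self, name, object_id, osd):
        """Bind a name to an object and record that `osd` holds a replica."""
        self._names[name] = object_id
        meta = self._meta.setdefault(object_id, ObjectMeta(object_id))
        meta.replicas.add(osd)

    def resolve(self, name):
        """Naming service: return the object ID for a name, or None."""
        return self._names.get(name)

    def locate(self, object_id):
        """Return the sorted list of OSDs that hold the object."""
        meta = self._meta.get(object_id)
        return sorted(meta.replicas) if meta else []

    def migrate(self, object_id, src, dst):
        """Record that a replica moved from OSD `src` to OSD `dst`."""
        meta = self._meta[object_id]
        if src not in meta.replicas:
            raise ValueError(f"{src} holds no replica of {object_id}")
        meta.replicas.discard(src)
        meta.replicas.add(dst)
```

In this sketch the manager handles only control information (names and locations); the object data itself stays on the OSDs.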
Hybrid Architecture: Regional component (2)
• Client
– End users or applications that access objects within a region
– Each client has a home region that stores important client information. The home region is allowed to move
– Clients can move freely among regions
Hybrid Architecture: Regional component (3)
• Intelligent Object-based Storage Devices (OSDs)
– The OSD decides whether a specific client is allowed to perform a given operation
– The OSD performs data-directed operations specified by the object itself
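The two OSD responsibilities above, checking whether a client may perform an operation and running operations that the object itself specifies, can be sketched as follows. The access-control list format and the way methods are attached to an object are assumptions made for illustration; the slides list access control for Datanomic as still to be determined.

```python
class ActiveObject:
    """An object that carries its data, its methods, and its access list."""

    def __init__(self, object_id, data, acl, methods):
        self.object_id = object_id
        self.data = data
        self.acl = acl          # client name -> set of permitted operations
        self.methods = methods  # operation name -> callable(data, *args)


class OSD:
    """Toy intelligent storage device that enforces access on each call."""

    def __init__(self):
        self._objects = {}

    def store(self, obj):
        self._objects[obj.object_id] = obj

    def invoke(self, client, object_id, op, *args):
        """Run operation `op` of the object if `client` is allowed to."""
        obj = self._objects[object_id]
        if op not in obj.acl.get(client, set()):
            raise PermissionError(f"{client} may not perform {op!r}")
        if op not in obj.methods:
            raise KeyError(f"object does not define operation {op!r}")
        return obj.methods[op](obj.data, *args)


# Hypothetical object exposing two operations of its own.
osd = OSD()
osd.store(ActiveObject(
    "o1",
    b"hello world",
    acl={"alice": {"read", "head"}},
    methods={
        "read": lambda data: data,
        "head": lambda data, n: data[:n],
    },
))
```

Here `osd.invoke("alice", "o1", "head", 5)` returns the first five bytes, while the same call from a client that is absent from the access list raises `PermissionError`.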
Hybrid Architecture: Scenario within a region
[Figure: numbered steps 1 to 5 among the desktop, laptop, app server, regional manager, and intelligent OSD over the regional IP network]
Hybrid Architecture: Scenario inter-region
[Figure: numbered steps 1 to 9 between two regions (each with a desktop, a laptop, an app server, and a regional manager on its own IP network) joined by an IP network; one step is labeled Lookup(object ID/name)]
General Picture
[Figure: regions containing app servers, OSDs, and regional managers, each on its own regional IP network (IP-N/W), interconnected by an IP network]
Comparison
Aspect | OceanStore | Storage Tank
Scope | Internet/WAN | Data center
Architecture | Peer to peer | Client/server (client, metadata server, storage device)
Connection/Communication | Varied, wireless, intermittent | Normally permanent, high speed
Trust model | Untrusted infrastructure (data encrypted) | Not mentioned (trusted?)
Channel security | Insecure | Secure
Admin domain | Peer to peer | One
Object capability | Active/archival | File data
Data replication and migration | Introspective, bottleneck at parent node | Static, app level
Consistency | Conflict resolution (a range of consistency semantics) | Strong
Scalability | Good | Cluster arena
Performance Very high
Separation of Control and data
Support of OSD
Comparison - Similarities
Aspect | Datanomic | PAST
Separation of control and data | MDS, OBT and OBD | N/A
Access level | Object (file) | File
Storage device | Intelligent | Storage from node
Transport | TCP, IB, RDMA | TCP
Comparison - Differences
Criteria | Datanomic | PAST
Scope | Global area | Global area
Architecture | Hybrid, 2 levels | P2P (DHT)
Connection/Communication | Varied, wireless, intermittent | N/A
Trust model | Untrusted | Untrusted
Channel security | Insecure | Insecure
Comparison – Differences (cont.)
Aspect | Datanomic | PAST
Admin domain | No | No
Object capability | Active with method | File data
Data replication and migration | Automatic, three levels | Automatic
Consistency | Strong/weak | N/A
Scalability | Unlimited | Unlimited
Performance | High | High
Comparison – Differences (cont.)
Aspect | Datanomic | PAST
Data privacy and integrity | Supported | Supported
Load balancing | Intelligent | Based on randomization
Data striping | Supported | Not supported
Target | Enterprise | Global, archival storage
Comparison
Criteria | Datanomic | NASD | Lustre
Scope | Global area | Data center | Restricted, data center
Architecture | Hybrid, 2 levels | Client-server | Client-server
Connection/Communication | Varied, wireless, intermittent | Could be varied | Normally permanent, high speed
Access Control
TBD Capability based:
Issued per open request
Link privacy
TBD Triple-DES Cipher in counter mode
Comparison
Aspect | Datanomic | NASD | Lustre
Revocation | TBD | Fast since capability issued per open request | Use existing schemes
Granularity of access control | Method | Part of object (specified in terms of bytes) | File
Data replication and migration | Automatic, three levels | Not addressed | Static, app level
Consistency | Strong/weak | Not addressed | Strong
Scalability | Unlimited | Cluster arena | Cluster arena