
Changes

- Increasing processing power
- Increasing storage capability and smarter devices
- Reducing cost
- Popularity of computing & communication devices
  - desktop, laptop, PDA, cell phone
- Ubiquitous connection: high-speed networks, wireless networks
- Emerging applications: video conferencing, collaborative scientific computing, digital libraries
- Mobile users & increasing user demands
- Computing environment: from a solitary, fixed-location event to a widely distributed, highly interactive, mobile activity

Challenges for data management

- Large-scale system
  - effective management of clients, devices, and the amount of data
- Diversity: flexibility/adaptive performance
  - distributed and heterogeneous storage devices
  - various applications demanding different kinds of support
  - clients with different abilities (mobile users)
  - different networks
  - multiple administrative domains
- Security
  - data belongs to different administrative domains
  - clients belong to different administrative domains
  - data privacy and anonymity

Efficient, secure, and effective location- and context-aware data access from anywhere at any time

Towards Grid Computing

A novel paradigm that enables the sharing, selection, & aggregation of geographically distributed resources anywhere & anytime:

Computers – PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDA, etc.

Software – e.g., ASPs renting expensive special-purpose applications on demand.

Catalogued data and databases – e.g. transparent access to human genome database.

Special devices/instruments – e.g., radio telescopes, as used by SETI@Home in searching for life in the galaxy.

People/collaborators.

Resources are selected depending on their availability, capability, cost, and users' QoS requirements, for solving large-scale problems/applications.

This enables the creation of "virtual enterprises".
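The slides state the selection criteria (availability, capability, cost, user QoS) but not a selection algorithm. As an illustration only, a broker could keep the available resources, take the cheapest first, and stop once the requested capability is reached within the budget. The `Resource` fields, `select_resources`, and the greedy cheapest-first policy are hypothetical choices for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical description of a grid resource (PC, cluster, instrument, ...).
    name: str
    available: bool
    capability: float  # illustrative capacity units contributed to the job
    cost: float        # illustrative price of renting this resource for the job


def select_resources(resources, required_capability, budget):
    """Greedy, cheapest-first selection (an assumed policy).

    Returns the chosen resources, or None if the capability requirement
    cannot be met within the budget."""
    candidates = [res for res in resources if res.available]
    candidates.sort(key=lambda res: res.cost)
    chosen, total_cap, total_cost = [], 0.0, 0.0
    for res in candidates:
        if total_cap >= required_capability:
            break  # QoS requirement already satisfied
        if total_cost + res.cost > budget:
            continue  # this resource would exceed the budget
        chosen.append(res)
        total_cap += res.capability
        total_cost += res.cost
    return chosen if total_cap >= required_capability else None
```

A greedy rule is used only to keep the sketch short; a real broker could instead weigh cost per unit of capability or deadlines.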

[Figure: Towards Grid Computing. Unification of geographically distributed resources R1, R2, R3, R4, ..., RN over a wide-area network.]

Plethora of Challenges

- Security
- Resource Allocation & Scheduling
- Data locality
- Network Management
- System Management
- Resource Discovery
- Uniform Access
- Computational Economy
- Application Construction

Technology Needs: Present, Future

- Distributed Supercomputing: computational science
- High-Capacity/Throughput Computing: large-scale simulation/chip design & parameter studies
- Content Sharing (free or paid): sharing digital contents among peers
- Remote software access/renting services: application service providers (ASPs) & Web services
- Data-intensive computing: drug design, particle physics, stock prediction...
- On-demand, real-time computing: medical instrumentation & mission-critical applications
- Collaborative Computing: collaborative design, data exploration, education
- Service Oriented Computing (SOC): towards economic-based utility computing

New paradigm, new applications, new industries, and new business.

Datanomic Computing

System behavior driven by characteristics of the data

• The system automatically optimizes itself to complement ever-changing data requirements
  – Allocate resources according to increases in demand for the data
  – Transform data formats to support different applications
• Seamless data access from anywhere at any time
  – Location- and context-aware access to data
  – Consistent view of each user's data
• Data access independent of platforms, operating systems, and data formats
• Potential platform for cyberinfrastructure
  – High-performance computing, large data stores, better bandwidth for communication
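"Allocate resources according to increases in demand for the data" can be made concrete with a replica-count rule. The sketch below assumes demand is measured as a request rate and that each replica comfortably serves a fixed number of requests per second; `target_replicas` and its thresholds are hypothetical, not part of the described system.

```python
import math


def target_replicas(request_rate, requests_per_replica=100.0,
                    min_replicas=1, max_replicas=8):
    """Hypothetical demand-driven policy: enough replicas that each serves at
    most `requests_per_replica` requests/s, clamped to [min, max]."""
    if request_rate < 0:
        raise ValueError("request_rate must be non-negative")
    needed = math.ceil(request_rate / requests_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

For example, a rate of 250 requests/s yields 3 replicas, while an idle object keeps the minimum of 1.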

Objectives

• Develop a self-optimizing global infrastructure
• Exploit active objects and intelligent storage devices
  – Objects can be uniquely identified
  – Objects can automatically migrate, replicate, or transcode to satisfy varied user demands
• Store, search, and manage large amounts of data efficiently
• Adaptive performance
  – Objects dynamically adapt to the level of available service
  – Means to handle intermittent connectivity
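A minimal sketch of an active object meeting these objectives, assuming a UUID as the unique identifier, bandwidth as the "level of available service", and an in-memory queue for updates made while disconnected. The class, its variant thresholds, and the `sync` callback are all hypothetical.

```python
import uuid
from collections import deque


class ActiveObject:
    """Illustrative active object: uniquely identified, adapts its transcoded
    variant to available bandwidth, and buffers updates while offline."""

    # (minimum bandwidth in kbit/s, variant name), best quality first.
    VARIANTS = [(2000, "high"), (500, "medium"), (0, "low")]

    def __init__(self, data):
        self.object_id = uuid.uuid4()  # globally unique identifier
        self.data = data
        self.pending = deque()         # updates made while disconnected

    def variant_for(self, bandwidth_kbps):
        # Choose the best variant the current link can sustain.
        for min_bw, name in self.VARIANTS:
            if bandwidth_kbps >= min_bw:
                return name
        return self.VARIANTS[-1][1]    # fall back to the lowest quality

    def update(self, new_data, connected, sync):
        # Apply locally; propagate now if connected, otherwise queue it.
        self.data = new_data
        if connected:
            sync(self.object_id, new_data)
        else:
            self.pending.append(new_data)

    def reconnect(self, sync):
        # Replay queued updates in their original order.
        while self.pending:
            sync(self.object_id, self.pending.popleft())
```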

Objectives (Cont.)

• Support data-intensive applications for
  – Data mining
  – Multimedia (MPEG-21)

• Ensure end-to-end (E2E) security, strong authentication, anonymity, and confidentiality over IP networks

Datanomic Computing: Hybrid Architecture

[Figure: Four regions, each an IP network within the region containing a desktop, a laptop, an app server, and a Regional Manager; the regions are interconnected through an IP network.]

Hybrid Architecture: P2P Interaction

• The network is partitioned into an arbitrary number of variable-sized regions
• Region-to-region interaction: P2P
  – Distributed Indexing with Hashing Architecture (DIHA) for inter-region communication
  – Focus on the enterprise environment => assume a greater level of trust between regions and greater homogeneity in regional interaction
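The slides name DIHA but do not describe how it maps objects to regions. As an assumption, the sketch below uses consistent hashing: regions and object IDs are hashed onto one SHA-1 ring, and an object belongs to the first region at or after its position. `RegionRing` and `ring_position` are hypothetical names.

```python
import bisect
import hashlib


def ring_position(key):
    # 160-bit SHA-1 digest interpreted as a position on the ring.
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class RegionRing:
    """Hypothetical consistent-hashing index of regions (not DIHA itself)."""

    def __init__(self, regions):
        if not regions:
            raise ValueError("at least one region is required")
        self._ring = sorted((ring_position(r), r) for r in regions)
        self._keys = [pos for pos, _ in self._ring]

    def region_for(self, object_id):
        # First region clockwise from the object's position (wrapping around).
        i = bisect.bisect_left(self._keys, ring_position(object_id))
        return self._ring[i % len(self._ring)][1]

    def add_region(self, region):
        # Only objects on the new region's arc change owner.
        bisect.insort(self._ring, (ring_position(region), region))
        self._keys = [pos for pos, _ in self._ring]
```

Consistent hashing is chosen here because adding a region moves only the objects on that region's arc, which fits a network of variable-sized, changing regions.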

Hybrid Architecture: Regional Organization

[Figure: One region: a desktop, a laptop, an app server, the Regional Manager, and an intelligent OSD connected by the regional IP network.]

Hybrid Architecture: Regional Organization

• Regions are partitioned based on physical or logical affinity
• Each region contains:
  – A single regional manager
  – Clients
  – Intelligent object-based storage devices

Hybrid Architecture: Regional component (1)

• Regional Manager
  – Object metadata management
  – Security-related issues within/outside the region
  – Naming service
  – Object replication, migration, and consistency
  – Management of clients and OSD devices (including mobile clients and devices)
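The slide lists the regional manager's responsibilities without an interface. A minimal sketch of the metadata, naming, replication, and migration parts, assuming an in-memory table per region: names map to object IDs, and object IDs map to the set of OSDs holding a replica. All class and method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    object_id: str
    name: str
    size: int
    locations: set = field(default_factory=set)  # OSDs holding a replica


class RegionalManager:
    """Sketch of per-region metadata and naming; not the authors' design."""

    def __init__(self, region):
        self.region = region
        self.by_name = {}  # naming service: name -> object ID
        self.meta = {}     # object ID -> ObjectMetadata
        self.osds = set()  # OSD devices registered in this region

    def register_osd(self, osd_id):
        self.osds.add(osd_id)

    def create(self, name, size, osd_id):
        if osd_id not in self.osds:
            raise KeyError(f"unknown OSD {osd_id}")
        if name in self.by_name:
            raise ValueError(f"name already bound: {name}")
        oid = str(uuid.uuid4())
        self.meta[oid] = ObjectMetadata(oid, name, size, {osd_id})
        self.by_name[name] = oid
        return oid

    def resolve(self, name):
        # Name lookup for a client: object ID plus current replica locations.
        oid = self.by_name[name]
        return oid, frozenset(self.meta[oid].locations)

    def replicate(self, oid, target_osd):
        if target_osd not in self.osds:
            raise KeyError(f"unknown OSD {target_osd}")
        self.meta[oid].locations.add(target_osd)

    def migrate(self, oid, source_osd, target_osd):
        # Migration modeled as: copy to the target, then drop the source copy.
        self.replicate(oid, target_osd)
        self.meta[oid].locations.discard(source_osd)
```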

Hybrid Architecture: Regional component (2)

• Client
  – End users or applications that access objects within a region
  – A client has a home region that stores important client information; the home region is allowed to move
  – A client can move freely among regions
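The home-region idea can be sketched as a registry that separates where a client's information lives (its home region) from where the client currently is. For brevity this hypothetical `HomeRegistry` is a single in-memory object rather than state distributed across regional managers.

```python
class HomeRegistry:
    """Hypothetical sketch: a client's information is kept in its home region,
    the client may roam, and the home region itself may be reassigned."""

    def __init__(self):
        self.home = {}      # client ID -> home region
        self.current = {}   # client ID -> region the client is in now
        self.profiles = {}  # (home region, client ID) -> client information

    def register(self, client, home_region, info):
        self.home[client] = home_region
        self.current[client] = home_region
        self.profiles[(home_region, client)] = info

    def move(self, client, new_region):
        # Roaming changes only the current location, not the home region.
        self.current[client] = new_region

    def relocate_home(self, client, new_home):
        # Move the stored client information to the new home region.
        old_home = self.home[client]
        self.profiles[(new_home, client)] = self.profiles.pop((old_home, client))
        self.home[client] = new_home

    def locate(self, client):
        return self.current[client]

    def info(self, client):
        return self.profiles[(self.home[client], client)]
```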

Hybrid Architecture: Regional component (3)

• Intelligent Object-based Storage Devices (OSDs)
  – The OSD decides whether a specific client is allowed to perform certain operations
  – The OSD performs data-directed operations specified by the object itself
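The comparison tables list Datanomic's access control as TBD, so the following is only an assumption: it borrows the capability-based approach the tables attribute to NASD. A manager signs (client, object, allowed operations) with an HMAC key shared with the OSD; the OSD verifies the capability itself and then runs either a plain read or a method supplied by the object. `make_capability` and `IntelligentOSD` are hypothetical.

```python
import hashlib
import hmac


def make_capability(key, client, object_id, ops):
    """Hypothetical capability issued by the manager: HMAC-SHA256 over the
    client, object, and allowed operations, using a key shared with the OSD."""
    msg = f"{client}|{object_id}|{','.join(sorted(ops))}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class IntelligentOSD:
    def __init__(self, key):
        self._key = key
        self._objects = {}  # object ID -> (data, {method name: callable})

    def store(self, object_id, data, methods=None):
        self._objects[object_id] = (data, dict(methods or {}))

    def invoke(self, client, object_id, op, ops, capability):
        # The OSD itself checks the capability and the requested operation.
        expected = make_capability(self._key, client, object_id, ops)
        if op not in ops or not hmac.compare_digest(expected, capability):
            raise PermissionError(f"{client} may not {op} {object_id}")
        data, methods = self._objects[object_id]
        if op == "read":
            return data
        # Data-directed operation: the object supplies its own method.
        return methods[op](data)
```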

Hybrid Architecture: Scenario within a region

[Figure: Access scenario within one region, with numbered steps 1 to 5 among the desktop, laptop, app server, Regional Manager, and intelligent OSD on the regional IP network.]

Hybrid Architecture: Scenario inter-region

[Figure: Inter-region access scenario between two regions, each with a desktop, laptop, app server, and Regional Manager on its IP network; numbered steps 1 to 9 include a Lookup(object ID/name) request across the IP network.]

General Picture

[Figure: Four regions, each with an app server, OSDs, and a Regional Manager on a regional IP network (IP-N/W), shown first separately and then interconnected through a wide-area IP network.]

Comparison

Aspect | Oceanstore | StorageTank
--- | --- | ---
Scope | Internet/WAN | Data center
Architecture | Peer to peer | Server/client (client, metadata server, storage device)
Connection/Communication | Varied, wireless, intermittent | Normally permanent, high speed
Trust model | Untrusted infrastructure (data encrypted) | Not mentioned (trusted?)
Channel security | Insecure | Secure
Admin domain | Peer to peer | One
Object capability | Active/archival | File data
Data replication and migration | Introspective, bottleneck at parent node | Static, app level
Consistency | Conflict resolution (a range of consistency semantics) | Strong
Scalability | Good | Cluster arena
Performance | | Very high
Separation of control and data | |
Support of OSD | |

Comparison - Similarities

Aspect | Datanomic | PAST
--- | --- | ---
Separation of control and data | MDS, OBT and OBD | N/A
Access level | Object (file) | File
Storage device | Intelligent storage | From node
Transport | TCP, IB, RDMA | TCP

Comparison - Differences

Criteria | Datanomic | PAST
--- | --- | ---
Scope | Global area | Global area
Architecture | Hybrid, 2 levels | P2P (DHT)
Connection/Communication | Varied, wireless, intermittent | N/A
Trust model | Untrusted | Untrusted
Channel security | Insecure | Insecure

Comparison - Differences (cont.)

Aspect | Datanomic | PAST
--- | --- | ---
Admin domain | No | No
Object capability | Active with method | File data
Data replication and migration | Automatic, three levels | Automatic
Consistency | Strong/weak | N/A
Scalability | Unlimited | Unlimited
Performance | High | High

Comparison - Differences (cont.)

Aspect | Datanomic | PAST
--- | --- | ---
Data privacy and integrity | Supported | Supported
Load balancing | Intelligent | Based on randomization
Data striping | Supported | Not supported
Target | Enterprise | Global, archival storage

Comparison

Criteria | Datanomic | NASD | Lustre
--- | --- | --- | ---
Scope | Global area | Data center | Restricted, data center
Architecture | Hybrid, 2 levels | Client-server | Client-server
Connection/Communication | Varied, wireless, intermittent | Could be varied | Normally permanent, high speed
Access control | TBD | Capability based: issued per open request |
Link privacy | TBD | Triple-DES cipher in counter mode |

Comparison

Aspect | Datanomic | NASD | Lustre
--- | --- | --- | ---
Revocation | TBD | Fast, since a capability is issued per open request | Uses existing schemes
Granularity of access control | Method | Part of object (specified in terms of bytes) | File
Data replication and migration | Automatic, three levels | Not addressed | Static, app level
Consistency | Strong/weak | Not addressed | Strong
Scalability | Unlimited | Cluster arena | Cluster arena