Data Management Over a Two-Layered P2P Architecture
A. Vijay Srinivas
PhD Research Scholar
Department of Computer Science & Engineering,
Indian Institute of Technology Madras
Guide: Prof. D. Janakiram
Two-Layered P2P Architecture: A Generic Platform for Large-Scale Resource Sharing
• Large-scale resource sharing
  • Find resources (data/computing)
    • Neighbourhood – proximity
    • Capabilities – memory, processing power and storage
    • Dynamics – node and network load variations
  • Share disparate resources
  • Scalability and fault tolerance
• Case studies
  • Vishwa – compute grid middleware
    • Process migration for performance/adaptability
  • Virat – data grid framework
Requirements – Data Grids
• Functional
  • Data management
    • Data formats & structures
    • Replica management
    • Data access
    • Data transfer
    • Data querying/searching
  • Meta-data management
    • Description, automatic generation, and interoperability of data formats
    • Repository
      • Dependable access to meta-data
Requirements – Data Grids
• Functional
  • Data processing
    • Allow discovery of data/computing resources
    • Resource management & scheduling
• Non-functional
  • Security
  • Scalability
    • In the number of users or data entities in the system
  • Robustness – fault tolerance
Scalability – key non-functional requirement
• Scalability
  • No centralized components
  • No global knowledge
• Resource/data discovery
  • Proximity and capability
• Failures
  • Dependability
    • Applications – seamlessly adapt to node/network failures
  • Reconfigurability
    • Middleware components – must be adaptive
• Data replication
  • Catalogue/meta-data
  • Consistency
Summary: Requirements of a platform for data grids
• Data management
  • Data and meta-data formats, description
  • Replica management
  • Data querying/searching
• Discovery of capable data/computing resources
• High-performance data transfer protocol
• Resource management & scheduling
(Slide callouts map these requirements to components: data management – Virat; data querying/searching – Virat, with limited keyword searching achievable; discovery of capable resources – Virat's unstructured P2P layer; data transfer – GridFTP; resource management & scheduling – Vishwa.)
Motivating Application: Tele-medicine
[Diagram: (1) a heart patient in a village sends an ECG report to a healthcare centre in the city; (2) patient records are shared between healthcare centres in cities and reach a doctor in a healthcare centre as well as a mobile doctor in the city; (3) doctor feedback is sent back to the patient.]
Shared object spaces for wide-area computing
• Shared object spaces – existing realizations do not scale up
• P2P systems – cannot be used directly
• Solutions
  • Super-peer Virat – Virat over Pastry
    • Partially P2P Virat
      • Naïve Virat shared object, event and service space
      • Integration with Pastry
  • Fully P2P Virat – Virat over Vishwa
• Performance studies
  • Comparison with T Spaces – a standard tuple space implementation from IBM
Shared Object Spaces
• Distributed Shared Memory (DSM) – provides the notion of a global memory
  • Easy to build applications over DSM
  • Parallel computing applications – TreadMarks [ZwaComp96]
• Shared object spaces
  • Share application objects, not memory pages (see the sketch below)
  • Avoid false sharing
  • Applications – Computer Supported Cooperative Work (CSCW), Massively Multiplayer Online Gaming (MMOG)
[ZwaComp96] C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. "TreadMarks: Shared Memory Computing on Networks of Workstations". IEEE Computer, 29(2):18-28, February 1996.
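To make the object-granularity idea concrete, here is a minimal sketch in Java (the Sharable marker type is borrowed from the node-view slide later in this deck; the PatientRecord class and its fields are purely hypothetical): applications publish whole objects, so the unit of replication and consistency is an application object rather than a memory page, which is what removes false sharing.

    import java.io.Serializable;

    // Marker type; the node-view slide later passes Sharable objects to the space.
    interface Sharable extends Serializable { }

    // A whole application object is shared as one unit. Consistency is tracked
    // per object id, so two unrelated records can never interfere the way
    // co-located data on a single DSM page can (false sharing).
    class PatientRecord implements Sharable {
        final long patientId;
        String latestEcgReport;

        PatientRecord(long patientId) {
            this.patientId = patientId;
        }
    }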
Scalability of Shared Object Spaces
• Centralized components
  • Orca [BalTSE92] – sequencer for totally ordered multicast (TOM)
  • T Spaces [WykSJ98] – single server for lookup
• Failures
  • JavaSpaces – transaction coordinator (http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html, 2001)
  • Orca – sequencer
[BalTSE92] Henri E. Bal, M. Frans Kaashoek, and Andrew S. Tanenbaum. "Orca: A Language for Parallel Programming of Distributed Systems". IEEE Transactions on Software Engineering, 18(3):190-205, 1992.
[WykSJ98] P. Wyckoff, S.W. McLaughry, T.J. Lehman, and D.A. Ford. "T Spaces". IBM Systems Journal, 37(3):454-474, 1998.
Scalability of Shared Object Spaces
• Object lookup
  • Given an object id, reach the node holding meta-data about the object, or a copy of the object
  • Centralized or naïvely distributed – T Spaces
  • Lookup mechanisms in distributed object middleware do not handle failures and do not scale up
• Consistency
  • Relaxed consistency schemes in TreadMarks, Munin, etc.
    • Lazy, release, entry consistency
  • Peer-to-peer systems – Tapestry [KubSAC04] and Pastry (http://freepastry.rice.edu)
    • Assume read-only replicas
[KubSAC04] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John D. Kubiatowicz. "Tapestry: A Resilient Global-Scale Overlay for Service Deployment". IEEE Journal on Selected Areas in Communications, 22(1), January 2004.
Peer-to-Peer (P2P) Systems
• Unstructured P2P systems
  • Gnutella [gnutella], Freenet (http://freenetproject.org)
  • Overlay – random graph
  • Search – flooding/random walk (see the sketch below)
    • Supports complex queries
    • But no guarantees on search
  • Application-specific criteria for neighbourhood formation
    • Popular data placed on nodes with good capacity
[gnutella] The Gnutella Protocol Specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.
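The flooding style of search used by unstructured overlays can be sketched as follows (a toy illustration; all class and method names are hypothetical): a query carries a time-to-live (TTL) and is forwarded to every neighbour until the TTL expires, which is why arbitrary predicates are supported but a rare item may never be found.

    import java.util.*;
    import java.util.function.Predicate;

    class Peer {
        final String id;
        final List<Peer> neighbours = new ArrayList<>();
        final List<String> localData = new ArrayList<>();

        Peer(String id) { this.id = id; }

        // Flooding search: evaluate the predicate locally, then forward the
        // query to all neighbours until the TTL runs out. The 'seen' set
        // suppresses duplicate processing of the same query at a node.
        List<String> search(Predicate<String> query, int ttl, Set<String> seen) {
            List<String> hits = new ArrayList<>();
            if (!seen.add(id)) return hits;
            for (String item : localData)
                if (query.test(item)) hits.add(item);   // complex queries allowed
            if (ttl == 0) return hits;                  // search horizon: no guarantee
            for (Peer n : neighbours)
                hits.addAll(n.search(query, ttl - 1, seen));
            return hits;
        }
    }

A random walk replaces the loop over all neighbours with a single randomly chosen neighbour, trading message cost for search latency.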
Peer-to-Peer (P2P) Systems
• Structured P2P systems
  • Pastry (http://freepastry.rice.edu), Tapestry [kubJSAC04], Chord [BalToN03]
  • Objects – identifiers or keys
  • Overlay – Distributed Hash Table (DHT) over node identifiers
    • Maps keys to responsible nodes (see the sketch below)
  • Search
    • Guarantee – O(log n) routing hops
    • Simple queries only – exact match
[BalToN03] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. "Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications". IEEE/ACM Transactions on Networking, 11(1):17-32, February 2003.
[kubJSAC04] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John D. Kubiatowicz. "Tapestry: A Resilient Global-Scale Overlay for Service Deployment". IEEE Journal on Selected Areas in Communications, 22(1), January 2004.
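The key-to-node mapping behind these systems can be sketched as a toy consistent-hashing ring (illustrative only, not the actual Pastry/Chord/Tapestry code): node identifiers and object keys are hashed into one identifier space, and the node that is the closest clockwise successor of a key is responsible for it. A real DHT computes the same mapping with per-node routing tables in O(log n) hops, which is why only exact-match lookups fit naturally.

    import java.util.TreeMap;

    // Toy global view of the identifier ring.
    class ToyDht {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addNode(String nodeName) { ring.put(hash(nodeName), nodeName); }

        // The node responsible for a key is the key's clockwise successor on the ring.
        String lookup(String key) {
            Integer nodeId = ring.ceilingKey(hash(key));
            if (nodeId == null) nodeId = ring.firstKey();   // wrap around the ring
            return ring.get(nodeId);
        }

        private int hash(String s) { return Math.floorMod(s.hashCode(), 1024); }
    }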
Scaling up a Shared Object Space: Super-peer Virat
• Object lookup & failures
  • OMRs form a P2P overlay – a Pastry ring
  • Lookup in O(log n)
  • K replicas for each OMR
• Relaxed consistency models
  • Consistency granularity
    • OMR-level consistency
  • DSM objects read the latest value from the OMR
  • Invalidation-based approach (see the sketch below)
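A rough sketch of the invalidation-based, OMR-level consistency described above (the types and method bodies here are assumptions for illustration, not Virat's actual implementation): a write registered at the object's OMR invalidates every cached copy, so the next read at any node falls back to fetching the latest value from the OMR.

    import java.util.*;

    // Hypothetical OMR-level consistency: the OMR is the point of truth for an
    // object; cached copies are invalidated rather than updated in place.
    class Omr {
        private final Map<Long, Object> latest = new HashMap<>();
        private final Map<Long, Set<NodeCache>> holders = new HashMap<>();

        synchronized void write(long oid, Object value, NodeCache writer) {
            latest.put(oid, value);
            for (NodeCache c : holders.getOrDefault(oid, Set.of()))
                if (c != writer) c.invalidate(oid);       // invalidation, not update
        }

        synchronized Object read(long oid, NodeCache reader) {
            holders.computeIfAbsent(oid, k -> new HashSet<>()).add(reader);
            return latest.get(oid);                       // reader sees the latest value
        }
    }

    class NodeCache {
        private final Map<Long, Object> cache = new HashMap<>();

        void invalidate(long oid) { cache.remove(oid); }

        // Read through the local cache; an invalidated entry is re-fetched from the OMR.
        Object readThrough(long oid, Omr omr) {
            return cache.computeIfAbsent(oid, k -> omr.read(k, this));
        }
    }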
Super-peer Design Issues
• Super-peer failures
  • Disconnect clients
  • Super-peer replication – K replicas within the zone
  • Client-aware replication – reduces load
• Cluster size
  • Large – good for aggregate bandwidth [GarICDE03], but the super-peer becomes a bottleneck
  • Small – may reduce search efficiency; worst case, cluster size = 1
• Layer management
  • Dynamically vary super-layer and leaf-layer nodes [LiTPDS05]
[GarICDE03] Beverly Yang and Hector Garcia-Molina. "Designing a Super-Peer Network". International Conference on Data Engineering (ICDE), pages 49-62. IEEE Computer Society, March 2003.
[LiTPDS05] Li Xiao, Zhenyun Zhuang, and Yunhao Liu. "Dynamic Layer Management in Superpeer Architectures". IEEE Transactions on Parallel and Distributed Systems, 16(11):1078-1081, November 2005.
Virat: A Pure P2P Shared Object Space
• Virat over Vishwa (http://dos.iitm.ac.in/Vishwa)
  • Vishwa acts as the routing substrate
• Two-layered architecture
  • Unstructured layer – ensures Object Meta-data Repository (OMR) replicas are within a zone (cluster)
    • Capability-based neighbourhood formation (see the sketch after this list)
  • Structured layer – lets data recover from failures
• Any node can play the role of an OMR
• Consistency of meta-data is easier – OMR replicas are within a cluster
• Reconfigurable platform – OMR failures are handled effectively
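What capability-based neighbourhood formation might look like is sketched below (the scoring formula, weights, and field names are assumptions for illustration, not Vishwa's actual policy): a node ranks the candidate peers in its zone by a mix of capability and proximity and keeps the top k as neighbours, which is also where OMR replicas can be placed.

    import java.util.*;

    class CandidatePeer {
        final String id;
        final double cpuScore, freeMemoryMb, storageGb;   // capability attributes
        final double rttMs;                               // proximity estimate

        CandidatePeer(String id, double cpu, double mem, double storage, double rtt) {
            this.id = id;
            this.cpuScore = cpu;
            this.freeMemoryMb = mem;
            this.storageGb = storage;
            this.rttMs = rtt;
        }

        // Illustrative scoring: favour capable, nearby peers. Real weights would
        // come from the middleware's own policy.
        double score() {
            double capability = 0.5 * cpuScore + 0.3 * (freeMemoryMb / 1024) + 0.2 * storageGb;
            return capability / (1.0 + rttMs);
        }
    }

    class ZoneNeighbourhood {
        // Keep the k best-scoring peers in the zone as neighbours / replica holders.
        static List<CandidatePeer> select(List<CandidatePeer> zone, int k) {
            return zone.stream()
                       .sorted(Comparator.comparingDouble(CandidatePeer::score).reversed())
                       .limit(k)
                       .toList();
        }
    }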
Virat over Vishwa
[Diagram: peer nodes grouped into zones of the unstructured layer; each zone hosts an OMR together with its replicas (OMR1 with OMR11 and OMR12, OMR2 with OMR21 and OMR22, OMR3 with OMR31 and OMR32), and the structured layer assigns numeric node identifiers to the peers.]
Replicate Request in Virat
Virat over Vishwa: Node View
[Diagram: the components hosted on each node.]
• Node 1 – peer
  • Client process with a Virat Client Interface (VCI) object instance (written out as an interface sketch below), exposing:
    • makeObjectSharable(Sharable obj)
    • replicate(long int oid)
    • read(long int oid)
    • write(Sharable obj, long int oid)
  • Routing component
  • Inter-process communication
• Super-peer
  • OMR – object instance
  • route method
  • Routing component
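Written out as a Java interface, the client-side API named on the slide would look roughly as follows (the method names and parameters are taken from the slide; the return types, the plain long oid representation, and the Sharable marker type are assumptions):

    import java.io.Serializable;

    // Assumed marker type for objects that can be placed in the space.
    interface Sharable extends Serializable { }

    // Sketch of the Virat Client Interface (VCI) operations listed on the slide.
    interface ViratClientInterface {
        // Register an application object with the space; assumed to return its object id.
        long makeObjectSharable(Sharable obj);

        // Ask for an additional replica of the object identified by oid.
        void replicate(long oid);

        // Read the current value of the object identified by oid.
        Sharable read(long oid);

        // Write a new value for the object identified by oid.
        void write(Sharable obj, long oid);
    }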
Virat over Vishwa: Consistency
• Meta-data consistency
  • Replicas of an OMR are placed by proximity
    • Unlike Virat over Pastry
  • OMRs update their leaf-set neighbours
• Consistency of data
  • Delta consistency – delta = the number of updates that a replica is allowed to miss (a sketch follows this list)
  • Current and maximum delta values for each object are stored as meta-data
  • Invalidation messages are sent to replicas with delta = 0
  • For the others, the current delta is decremented
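The delta bookkeeping described above can be sketched as follows (class and field names are illustrative, not Virat's code): every replica of an object records how many more updates it may miss; on each write, replicas that have exhausted their allowance (delta = 0) receive an invalidation and must refresh, while the rest simply decrement their current delta.

    import java.util.*;

    // Illustrative per-object, per-replica delta-consistency meta-data.
    class ReplicaMeta {
        final String nodeId;
        final int maxDelta;       // updates this replica is allowed to miss
        int currentDelta;         // updates it can still miss before a forced refresh
        boolean valid = true;

        ReplicaMeta(String nodeId, int maxDelta) {
            this.nodeId = nodeId;
            this.maxDelta = maxDelta;
            this.currentDelta = maxDelta;
        }
    }

    class DeltaConsistencyManager {
        private final Map<Long, List<ReplicaMeta>> replicas = new HashMap<>();

        void register(long oid, ReplicaMeta meta) {
            replicas.computeIfAbsent(oid, k -> new ArrayList<>()).add(meta);
        }

        // Called on every update to object oid (the writer's own replica excluded).
        void onUpdate(long oid) {
            for (ReplicaMeta r : replicas.getOrDefault(oid, List.of())) {
                if (r.currentDelta == 0) {
                    r.valid = false;              // send an invalidation: must re-fetch
                    r.currentDelta = r.maxDelta;  // allowance resets after the refresh
                } else {
                    r.currentDelta--;             // replica may keep serving slightly stale reads
                }
            }
        }
    }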
Performance Studies: T Spaces vs. Virat – Response Time
Size of a T Spaces tuple = 16 bytes * 5 = 80 bytes
Size of a Virat object = 16 * 7 + 1 * 8 = 120 bytes
Scaling up Virat
Implications of the study
• The T Spaces server is a single point of failure and a bottleneck for scalability
  • Implies that OptimalGrid, which is built over T Spaces, may not scale up
• Grid computing – scalability implications of centralized components
  • Monitoring and Discovery Service (MDS)
  • Grid Resource Allocation and Management (GRAM)
  • Grid schedulers
Related Work: Scalable Shared Object Spaces
• JuxMem [AntINRIA05]
  • Unifies P2P and DSM concepts for data management in grids
  • Realized over JXTA (Juxtapose – side by side)
  • Cluster Manager (CM) – similar to an OMR; CMs form a P2P overlay
  • Data consistency – JXTA's multicast primitive, which is unreliable
  • Secondary cluster manager handles CM failures; meta-data consistency is not addressed
  • Evaluated only on a cluster
• Globe [TenCon99]
  • Large-scale shared object space
  • Uses a tree-based object lookup mechanism; failures are not handled
[AntINRIA05] Gabriel Antoniu, Luc Bougé, and Mathieu Jan. "Weaving Together the P2P and DSM Paradigms to Enable a Grid Data-Sharing Service". Scalable Computing: Practice and Experience (SCPE), 6(3):45-55, September 2005. Also available as INRIA Technical Report, ISSN 0249-6399, France.
[TenCon99] Maarten van Steen, Philip Homburg, and Andrew S. Tanenbaum. "Globe: A Wide-Area Distributed System". IEEE Concurrency, January-March 1999, pp. 70-78.
Related Work: P2P Storage Systems
• Ivy [MutOSR02]
  • Read/write P2P file system
  • Conflict resolution – left to the application
• PAST [DruSOSP01]
  • Provides a persistent caching & storage management layer over Pastry
  • Files are immutable – a file cannot be inserted multiple times
• OceanStore [KubArch00]
  • Internet-scale file sharing system – security & persistence
  • Versioning for read/write data – find the latest version of a file
  • Not evaluated for conflicting writes
  • Inner circle of reliable servers
• All three are built over DHTs
  • No application-specific criteria for data placement
  • Limited queries
[DruSOSP01] Antony Rowstron and Peter Druschel. "Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility". In SOSP '01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 188-201, New York, NY, USA, 2001. ACM Press.
[KubArch00] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Chris Wells, and Ben Zhao. "OceanStore: An Architecture for Global-Scale Persistent Storage". SIGARCH Computer Architecture News, 28(5):190-201, 2000.
[MutOSR02] Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen. "Ivy: A Read/Write Peer-to-Peer File System". SIGOPS Operating Systems Review, 36(SI):31-44, 2002.
Related Work: Data Management in Grids
• Replica Management Service (RMS) [Sto02]
  • Defines the interfaces required for an RMS
  • Entry point for wide-area copy and replica catalogue services
• Models for replica synchronisation and consistency [SegHPDC01]
  • Use cases
  • Quorum scheme – what about quorum dynamics?
• 2PC-based algorithms for consistency [TanIJUC05]
  • An extension of two-phase commit (2PC)
  • May have difficulty scaling up – requires global agreement
[Sto02] L. Guy, P. Kunszt, E. Laure, H. Stockinger, and K. Stockinger. "Replica Management in Data Grids". Technical Report, GGF Working Draft, 2002.
[SegHPDC01] Dirk Düllmann and Ben Segal. "Models for Replica Synchronisation and Consistency in a Data Grid". In HPDC '01: Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing, page 67, Washington, DC, USA, 2001. IEEE Computer Society.
[TanIJUC05] Sushant Goel, Hema Sharda, and David Taniar. "Atomic Commitment and Resilience in Grid Database Systems". International Journal of Grid and Utility Computing, 1(1):46-60, 2005.
Things to do
• Performance studies on Virat
  • 1 billion objects – a real scalability test?
  • Internet emulation using ModelNet [VahOSDI02]
  • Performance comparison with OpenDHT [KubCom05]
• HOT/SOC-based heuristics for data placement in grids
  • Simulation to quickly verify the viability of the heuristics
• Implementation of the replication service
[VahOSDI02] Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostic, Jeff Chase, and David Becker. "Scalability and Accuracy in a Large-Scale Network Emulator". Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.
[KubCom05] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. "OpenDHT: A Public DHT Service and Its Uses". Proceedings of ACM SIGCOMM 2005, August 2005.