Data Management Over a Two-Layered P2P Architecture
A. Vijay Srinivas
PhD Research Scholar
Department of Computer Science & Engineering,
Indian Institute of Technology Madras
Guide: Prof. D. Janakiram
Two-Layered P2P Architecture: A Generic Platform for Large-Scale Resource Sharing
• Large-scale resource sharing
  • Find resources (data/computing)
    • Neighbourhood – proximity
    • Capabilities – memory, processing power and storage
    • Dynamics – node and network load variations
  • Share disparate resources
  • Scalability and fault tolerance
• Case studies
  • Vishwa – compute grid middleware
    • Process migration for performance/adaptability
  • Virat – data grid framework
Requirements – Data Grids
• Functional
  • Data management
    • Data formats & structures
    • Replica management
    • Data access
    • Data transfer
    • Data querying/searching
  • Meta-data management
    • Description, automatic generation, and interoperability of data formats
    • Repository
      • Dependable access to meta-data
Requirements – Data Grids
• Functional
  • Data processing
    • Allow discovery of data/computing resources
    • Resource management & scheduling
• Non-functional
  • Security
  • Scalability
    • In the number of users or data entities in the system
  • Robustness – fault tolerance
Scalability – key non-functional requirement
• Scalability
  • No centralized components
  • No global knowledge
• Resource/data discovery
  • Proximity and capability
• Failures
  • Dependability
    • Applications – seamlessly adapt to node/network failures
  • Reconfigurability
    • Middleware components – must be adaptive
• Data replication
  • Catalogue/meta-data
  • Consistency
Summary: Requirements of a platform for data grids
• Data management
  • Data and meta-data formats, description
  • Replica management
  • Data querying/searching
• Discovery of capable data/computing resources
• High-performance data transfer protocol
• Resource management & scheduling
(Slide callouts map these requirements to components: data management – Virat; data querying/searching – Virat, with limited keyword searching achievable; discovery of capable resources – Virat's unstructured P2P layer; data transfer – GridFTP; resource management & scheduling – Vishwa.)
Motivating Application: Tele-medicine
[Diagram: (1) a heart patient in a village sends an ECG report to a healthcare centre in the city; (2) patient records are shared between healthcare centres in cities and reach a doctor in a healthcare centre as well as a mobile doctor in the city; (3) doctor feedback is sent back to the patient.]
Shared object spaces for wide-area computing
• Shared object spaces – existing realizations do not scale up
• P2P systems – cannot be used directly
• Solutions
  • Super-peer Virat – Virat over Pastry
    • Partially P2P Virat
      • Naïve Virat shared object, event and service space
      • Integration with Pastry
  • Fully P2P Virat – Virat over Vishwa
• Performance studies
  • Comparison with T Spaces – a standard tuple space implementation from IBM
Shared Object Spaces
• Distributed Shared Memory (DSM) – provides the notion of a global memory
  • Easy to build applications over DSM
  • Parallel computing applications – TreadMarks [ZwaComp96]
• Shared object spaces
  • Share application objects, not memory pages (see the sketch below)
  • Avoid false sharing
  • Applications – Computer Supported Cooperative Work (CSCW), Massively Multiplayer Online Gaming (MMOG)
[ZwaComp96] C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. "TreadMarks: Shared Memory Computing on Networks of Workstations". IEEE Computer, 29(2):18-28, February 1996.
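To make the object-granularity idea concrete, here is a minimal sketch in Java (the Sharable marker type is borrowed from the node-view slide later in this deck; the PatientRecord class and its fields are purely hypothetical): applications publish whole objects, so the unit of replication and consistency is an application object rather than a memory page, which is what removes false sharing.

    import java.io.Serializable;

    // Marker type; the node-view slide later passes Sharable objects to the space.
    interface Sharable extends Serializable { }

    // A whole application object is shared as one unit. Consistency is tracked
    // per object id, so two unrelated records can never interfere the way
    // co-located data on a single DSM page can (false sharing).
    class PatientRecord implements Sharable {
        final long patientId;
        String latestEcgReport;

        PatientRecord(long patientId) {
            this.patientId = patientId;
        }
    }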
Scalability of Shared Object Spaces
• Centralized components
  • Orca [BalTSE92] – sequencer for totally ordered multicast (TOM)
  • T Spaces [WykSJ98] – single server for lookup
• Failures
  • JavaSpaces – transaction coordinator (http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html, 2001)
  • Orca – sequencer
[BalTSE92] Henri E. Bal, M. Frans Kaashoek, and Andrew S. Tanenbaum. "Orca: A Language for Parallel Programming of Distributed Systems". IEEE Transactions on Software Engineering, 18(3):190-205, 1992.
[WykSJ98] P. Wyckoff, S.W. McLaughry, T.J. Lehman, and D.A. Ford. "T Spaces". IBM Systems Journal, 37(3):454-474, 1998.
Scalability of Shared Object Spaces
• Object lookup
  • Given an object id, reach the node holding meta-data about the object, or a copy of the object
  • Centralized or naïvely distributed – T Spaces
  • Lookup mechanisms in distributed object middleware do not handle failures and do not scale up
• Consistency
  • Relaxed consistency schemes in TreadMarks, Munin, etc.
    • Lazy, release, entry consistency
  • Peer-to-peer systems – Tapestry [KubSAC04] and Pastry (http://freepastry.rice.edu)
    • Assume read-only replicas
[KubSAC04] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John D. Kubiatowicz. "Tapestry: A Resilient Global-Scale Overlay for Service Deployment". IEEE Journal on Selected Areas in Communications, 22(1), January 2004.
Peer-to-Peer (P2P) Systems
• Unstructured P2P systems
  • Gnutella [gnutella], Freenet (http://freenetproject.org)
  • Overlay – random graph
  • Search – flooding/random walk (see the sketch below)
    • Supports complex queries
    • But no guarantees on search
  • Application-specific criteria for neighbourhood formation
    • Popular data placed on nodes with good capacity
[gnutella] The Gnutella Protocol Specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.
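The flooding style of search used by unstructured overlays can be sketched as follows (a toy illustration; all class and method names are hypothetical): a query carries a time-to-live (TTL) and is forwarded to every neighbour until the TTL expires, which is why arbitrary predicates are supported but a rare item may never be found.

    import java.util.*;
    import java.util.function.Predicate;

    class Peer {
        final String id;
        final List<Peer> neighbours = new ArrayList<>();
        final List<String> localData = new ArrayList<>();

        Peer(String id) { this.id = id; }

        // Flooding search: evaluate the predicate locally, then forward the
        // query to all neighbours until the TTL runs out. The 'seen' set
        // suppresses duplicate processing of the same query at a node.
        List<String> search(Predicate<String> query, int ttl, Set<String> seen) {
            List<String> hits = new ArrayList<>();
            if (!seen.add(id)) return hits;
            for (String item : localData)
                if (query.test(item)) hits.add(item);   // complex queries allowed
            if (ttl == 0) return hits;                  // search horizon: no guarantee
            for (Peer n : neighbours)
                hits.addAll(n.search(query, ttl - 1, seen));
            return hits;
        }
    }

A random walk replaces the loop over all neighbours with a single randomly chosen neighbour, trading message cost for search latency.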
Peer-to-Peer (P2P) Systems
• Structured P2P systems
  • Pastry (http://freepastry.rice.edu), Tapestry [kubJSAC04], Chord [BalToN03]
  • Objects – identifiers or keys
  • Overlay – Distributed Hash Table (DHT) over node identifiers
    • Maps keys to responsible nodes (see the sketch below)
  • Search
    • Guarantee – O(log n) routing hops
    • Simple queries only – exact match
[BalToN03] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. "Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications". IEEE/ACM Transactions on Networking, 11(1):17-32, February 2003.
[kubJSAC04] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John D. Kubiatowicz. "Tapestry: A Resilient Global-Scale Overlay for Service Deployment". IEEE Journal on Selected Areas in Communications, 22(1), January 2004.
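The key-to-node mapping behind these systems can be sketched as a toy consistent-hashing ring (illustrative only, not the actual Pastry/Chord/Tapestry code): node identifiers and object keys are hashed into one identifier space, and the node that is the closest clockwise successor of a key is responsible for it. A real DHT computes the same mapping with per-node routing tables in O(log n) hops, which is why only exact-match lookups fit naturally.

    import java.util.TreeMap;

    // Toy global view of the identifier ring.
    class ToyDht {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addNode(String nodeName) { ring.put(hash(nodeName), nodeName); }

        // The node responsible for a key is the key's clockwise successor on the ring.
        String lookup(String key) {
            Integer nodeId = ring.ceilingKey(hash(key));
            if (nodeId == null) nodeId = ring.firstKey();   // wrap around the ring
            return ring.get(nodeId);
        }

        private int hash(String s) { return Math.floorMod(s.hashCode(), 1024); }
    }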
Scaling up a Shared Object Space: Super-peer Virat
• Object lookup & failures
  • OMRs form a P2P overlay – a Pastry ring
  • Lookup in O(log n)
  • K replicas for each OMR
• Relaxed consistency models
  • Consistency granularity
    • OMR-level consistency
  • DSM objects read the latest value from the OMR
  • Invalidation-based approach (see the sketch below)
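A rough sketch of the invalidation-based, OMR-level consistency described above (the types and method bodies here are assumptions for illustration, not Virat's actual implementation): a write registered at the object's OMR invalidates every cached copy, so the next read at any node falls back to fetching the latest value from the OMR.

    import java.util.*;

    // Hypothetical OMR-level consistency: the OMR is the point of truth for an
    // object; cached copies are invalidated rather than updated in place.
    class Omr {
        private final Map<Long, Object> latest = new HashMap<>();
        private final Map<Long, Set<NodeCache>> holders = new HashMap<>();

        synchronized void write(long oid, Object value, NodeCache writer) {
            latest.put(oid, value);
            for (NodeCache c : holders.getOrDefault(oid, Set.of()))
                if (c != writer) c.invalidate(oid);       // invalidation, not update
        }

        synchronized Object read(long oid, NodeCache reader) {
            holders.computeIfAbsent(oid, k -> new HashSet<>()).add(reader);
            return latest.get(oid);                       // reader sees the latest value
        }
    }

    class NodeCache {
        private final Map<Long, Object> cache = new HashMap<>();

        void invalidate(long oid) { cache.remove(oid); }

        // Read through the local cache; an invalidated entry is re-fetched from the OMR.
        Object readThrough(long oid, Omr omr) {
            return cache.computeIfAbsent(oid, k -> omr.read(k, this));
        }
    }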
Super-peer Design Issues
• Super-peer failures
  • Disconnect clients
  • Super-peer replication – K replicas within the zone
  • Client-aware replication – reduces load
• Cluster size
  • Large – good for aggregate bandwidth [GarICDE03], but the super-peer becomes a bottleneck
  • Small – may reduce search efficiency; worst case, cluster size = 1
• Layer management
  • Dynamically vary super-layer and leaf-layer nodes [LiTPDS05]
[GarICDE03] Beverly Yang and Hector Garcia-Molina. "Designing a Super-Peer Network". International Conference on Data Engineering (ICDE), pages 49-62. IEEE Computer Society, March 2003.
[LiTPDS05] Li Xiao, Zhenyun Zhuang, and Yunhao Liu. "Dynamic Layer Management in Superpeer Architectures". IEEE Transactions on Parallel and Distributed Systems, 16(11):1078-1081, November 2005.
Virat: A Pure P2P Shared Object Space
• Virat over Vishwa (http://dos.iitm.ac.in/Vishwa)
  • Vishwa acts as the routing substrate
• Two-layered architecture
  • Unstructured layer – ensures Object Meta-data Repository (OMR) replicas are within a zone (cluster)
    • Capability-based neighbourhood formation (see the sketch after this list)
  • Structured layer – lets data recover from failures
• Any node can play the role of an OMR
• Consistency of meta-data is easier – OMR replicas are within a cluster
• Reconfigurable platform – OMR failures are handled effectively
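What capability-based neighbourhood formation might look like is sketched below (the scoring formula, weights, and field names are assumptions for illustration, not Vishwa's actual policy): a node ranks the candidate peers in its zone by a mix of capability and proximity and keeps the top k as neighbours, which is also where OMR replicas can be placed.

    import java.util.*;

    class CandidatePeer {
        final String id;
        final double cpuScore, freeMemoryMb, storageGb;   // capability attributes
        final double rttMs;                               // proximity estimate

        CandidatePeer(String id, double cpu, double mem, double storage, double rtt) {
            this.id = id;
            this.cpuScore = cpu;
            this.freeMemoryMb = mem;
            this.storageGb = storage;
            this.rttMs = rtt;
        }

        // Illustrative scoring: favour capable, nearby peers. Real weights would
        // come from the middleware's own policy.
        double score() {
            double capability = 0.5 * cpuScore + 0.3 * (freeMemoryMb / 1024) + 0.2 * storageGb;
            return capability / (1.0 + rttMs);
        }
    }

    class ZoneNeighbourhood {
        // Keep the k best-scoring peers in the zone as neighbours / replica holders.
        static List<CandidatePeer> select(List<CandidatePeer> zone, int k) {
            return zone.stream()
                       .sorted(Comparator.comparingDouble(CandidatePeer::score).reversed())
                       .limit(k)
                       .toList();
        }
    }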
Virat over Vishwa
[Diagram: peer nodes grouped into zones of the unstructured layer; each zone hosts an OMR together with its replicas (OMR1 with OMR11 and OMR12, OMR2 with OMR21 and OMR22, OMR3 with OMR31 and OMR32), and the structured layer assigns numeric node identifiers to the peers.]
Replicate Request in Virat
Virat over Vishwa: Node View
[Diagram: the components hosted on each node.]
• Node 1 – peer
  • Client process with a Virat Client Interface (VCI) object instance (written out as an interface sketch below), exposing:
    • makeObjectSharable(Sharable obj)
    • replicate(long int oid)
    • read(long int oid)
    • write(Sharable obj, long int oid)
  • Routing component
  • Inter-process communication
• Super-peer
  • OMR – object instance
  • route method
  • Routing component
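Written out as a Java interface, the client-side API named on the slide would look roughly as follows (the method names and parameters are taken from the slide; the return types, the plain long oid representation, and the Sharable marker type are assumptions):

    import java.io.Serializable;

    // Assumed marker type for objects that can be placed in the space.
    interface Sharable extends Serializable { }

    // Sketch of the Virat Client Interface (VCI) operations listed on the slide.
    interface ViratClientInterface {
        // Register an application object with the space; assumed to return its object id.
        long makeObjectSharable(Sharable obj);

        // Ask for an additional replica of the object identified by oid.
        void replicate(long oid);

        // Read the current value of the object identified by oid.
        Sharable read(long oid);

        // Write a new value for the object identified by oid.
        void write(Sharable obj, long oid);
    }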
Virat over Vishwa: Consistency
• Meta-data consistency
  • Replicas of an OMR are placed by proximity
    • Unlike Virat over Pastry
  • OMRs update their leaf-set neighbours
• Consistency of data
  • Delta consistency – delta = the number of updates that a replica is allowed to miss (a sketch follows this list)
  • Current and maximum delta values for each object are stored as meta-data
  • Invalidation messages are sent to replicas with delta = 0
  • For the others, the current delta is decremented
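The delta bookkeeping described above can be sketched as follows (class and field names are illustrative, not Virat's code): every replica of an object records how many more updates it may miss; on each write, replicas that have exhausted their allowance (delta = 0) receive an invalidation and must refresh, while the rest simply decrement their current delta.

    import java.util.*;

    // Illustrative per-object, per-replica delta-consistency meta-data.
    class ReplicaMeta {
        final String nodeId;
        final int maxDelta;       // updates this replica is allowed to miss
        int currentDelta;         // updates it can still miss before a forced refresh
        boolean valid = true;

        ReplicaMeta(String nodeId, int maxDelta) {
            this.nodeId = nodeId;
            this.maxDelta = maxDelta;
            this.currentDelta = maxDelta;
        }
    }

    class DeltaConsistencyManager {
        private final Map<Long, List<ReplicaMeta>> replicas = new HashMap<>();

        void register(long oid, ReplicaMeta meta) {
            replicas.computeIfAbsent(oid, k -> new ArrayList<>()).add(meta);
        }

        // Called on every update to object oid (the writer's own replica excluded).
        void onUpdate(long oid) {
            for (ReplicaMeta r : replicas.getOrDefault(oid, List.of())) {
                if (r.currentDelta == 0) {
                    r.valid = false;              // send an invalidation: must re-fetch
                    r.currentDelta = r.maxDelta;  // allowance resets after the refresh
                } else {
                    r.currentDelta--;             // replica may keep serving slightly stale reads
                }
            }
        }
    }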
Performance Studies: T Spaces vs. Virat – Response Time
Size of a T Spaces tuple = 16 bytes * 5 = 80 bytes
Size of a Virat object = 16 * 7 + 1 * 8 = 120 bytes
Scaling up Virat
Implications of the study
• The T Spaces server is a single point of failure and a bottleneck for scalability
  • Implies that OptimalGrid, which is built over T Spaces, may not scale up
• Grid computing – scalability implications of centralized components
  • Monitoring and Discovery Service (MDS)
  • Grid Resource Allocation and Management (GRAM)
  • Grid schedulers
Related Work: Scalable Shared Object Spaces
• JuxMem [AntINRIA05]
  • Unifies P2P and DSM concepts for data management in grids
  • Realized over JXTA (Juxtapose – side by side)
  • Cluster Manager (CM) – similar to an OMR; CMs form a P2P overlay
  • Data consistency – JXTA's multicast primitive, which is unreliable
  • Secondary cluster manager handles CM failures; meta-data consistency is not addressed
  • Evaluated only on a cluster
• Globe [TenCon99]
  • Large-scale shared object space
  • Uses a tree-based object lookup mechanism; failures are not handled
[AntINRIA05] Gabriel Antoniu, Luc Bougé, and Mathieu Jan. "Weaving Together the P2P and DSM Paradigms to Enable a Grid Data-Sharing Service". Scalable Computing: Practice and Experience (SCPE), 6(3):45-55, September 2005. Also available as INRIA Technical Report, ISSN 0249-6399, France.
[TenCon99] Maarten van Steen, Philip Homburg, and Andrew S. Tanenbaum. "Globe: A Wide-Area Distributed System". IEEE Concurrency, January-March 1999, pp. 70-78.
Related Work: P2P Storage Systems
• Ivy [MutOSR02]
  • Read/write P2P file system
  • Conflict resolution – left to the application
• PAST [DruSOSP01]
  • Provides a persistent caching & storage management layer over Pastry
  • Files are immutable – a file cannot be inserted multiple times
• OceanStore [KubArch00]
  • Internet-scale file sharing system – security & persistence
  • Versioning for read/write data – find the latest version of a file
  • Not evaluated for conflicting writes
  • Inner circle of reliable servers
• All three are built over DHTs
  • No application-specific criteria for data placement
  • Limited queries
[DruSOSP01] Antony Rowstron and Peter Druschel. "Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility". In SOSP '01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 188-201, New York, NY, USA, 2001. ACM Press.
[KubArch00] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Chris Wells, and Ben Zhao. "OceanStore: An Architecture for Global-Scale Persistent Storage". SIGARCH Computer Architecture News, 28(5):190-201, 2000.
[MutOSR02] Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen. "Ivy: A Read/Write Peer-to-Peer File System". SIGOPS Operating Systems Review, 36(SI):31-44, 2002.
Related Work: Data Management in Grids
• Replica Management Service (RMS) [Sto02]
  • Defines the interfaces required for an RMS
  • Entry point for wide-area copy and replica catalogue services
• Models for replica synchronisation and consistency [SegHPDC01]
  • Use cases
  • Quorum scheme – what about quorum dynamics?
• 2PC-based algorithms for consistency [TanIJUC05]
  • An extension of two-phase commit (2PC)
  • May have difficulty scaling up – requires global agreement
[Sto02] L. Guy, P. Kunszt, E. Laure, H. Stockinger, and K. Stockinger. "Replica Management in Data Grids". Technical Report, GGF Working Draft, 2002.
[SegHPDC01] Dirk Düllmann and Ben Segal. "Models for Replica Synchronisation and Consistency in a Data Grid". In HPDC '01: Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing, page 67, Washington, DC, USA, 2001. IEEE Computer Society.
[TanIJUC05] Sushant Goel, Hema Sharda, and David Taniar. "Atomic Commitment and Resilience in Grid Database Systems". International Journal of Grid and Utility Computing, 1(1):46-60, 2005.
Things to do
• Performance studies on Virat
  • 1 billion objects – a real scalability test?
  • Internet emulation using ModelNet [VahOSDI02]
  • Performance comparison with OpenDHT [KubCom05]
• HOT/SOC-based heuristics for data placement in grids
  • Simulation to quickly verify the viability of the heuristics
• Implementation of the replication service
[VahOSDI02] Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostic, Jeff Chase, and David Becker. "Scalability and Accuracy in a Large-Scale Network Emulator". Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.
[KubCom05] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. "OpenDHT: A Public DHT Service and Its Uses". Proceedings of ACM SIGCOMM 2005, August 2005.