Managing Dynamic Metadata and Context
Mehmet S. Aktas
Advisor: Prof. Geoffrey C. Fox
2 of 34
Context as Service Metadata
Context can be:
- interaction-independent: slowly varying, quasi-static service metadata
- interaction-dependent: dynamically generated metadata produced as a result of the interaction of services; information associated with a single service, a session (service activity), or both
Dynamic Grid/Web Service Collections:
- assembled to support a specific task; examples include workflow and audio/video collaborative sessions
- generate metadata and have a limited lifetime
- we refer to these loosely assembled collections as "gaggles"
3 of 34
Motivating Cases
Multimedia Collaboration domain
- Global Multimedia Collaboration System (Global-MMCS) provides an A/V conferencing system
- collaborative A/V sessions with varying types of metadata, such as real-time metadata describing audio/video streams
- characteristics: widely distributed services; metadata of events (archival data); mostly read-only
Workflow-style applications in GIS/Sensor Grids
- Pattern Informatics (PI) is an earthquake forecasting system
- sensor grid data services generate events when an event of a certain magnitude (such as a fault displacement) occurs, firing off various services: filtering, analyzing raw data, generating images and maps
- characteristics: any number of widely distributed services can be involved; conversation metadata; transient; multiple writers
4 of 34
[Workflow diagram: a WMS GUI and WFS interact with HPSearch, Data Filters (reading http://..../..../..txt and writing http://..../..../tmp.xml), the PI Code, and the Context Information Service; numbered arrows mark the interaction steps listed below.]
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>
(session)
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> profile information related to WMS </content>
</context>
(user profile)
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> shared data for HPSearch activity </content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>
(activity)
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>
(shared state)
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>...
SOAP header for Context
1. session-associated dynamic metadata
2. user profile
3. activity-associated dynamic metadata
4. service-associated dynamically generated metadata
What does dynamically generated metadata look like in a real-life example?
- Steps 3, 4: WMS starts a session and invokes HPSearch to run the workflow script for the PI Code, passing a session id
- Steps 5, 6, 7: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as the result
- Step 8: HPSearch writes the URI of the output file into Context
- Step 9: WMS polls the Context Service for this information
- Step 10: WMS retrieves the output file generated by the workflow script and generates a map
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content> HPSearch-associated additional data generated during execution of the workflow. </content>
</context>
(service associated)
5 of 34
Practical Problem
We need a Grid Information Service for managing all information associated with services in gaggles, for:
- correlating activities of widely distributed services in workflow-style applications
- managing events in multimedia collaboration, providing the information needed for real-time replay/playback and session failure recovery
- enabling uniform query capabilities, e.g. "Give me the list of services satisfying QoS requirements C:{a,b,c..} and participating in sessions S:{x,y,z..}"
6 of 34
Motivations
- Managing small-scale, highly dynamic metadata, as in dynamic Grid/Web Service collections
- Performance limitations of point-to-point service communication approaches for managing stateful service information
- Lack of support for uniform hybrid query capabilities over both static and dynamic context information
- Lack of support for adaptation to instantaneous changes in client demands
- Lack of support for distributed session management capabilities, especially in the collaboration domain
7 of 34
Research Issues I
Performance
- Efficient mediator metadata strategies for service communication: high performance and persistency
- Efficient access request distribution: how do we choose a replica server to best serve a client request? how do we adapt to instantaneous changes in client demands?
Fault-tolerance
- High availability of information
- Efficient replica-content creation strategies
8 of 34
Research Issues II
Consistency
- Providing consistency across the copies of a replica
Flexibility
- Accommodating a broad range of application domains, such as read-dominated and read/write-dominated workloads
Interoperability
- Being compatible with a wide range of applications
- Providing data models and programming interfaces to perform hybrid queries over all service metadata and to enable real-time replay/playback or session recovery capabilities
9 of 34
Proposed System: Hybrid WS-Context Service
Fault-tolerant and high-performance Grid Information Service
- Caching module; Publish/Subscribe for fault tolerance, distribution, and consistency enforcement
- Database backend and Extended UDDI Registry
WS-I compatible uniform programming interface
- Specification with abstract data models and a programming interface that combines WS-Context and UDDI in one hybrid service to manage service metadata
- Hybrid functions operate on both metadata spaces (see the interface sketch below)
- Extended WS-Context functions operate on session metadata
- Extended UDDI functions operate on interaction-independent metadata
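To make the hybrid programming interface concrete, the sketch below gives a hypothetical Java view of how extended UDDI functions (interaction-independent metadata) and extended WS-Context functions (session metadata) could sit behind one facade. The interface and method names are illustrative assumptions, not the system's actual WSDL operations.

// Hypothetical Java sketch of the hybrid interface; names are illustrative,
// not the actual WSDL operations of the Hybrid WS-Context Service.
import java.util.List;

interface HybridMetadataService {
    // Extended UDDI side: interaction-independent, quasi-static metadata.
    String publishService(String serviceUri, List<String> nameValueMetadata);
    List<String> findServices(String metadataQuery);

    // Extended WS-Context side: session/interaction-dependent metadata.
    String createSession(String ownerServiceUri);
    void setContext(String sessionId, String key, String contextXml);
    String getContext(String sessionId, String key);

    // Hybrid query spanning both metadata spaces, e.g. "services satisfying
    // QoS requirements C:{a,b,c..} and participating in sessions S:{x,y,z..}".
    List<String> hybridQuery(String staticCriteria, List<String> sessionIds);
}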
10 of 34
Distributed HYBRID Grid Information Services
[Architecture figure: N replica servers, each a HYBRID Grid Information Service (GIS) combining a WS-Context component and an Extended UDDI component behind a WSDL interface; the Extended UDDI is backed by a database over JDBC. Replica servers act as publishers and subscribers on a topic-based publish-subscribe messaging system, and clients reach any replica over HTTP(S) through its WSDL interface.]
11 of 34
Detailed architecture of the system
[Architecture figure: a client talks to the service over HTTP(S) through its WSDL interface; internally the service consists of Access, Querying, Publishing, Expeditor, Sequencer, and Storage modules, JDBC handlers for the Ext-UDDI and WS-Context backends, and Publisher/Subscriber components for the messaging substrate.]
12 of 34
Key Design Features
- External Metadata Service: Extended UDDI Service for handling interaction-independent metadata
- Cache: integrated cache for all service metadata
- Access: redirecting a client request to an appropriate replica server
- Storage: replicating data on an appropriate replica server
- Consistency enforcement: ensuring all replicas of a datum stay the same
13 of 34
Extended UDDI XML Metadata Service
An extended UDDI XML Metadata Service, an alternative to OGC Web Registry Services.
- It supports different types of metadata: GIS metadata catalog (functional metadata) and user-defined metadata ((name, value) pairs)
- It provides unique capabilities: up-to-date service registry information (leasing) and dynamic aggregation of geospatial services
- It enables advanced query capabilities: geospatial queries, metadata-oriented queries, domain-independent queries
14 of 34
TupleSpaces Paradigm and JavaSpaces
TupleSpaces [Gelernter-99]
- a data-centric, asynchronous communication paradigm
- the communication units are tuples (data structures)
JavaSpaces [Sun Microsystems]: a Java-based, object-oriented implementation (see the usage sketch below)
- spaces are transactional and secure: mutually exclusive access to objects
- spaces are persistent: temporal and spatial uncoupling
- spaces are associative: content-based search
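As a brief illustration of the associative, content-based access mentioned above, the fragment below sketches basic JavaSpaces usage with a minimal entry type. The ContextEntry class and the way the space reference is obtained are assumptions made for illustration only.

// Minimal JavaSpaces sketch: write an entry, then read it back by template
// (content-based lookup). ContextEntry is a hypothetical entry type.
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class ContextEntry implements Entry {
    public String key;      // entry fields must be public for template matching
    public String value;
    public ContextEntry() {}   // required public no-arg constructor
}

class SpaceExample {
    static void demo(JavaSpace space) throws Exception {
        ContextEntry e = new ContextEntry();
        e.key = "session:012345";
        e.value = "<context>...</context>";
        space.write(e, null, Lease.FOREVER);        // no transaction, unbounded lease

        ContextEntry template = new ContextEntry(); // null fields act as wildcards
        template.key = "session:012345";
        ContextEntry found = (ContextEntry) space.read(template, null, JavaSpace.NO_WAIT);
    }
}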
15 of 34
Publish/Subscribe Paradigm and NaradaBrokering
Publish-Subscribe communication paradigm
- message-based asynchronous communication
- participants are decoupled both in space and in time
Open-source NaradaBrokering software
- topic-based publish/subscribe messaging system
- runs on a network of cooperating broker nodes
- provides support for a variety of QoS capabilities, such as low latency, reliable message delivery, multiple transfer protocols, security, and so forth
16 of 34
Caching Strategy
Integrated caching capability for both UDDI-type and WS-Context-type metadata
- a lightweight implementation of JavaSpaces: data sharing, associative lookup, and persistency
- supports both WS-Context-type and common UDDI-type standard operations
The system stores all keys and modest-size values in memory, while large values are stored in the database.
- We assume that today's servers can hold such small-size metadata in cache, so all modest-size metadata accesses happen in memory.
- WS-Context-type metadata is backed up into a MySQL database, and UDDI-type metadata into the extended UDDI, at regular intervals for persistency. (A minimal sketch of this policy follows.)
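The sketch below illustrates one way such an in-memory cache with periodic backup could look. The size threshold, the backup interval wiring, and the BackingStore interface are assumptions for illustration, not the system's actual classes.

// Hedged sketch of the caching policy above: modest-size values stay in memory,
// large values go straight to the database, and the in-memory state is flushed
// to the database at a fixed backup interval. Names are illustrative.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class MetadataCache {
    interface BackingStore { void store(String key, String value); }   // e.g. MySQL via JDBC

    private static final int MODEST_SIZE_BYTES = 10 * 1024;            // assumed threshold
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BackingStore db;

    MetadataCache(BackingStore db, long backupIntervalSeconds) {
        this.db = db;
        ScheduledExecutorService backup = Executors.newSingleThreadScheduledExecutor();
        // Periodically back up the in-memory metadata for persistency.
        backup.scheduleAtFixedRate(
                () -> memory.forEach(db::store),
                backupIntervalSeconds, backupIntervalSeconds, TimeUnit.SECONDS);
    }

    void put(String key, String value) {
        if (value.getBytes().length <= MODEST_SIZE_BYTES) {
            memory.put(key, value);        // modest-size values served from memory
        } else {
            db.store(key, value);          // large values go directly to the database
            memory.put(key, "@db:" + key); // keep only a small pointer in memory
        }
    }

    String get(String key) { return memory.get(key); }
}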
17 of 34
Performance Model and Measurements

Operation                    Average ± error (ms)   Stddev (ms)
Hybrid WS-Context Inquiry    12.29 ± 0.02           0.48
Extended UDDI Inquiry        17.68 ± 0.06           0.84

Test machine: P4, 3.4 GHz, 1 GB memory, Java SDK 1.4.2; both client and services on the same machine.
Simulation parameters: metadata size 1.7 KB, registry size 500 services, inquiry type UDDI query, 200 observations.
18 of 34
Hybrid WS-Context Caching Approach: Persistency investigation
- The figure shows the average execution time for varying backup frequencies.
- The system shows stable performance until the backup interval drops below 10 seconds.
Simulation parameters: metadata size 1.7 KB, 200 observations.
[Chart: round-trip time for WS-Context standard operations for varying backup-interval times; average and standard deviation for publication and inquiry; x-axis: backup-time interval (logarithmic scale), y-axis: time (msec).]
19 of 34
Hybrid WS-Context Caching Approach: Performance investigation
- 49% performance increase in inquiry functions and 53% performance gain in publication functions compared to the database solution.
- System processing overhead is less than 1 millisecond.
Simulation parameters: backup frequency every 10 seconds, metadata size 1.7 KB, registry size 5000 metadata, 200 observations.
[Charts: round-trip time for WS-Context publication requests and inquiry requests over repeated test cases; average and standard deviation for the echo service, memory access, and database access; y-axis: time (msec).]
20 of 34
Hybrid WS-Context Caching Approach: Message rate scalability investigation
- This figure shows the system behavior under increasing message rates.
- The system scales up to 940 inquiry messages/second and 480 publication messages/second.
Simulation parameters: backup frequency every 10 seconds, metadata size 1.7 KB, registry size 100 metadata.
[Chart: average time (ms) per message versus message processing rate (messages per second) for inquiry and publication message rates.]
21 of 34
Hybrid WS-Context Caching Approach: Message size scalability investigation
- This figure shows the system behavior under increasing message sizes.
- The system performs well for small contexts: performance remains the same between 100-byte and 10 KB context payloads.
Simulation parameters: backup frequency every 10 seconds, registry size 5000 metadata, 200 observations.
[Charts: round-trip time for the WS-Context publication operation (echo service, memory access, database access) and for WS-Context standard operations (publication, inquiry) versus context payload size in KB (logarithmic scale).]
- The second figure shows the system behavior under increasing message sizes between 10 KB and 100 KB.
- The system spends an additional ~7 ms to store large values in the database.
22 of 34
Access: Request Distribution
Pub-sub system based message distribution: broadcast-based request dissemination using a hashing scheme (sketched below).
- Keys are hashed to values (topics) that run from 1 to 1000.
- Each replica holder subscribes to the topics (hash values) of the keys it holds.
- Each access request is broadcast on the topic corresponding to the key.
- Replica holders unicast a response with a copy of the context under demand.
Advantages
- does not flood the network with access request messages
- does not require keeping track of the location of every single data item
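A minimal sketch of the hashing scheme described above follows; the topic-name prefix, the exact hash function, and the Broker interface are assumptions standing in for the actual NaradaBrokering API.

// Sketch of broadcast-based request dissemination: hash a context key onto one
// of 1000 topics and publish the access request on that topic. The Broker
// interface is a placeholder for the real pub-sub (e.g. NaradaBrokering) API.
class AccessRequestRouter {
    interface Broker { void publish(String topic, String message); }

    private static final int TOPIC_COUNT = 1000;
    private final Broker broker;

    AccessRequestRouter(Broker broker) { this.broker = broker; }

    // Map a context key to a topic in [1, 1000].
    static String topicFor(String contextKey) {
        int bucket = Math.floorMod(contextKey.hashCode(), TOPIC_COUNT) + 1;
        return "context/" + bucket;
    }

    // Broadcast an access request on the key's topic; replica holders that
    // subscribe to this topic unicast back a copy of the context.
    void requestContext(String contextKey, String replyToEndpoint) {
        broker.publish(topicFor(contextKey), "GET " + contextKey + " reply-to " + replyToEndpoint);
    }
}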
23 of 34
Access Distribution Experiment: Test Methodology
[Test setup figure: a Hybrid WS-Context instance in Bloomington, IN communicates through one or two NaradaBrokering (NB) nodes with remote Hybrid WS-Context instances in (1) Indianapolis, IN, (2) Tallahassee, FL, and (3) San Diego, CA; end-to-end time = T1 + T2 + T3.]
Simulation parameters: backup frequency every 10 seconds, message size 2.7 KB.
- The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the average cost of end-to-end metadata access.
- We run the system for 25000 observations.
- Gridfarm and TeraGrid machines were used for testing.
24 of 34
Distribution experiment result
- The figure shows average results for every 1000 observations, over 25000 continuous observations.
- The average transfer time shows that continuous access distribution operation does not degrade the performance.
- The figure also shows the time required for the various activities of access request distribution.
- The average overhead of distribution using the pub-sub system remains the same regardless of the network distance between nodes.
[Charts: overhead of distribution when using one intermediary broker, overhead when using two intermediary brokers, and latency, for Bloomington-Indianapolis, Bloomington-Tallahassee, and Bloomington-San Diego; plus per-pair access distribution charts of average and standard deviation of latency, one-broker, and two-broker times per 1000 observations; y-axis: time (ms).]
25 of 34
Optimizing Performance: Dynamic migration/replication
Dynamic migration/replication: a methodology for creating temporary copies of a context in the proximity of its requestors.
- Autonomous decisions: the replication decision belongs to the server.
- Algorithm based on [Rabinovich et al., 1999] (a sketch follows this list):
  - The system keeps a popularity record (number of access requests) for each copy and flushes it at regular time intervals.
  - The system checks local data periodically for dynamic migration or replication.
  - Unpopular server-initiated copies are deleted.
  - Popular copies are migrated to where they are wanted.
  - Very popular copies are replicated to where they are wanted.
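The fragment below is a hedged sketch of this popularity-driven decision rule. The threshold values (borrowed from the experiment parameters on the next slide), the PopularityRecord bookkeeping, and the delete/migrate/replicate hooks are illustrative assumptions, not the system's actual implementation of the Rabinovich et al. algorithm.

// Sketch of the periodic migration/replication decision described above:
// delete unpopular server-initiated copies, migrate popular ones, replicate
// very popular ones. Thresholds and hooks are illustrative placeholders.
import java.util.Map;

class ReplicationManager {
    static class PopularityRecord {
        int requestCount;          // reset (flushed) at regular intervals
        String topRequestingSite;  // where most requests came from
        boolean serverInitiated;   // true for copies this server created on its own
    }

    // Example thresholds in requests/second (cf. the experiment parameters:
    // deletion 0.03, replication 0.18).
    private final double deletionThreshold = 0.03;
    private final double replicationThreshold = 0.18;

    void evaluateLocalCopies(Map<String, PopularityRecord> copies, double intervalSeconds) {
        for (Map.Entry<String, PopularityRecord> e : copies.entrySet()) {
            PopularityRecord rec = e.getValue();
            double rate = rec.requestCount / intervalSeconds;
            if (rec.serverInitiated && rate < deletionThreshold) {
                delete(e.getKey());                                // unpopular copy
            } else if (rate >= replicationThreshold) {
                replicateTo(e.getKey(), rec.topRequestingSite);    // very popular copy
            } else if (rate >= deletionThreshold) {
                migrateTo(e.getKey(), rec.topRequestingSite);      // popular copy
            }
            rec.requestCount = 0;                                  // flush the record
        }
    }

    void delete(String key) { /* remove local copy */ }
    void migrateTo(String key, String site) { /* move copy to requesting site */ }
    void replicateTo(String key, String site) { /* create extra copy at requesting site */ }
}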
26 of 34
Dynamic Replication Performance: Test Methodology
[Test setup figure: Test 1, distribution with dynamic replication enabled; Test 2, distribution with dynamic replication disabled; each test places a NaradaBrokering (NB) node between a Hybrid WS-Context instance in Bloomington, IN and one in Indianapolis, IN; end-to-end time = T1 + T2 + T3.]
Simulation parameters: message size 2.7 KB, message rate 10 msg/sec, replication decision frequency every 100 seconds, deletion threshold 0.03 requests/second, replication threshold 0.18 requests/second, registry size 1000 metadata in Indianapolis.
- The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the mean end-to-end metadata access time.
- We run the system for approximately 45 minutes on Gridfarm and complexity machines.
27 of 34
- The figure shows average results for every 100 seconds.
- The decrease in average latency shows that the algorithm manages to move replica copies to where they are wanted.
[Chart: Dynamic Replication Performance, distribution between Bloomington, IN and Indianapolis, IN; average and standard deviation of latency (ms) with dynamic replication versus plain distribution, per 100-second interval.]
28 of 34
Storage: Replica content placement
Pub-sub system for replica content placement.
- Each node keeps a Replica Server Map.
  - A newly joining node sends a multicast probe message when it joins the network.
  - Each network node responds with a unicast message to make itself discoverable.
- Selection of replica server(s) for content placement: select nodes based on a proximity weighting factor.
- Sending a storage request to the selected replica servers (sketched below):
  - 1st step: the initiator unicasts a storage request to each selected replica server.
  - 2nd step: the recipient server stores the context and subscribes to the topic of that context.
  - 3rd step: an acknowledgement is sent (unicast) to the initiator.
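The sketch below illustrates how the storage-request step could look in code. The Broker and ReplicaServer abstractions, the proximity weighting, and the message format are assumptions for illustration, not the actual protocol classes.

// Sketch of replica content placement: pick the closest replica servers from
// the replica server map, unicast a storage request to each, and let recipients
// subscribe to the context's topic. All types and message formats are illustrative.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ReplicaPlacement {
    record ReplicaServer(String endpoint, double proximityWeight) {}
    interface Broker { void unicast(String endpoint, String message); }

    private final Broker broker;
    private final List<ReplicaServer> replicaServerMap;   // built from probe/response discovery

    ReplicaPlacement(Broker broker, List<ReplicaServer> replicaServerMap) {
        this.broker = broker;
        this.replicaServerMap = replicaServerMap;
    }

    // Step 1: choose servers by proximity weight and unicast the storage request.
    void placeContext(String contextKey, String contextXml, int replicaCount) {
        List<ReplicaServer> selected = replicaServerMap.stream()
                .sorted(Comparator.comparingDouble(ReplicaServer::proximityWeight))
                .limit(replicaCount)
                .collect(Collectors.toList());
        for (ReplicaServer server : selected) {
            broker.unicast(server.endpoint(), "STORE " + contextKey + " " + contextXml);
        }
        // Steps 2 and 3 happen on the recipients: each stores the context,
        // subscribes to the context's topic, and unicasts an acknowledgement back.
    }
}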
29 of 34
Fault-tolerance experiment: Testing Setup
[Test setup figure: two configurations (Test 1 and Test 2) in which a WS-Context instance in Bloomington, IN reaches WS-Context instances in Indianapolis, IN; Tallahassee, FL; and San Diego, CA through one or more NaradaBrokering (NB) nodes.]
Simulation parameters: backup frequency every 10 seconds, message size 2.7 KB.
- The test system consists of NaradaBrokering server(s) and four hybrid WS-Context instances separated by significant network distances.
- We determine the average time for end-to-end replica content creation.
- We run the system continuously for 25000 observations.
- Gridfarm and TeraGrid machines were used for testing.
30 of 34
Fault-tolerance experiment result
- The figure shows average results for every 1000 observations; the system was continuously tested for 25000 observations.
- The results indicate that continuous operation does not degrade the performance.
- The overhead of replica creation increases on the order of milliseconds as the fault-tolerance level increases.
[Charts: end-to-end latency and overhead of replica creation with one and two intermediary brokers for 1 replica creation (Indianapolis, IN), 2 replica creation (Indianapolis, IN and Tallahassee, FL), and 3 replica creation (Indianapolis, IN; Tallahassee, FL; San Diego, CA); plus per-case fault-tolerance charts of average and standard deviation per 1000 observations; y-axis: time (ms).]
31 of 34
Consistency enforcement
Pub-sub system for enforcing consistency.
- Primary-copy approach: updates of the same datum are carried out at a single server; NTP-based synchronized timestamps impose an order on write operations to the same datum.
- Update distribution (see the sketch below):
  - 1st step: an update request is forwarded (unicast) to the primary-copy holder by the initiator.
  - 2nd step: the primary-copy holder performs the update request and returns an acknowledgement.
- Update propagation: the primary copy pushes (broadcasts) updates of a context on the topic (hash value) corresponding to the key of the context whenever it detects a stale copy in the system.
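The fragment below sketches how the primary-copy holder might apply timestamp-ordered updates and push them on the context's topic. The Broker interface, the timestamp source, and the topicFor helper (mirroring the hashing scheme sketched earlier) are illustrative assumptions, not the system's actual classes.

// Sketch of primary-copy consistency enforcement: apply an update only if its
// (NTP-synchronized) timestamp is newer than the stored version, then broadcast
// the update on the context's topic so stale replicas can refresh themselves.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PrimaryCopyHolder {
    interface Broker { void publish(String topic, String message); }
    record Versioned(String value, long ntpTimestampMillis) {}

    private final Map<String, Versioned> primaryCopies = new ConcurrentHashMap<>();
    private final Broker broker;

    PrimaryCopyHolder(Broker broker) { this.broker = broker; }

    private static String topicFor(String key) {
        return "context/" + (Math.floorMod(key.hashCode(), 1000) + 1);
    }

    // Called when an initiator unicasts an update request to this primary-copy holder.
    boolean applyUpdate(String key, String newValue, long ntpTimestampMillis) {
        Versioned current = primaryCopies.get(key);
        if (current != null && current.ntpTimestampMillis() >= ntpTimestampMillis) {
            return false;   // older or duplicate write: timestamp order rejects it
        }
        primaryCopies.put(key, new Versioned(newValue, ntpTimestampMillis));
        // Propagate the accepted update to replicas subscribed to this key's topic.
        broker.publish(topicFor(key), "UPDATE " + key + " ts=" + ntpTimestampMillis + " " + newValue);
        return true;        // acknowledgement is returned to the initiator
    }
}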
32 of 34
Consistency Enforcement Experiment: Test Methodology
[Test setup figure: a Hybrid WS-Context instance in Bloomington, IN communicates through one or two NaradaBrokering (NB) nodes with remote Hybrid WS-Context instances in (1) Indianapolis, IN, (2) Tallahassee, FL, and (3) San Diego, CA; end-to-end time = T1 + T2 + T3.]
Simulation parameters: backup frequency every 10 seconds, message size 2.7 KB.
- The test system consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the average time required for enforcing consistency.
- We run the system for 25000 observations.
- Gridfarm and TeraGrid machines were used for testing.
33 of 34
Consistency Enforcement Test Result
- The figure shows average results for every 1000 observations, over 25000 continuous observations.
- The average transfer time shows that continuous operation does not degrade the performance.
- The results indicate that the overhead of consistency enforcement is in milliseconds and that the cost remains the same regardless of the distribution of the network nodes.
[Charts: overhead of distribution with one and two intermediary brokers plus latency for Bloomington-Indianapolis, Bloomington-Tallahassee, and Bloomington-San Diego; plus per-pair consistency enforcement charts of average and standard deviation of latency, one-broker, and two-broker times per 1000 observations; y-axis: time (ms).]
34 of 34
Comparison of Experiment Results
- The figure shows the results gathered from the distribution, fault-tolerance, and consistency experiments.
- The results indicate that the overhead of integrating JavaSpaces with the pub-sub system for distribution, fault-tolerance, and consistency enforcement is on the order of milliseconds.
[Chart: one-broker and two-broker overhead (ms) for distribution, consistency enforcement, and fault tolerance (3 replica creation).]
35 of 34
Contribution
- We have shown that communication among services can be achieved with efficient mediator metadata strategies. Efficient mediator services allow us to perform collective operations, such as queries on subsets of all available metadata in a service conversation.
- We have shown that an efficient decentralized metadata system can be built by integrating JavaSpaces with the Publish/Subscribe paradigm. Fault-tolerance, distribution, and consistency can be achieved with a few milliseconds of system processing overhead.
- We have shown that adaptation to instantaneous changes in client demands can be achieved in decentralized metadata management.
- We have introduced data models and programming interfaces that provide a uniform search interface to both interaction-independent and conversation-based service metadata.