36
Dynamic XML documents with distribution and replication Angela Bonifati (currently in Icar-CNR, Italy) Joint work with: Serge Abiteboul, Gregory Cobéna, Ioana Manolescu (INRIA), Tova Milo (INRIA &Tel Aviv U.)

Dynamic XML documents with distribution and replication Angela Bonifati (currently in Icar-CNR, Italy) Joint work with: Serge Abiteboul, Gregory Cobéna,

Embed Size (px)

Citation preview

Dynamic XML documents with distribution and replication

Angela Bonifati (currently in Icar-CNR, Italy)

Joint work with:

Serge Abiteboul, Gregory Cobéna, Ioana Manolescu (INRIA), Tova Milo (INRIA &Tel Aviv U.)

Dynamic XML documents with distribution and replication 2

Outline

Motivation Dynamic XML documents and Active XML Peer-to-peer data management

Distributing and replicating AXML documents Data model Replication / distribution alternatives

Peer-to-peer querying of AXMLDR documents

Automatic (self-configurable) replication

Conclusion and perspectives

Dynamic XML documents with distribution and replication 3

Dynamic documents

Documents having An extensional (static) part An intensional (dynamic) part

Many existing applications HTML + embedded active components

• Dynamic page generation from databases (ASP, PHP, JSP)• Pages with embedded "update" features

XML-structured pages + active components• CGI, beans, …, web services (e.g. DreamWeaver)

Dynamic XML documents with distribution and replication 4

Distribution and replication

Interest of data distributionMove data fragments geographically close to

their users Allows for distributed processing

Interest of data replicationHigher availabilityBetter performance

Dynamic XML documents with distribution and replication 5

The focus of this work

Distribution and replication of dynamic documents.

Focus on the new aspects

Dynamic XML documents with distribution and replication 6

Context: Active XML (AXML)

A language: XML with embedded service calls A peer-to-peer system Each peer

Repository of intensional (AXML) documents Server: provides Web services (XQuery) Client: when invoking the embedded service calls And many more cool features

• exchanging intensional data (yesterday morning)• continuous services• etc.

AXML peers have already inherent distribution

Dynamic XML documents with distribution and replication 7

XML documents embedding calls to external Web services

Active XML documents

unisysWeather/getSnowInfo

snowInfo

consist.level

skiPortal

resort

sc

"Aspen"

service params

unysisWeather/getSnowInfo

descr

name

"Aspen"

skiGear

skis

Dynamic XML documents with distribution and replication 8

XML documents embedding calls to external Web services

Active XML documents

unisysWeather/getSnowInfo

snowInfo

consist.level

skiPortal

resort

sc

"Aspen"

service params

unysisWeather/getSnowInfo

descr

name

"Aspen"

skiGear

skis

Dynamic XML documents with distribution and replication 9

XML documents embedding calls to external Web services

Active XML documents

snowInfo

consist.level

skiPortal

resort

sc

"Aspen"

service params

unysisWeather/getSnowInfo

descr

name

"Aspen"

skiGear

skis

control of activation of service calls [ABM01, ABM+02]

Dynamic XML documents with distribution and replication 10

AXML web services: declarative XML queries

Many useful services may be defined as queries over (A)XML documents

Define service uniSysWeather / getSnowInfo($resortName) as: for $s in document("weather.xml")//snowInfo,

$n in $s//resort/ @name where $n=$resortName return $i

Dynamic XML documents with distribution and replication 11

Peer to peer deployment of AXML documents

d1

s2

Ski portal peer

skiPortal

s1

getHotelsbyResort($res)

UniSys Weather peer

Colorado News peerallNews

getShows($day)s4

getPolitics($day)

s3

Distributing and replicating dynamic documents by example

Dynamic XML documents with distribution and replication 13

Plan

Replication of dynamic data Distribution of dynamic data Querying distributed and replicated data Cost model Self-configurable replication

Dynamic XML documents with distribution and replication 14

Replication of dynamic documents

Peer localAgency wishes to hold the data regarding resorts with high snow levels

naive solution: localAgency queries skiPortal, stores result

LocalAgencyXML query

skiPortal Ski portal

sc

"Aspen"

serv params

unysisWeather/getSnowInfo

snowInfo

consist.level

resort

descr

name

"Aspen"

freq

1day

resort

XML resultsnowInfo

consist.level

descr

name

"Aspen"

Dynamic XML documents with distribution and replication 15

Replication of dynamic documents

Peer localAgency wishes to hold the data regarding resorts with high snow levels

naive solution: localAgency queries skiPortal, stores result

LocalAgencyXML query

XML result

Query result gets stale every 24 hours !

skiPortal Ski portal

sc

"Aspen"

serv params

unysisWeather/getSnowInfo

snowInfo

consist.level

resort

descr

name

"Aspen"

freq

1day

Dynamic XML documents with distribution and replication 16

Replication of dynamic documents

Avoiding stale data: propagating the changes replicate data needed for the service call, and (adjust) the query

LocalAgency

sc

"Aspen"

serv params

unysisWeather/getSnowInfo

snowInfo

consist.level

resort

descr

name

"Aspen"

freq

1day

replicateskiPortal Ski portal

sc

"Aspen"

serv params

unysisWeather/getSnowInfo

snowInfo

consist.level

resort

descr

name

"Aspen"

freq

1day

Dynamic XML documents with distribution and replication 17

Replication of dynamic documents

Providing complete independence to the LocalAgency peer: Replicate also the service definition and the data necessary in

order to make the call (adjust the service definition !)

LocalAgencySki portal

descr sc

"Aspen"

serv params

unysisWeather/getSnowInfo

snowInfo

consist.level

resort

name

"Aspen"

freq

1day

replicate

uniSysWeather

getSnowInfo getSnowInfo

replicate

replicate

Dynamic XML documents with distribution and replication 18

Replication issues

Issues:Which data / services to replicate ?How to adjust the definition of the

replicated services ?

Contributions: algorithms and cost model

Dynamic XML documents with distribution and replication 19

Plan

Replication of dynamic data Distribution of dynamic data Querying distributed and replicated data Cost model Self-configurable replication

Dynamic XML documents with distribution and replication 20

Distribution of dynamic documents

Not all data can be replicated everywhere (space, copyright etc.) Data or service calls may be distributed among several peers.

snowInfo

consist.level

skiPortal

resort

sc

"Aspen"

serviceparams

unysisWeather/getSnowInfo

descr

name

"Aspen"

skiGear

skis

LocalAgencySki portal

Dynamic XML documents with distribution and replication 21

Distribution issues

Issues The system must choose the peers contributing to

query answering The user may want to restrict to just some peers

Contributions XQuery extension XQueryDR

Cost model Distributed P2P query processing algo

Dynamic XML documents with distribution and replication 22

Plan

Replication of dynamic data Distribution of dynamic data Querying distributed and replicated data Cost model Self-configurable replication

Dynamic XML documents with distribution and replication 23

Doc("skiPortal.xml")//resort[descr/name="Aspen"]/snowInfo The query must be split over the peers

Querying distributed dynamic data

snowInfo

consist.level

skiPortal

resort

sc

"Aspen"

serviceparams

unysisWeather/getSnowInfo

descr

name

"Aspen"

skiGear

skis

LocalAgencySki portal

Dynamic XML documents with distribution and replication 24

Location aware model and query language

Universally unique IDs Enhance path expressions, specifying versions to query, e.g.:

{Doc("skiPortal.xml")/resort//hotel}@ANY; {Doc("skiPortal.xml")/resort}@MASTER{//hotel}@LOCAL

snowInfo

consist.level

skiPortal

resort

descr

name

"Aspen"

skiGear

skis

LocalAgencySki portal

hotel

name

"Star"

hotel

name

"Martin's"

hotel

name

"Martin's"

Dynamic XML documents with distribution and replication 25

Query processing (1/2)

Query at skiPortal: {Doc("skiPortal.xml")/resort//hotel}@ANY; skiPortal identifies local fragment: Doc("skiPortal.xml")/resort The remaining query //hotel may be evaluated at:

SkiPortal, or LocalAgency, or HotelSummary Choice based on cost

skiPortal

resort

descr

name

"Aspen"

skiGear

skis

LocalAgencySki portal

hotel

name

"Martin's"

hotel

name

"Star"

hotel

name

"Martin's"

HotelSummary

hotel

name

"Star"

hotel

name

"Martin's"

Dynamic XML documents with distribution and replication 26

Query processing (2/2)

Ski portal may delegate //hotel to LocalAgency

skiPortal

resort

descr

name

"Aspen"

skiGear

skis

LocalAgencySki portal

hotel

name

"Martin's"

hotel

name

"Star"

hotel

name

"Martin's"

TouristGuide

ratings

hotel

****

restaurant

lousy

pool

great

… which in turn may delegate to TouristGuide

Each peer makes the choiceminimizing its own cost

Dynamic XML documents with distribution and replication 27

Plan

Replication of dynamic data Distribution of dynamic data Querying distributed and replicated data Cost model Self-configurable replication

Dynamic XML documents with distribution and replication 28

Cost model

Principles: 1.unify service call activations and queries 2. cut queries/services into 1-peer pieces

Cost of (peer, query) pair (pi, qi) from the perspective of a peer pk

Cost function includes Objective parameters: real costs (network, storage etc.)

at all peers • Many of these are not known to pk

Subjective parameters: weights of objective costs from the perspective of pk

Dynamic XML documents with distribution and replication 29

Cost Model:Observable performance

skiPortal

On each peer Service calls in local documents Queries are asked Local services can be called

from outside

Cumulated daily costs = peer's observable performance

May be changed through replication

……

Dynamic XML documents with distribution and replication 30

Plan

Replication of dynamic data Distribution of dynamic data Querying distributed and replicated data Cost model Self-configurable replication

Dynamic XML documents with distribution and replication 31

Self-configurable replication

Improving performances of a given query:Replicate some data (including service calls)

• Replicate the service definitions and the data used by them

Recursively

Details of Cost-based algorithm in the paper

Dynamic XML documents with distribution and replication 32

Implementation status

AXML environment: SUN’s Java JDK 1.4 (includes XML parser, XPath

processor, XSLT engine) Apache Tomcat 4.0 servlet engine Apache Axis SOAP toolkit 1.0 beta 3 X-OQL query processor, persistent DOM repository JSP-based user interface, using JSLT 1.0 standard tag

library AXML D&R-specific (ongoing):

A collaborative distributed workspace demo (VLDB03) A light AXML peer for CLDC devices (kXML-RPC, XPath

proxy) AXML is online:

http://www-rocq.inria.fr/verso/Gemo/Projects/axml/

Dynamic XML documents with distribution and replication 33

Conclusion & future work

Generic framework for peer-to-peer management of dynamic documentsFocus on dynamic aspectsContext: AXML

Work ongoing Optimal query splitCollaborative index construction

Grazie

Dynamic XML documents with distribution and replication 35

References

(if people ask) [ABM01] S.Abiteboul, O.Benjelloun, T.Milo. "", FMII workshop,

2001. [ABM+02] S.Abiteboul, O.Benjelloun, I.Manolescu, T.Milo and

R.Weber "", in "Web Dynamics", 2003. [MAA+03] T.Milo, S.Abiteboul, B.Amann, O.Benjelloun and

F.Dang Ngoc. "Exchanging Intensional XML Documents", SIGMOD 2003

[XLink] http://www.w3c.org/xlink [LDAP] O.Kapitskaia, R. Ng and D. Srivastava. “Evolution and

revolutions in LDAP directory caches”, EDBT, 2000.

Dynamic XML documents with distribution and replication 36

Related work

LDAP, XLink, query/data shipping