EXCHANGING INTENSIONAL XML DATA. Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA). H. GL ALIKLI 2002700743; MURAT KORA 2002700797.


<ul><li><p>EXCHANGING INTENSIONAL XML DATA</p><p>Tova Milo (INRIA &amp; Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA)</p><p>H. GL ALIKLI 2002700743</p><p>MURAT KORA 2002700797</p></li>
<li><p>INTRODUCTION</p><p>The emergence of Web services as a standard means of publishing and accessing data on the Web introduced a new class of XML documents called intensional documents.</p><p>Intensional documents: XML documents in which some of the data is defined explicitly and some is defined by programs that generate data.</p></li>
<li><p>INTRODUCTION</p><p>Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results.</p><p>GOAL of this PAPER: study the new issues raised by the exchange of intensional XML documents between applications, and decide which data should be materialised before it is sent and which should not.</p></li>
<li><p>INTRODUCTION</p><p>CONSIDERATIONS for MATERIALISATION</p><p>Performance: current system load; cost of communication.</p><p>Capabilities: inability to handle the intensional parts of a document; lack of access rights (to a particular service).</p><p>Security: invoking service calls from an untrusted party may cause severe security violations.</p><p>Functionalities: confidentiality reasons; calling services may involve fees to be paid.</p></li>
<li><p>INTRODUCTION</p><p>[Figure: Data exchange scenario for intensional documents; labels include Data Exchange Schema]</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE INTENSIONAL XML: model intensional XML documents as labelled trees consisting of two types of nodes: data nodes, and function nodes, which correspond to service calls.</p><p>Assume the existence of some disjoint domains: N: domain of NODES; L: domain of LABELS; F: domain of FUNCTION NAMES; D: domain of DATA VALUES.</p></li>
<li>THE MODEL and THE PROBLEM: SIMPLE INTENSIONAL XML (contd). DEFINITION 1: An intensional document d is an expression (T, λ) where: T = (N, E,</li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE INTENSIONAL XML (contd)</p><p>Nodes with a label 
in L ∪ D are called data nodes.</p><p>Nodes with a label in F are called function nodes.</p><p>The children subtrees of a function node are the function parameters. When the function is called, these subtrees are passed to it, and the return value replaces the function node in the document.</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>[Figure: example intensional document. Root newspaper with children title ("The Sun"), date ("04/10/2002"), Get_Temp (parameter: city "Paris") and TimeOut (parameter: "Exhibits"); the call to Get_Temp returns temp ("16 C").]</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA: DEFINITION 2: A document schema s is an expression (L, F, τ) where:</p><p>L is a finite set of labels (a subset of the label domain);</p><p>F is a finite set of function names (a subset of the function-name domain);</p><p>τ is a function that maps each label name l ∈ L to a regular expression over L ∪ F or to the keyword data, and each function name f ∈ F to a pair of expressions called τ<sub>in</sub>(f), the input type of f, and τ<sub>out</sub>(f), the output type of f.</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd)</p><p>Example of a schema, data part: τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit); τ(title) = data; τ(date) = data; τ(temp) = data; τ(city) = data; τ(exhibit) = data</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd)</p><p>Example of a schema (contd), functions: τ<sub>in</sub>(Get_Temp) = city; τ<sub>out</sub>(Get_Temp) = temp; τ<sub>in</sub>(TimeOut) = data; τ<sub>out</sub>(TimeOut) = (exhibit|performance); τ<sub>in</sub>(Get_Date) = title; τ<sub>out</sub>(Get_Date) = date</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 3: An intensional document t is an instance of a schema s = (L, F, τ) if for each data node n ∈ t with label l ∈ L, the labels of n's children form a word in lang(τ(l)), where lang(τ(l)) denotes the regular language defined by τ(l).</p><p>The same holds for each function node with label f ∈ F, with respect to lang(τ<sub>in</sub>(f)).</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 3 (contd): Let f be a function name and t1,...,tn a sequence of intensional trees.</p><p>IF the labels of the roots of t1,...,tn form a word in lang(τ<sub>in</sub>(f)) (respectively lang(τ<sub>out</sub>(f))) AND all the trees are instances of s, THEN t1,...,tn is an input instance (respectively output instance) of f.</p><p>In other words, every subtree conforms to the same schema as the whole document.</p></li>
<li><p>THE MODEL and THE 
PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 4 (rewritings): Let t, t′ be trees. IF t′ is obtained from t by selecting a function node v in t with some label f and replacing it by an arbitrary output instance of f, THEN we say that t →<sub>v</sub> t′.</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 4 (rewritings, contd)</p><p>IF t →<sub>v1</sub> t1 →<sub>v2</sub> t2 ... →<sub>vn</sub> tn, THEN we say that t rewrites into tn, written t →* tn. The nodes v1,...,vn are called a rewriting sequence. The set of all trees t′ such that t →* t′ is denoted ext(t).</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 5 (rewritings): Let t be a tree and s be a schema.</p><p>1. IF ext(t) contains some instance of s, THEN t possibly rewrites into s.</p><p>2. IF either t is already an instance of s, or there exists some node v in t such that all trees t′ where t →<sub>v</sub> t′ safely rewrite into s, THEN we say that t safely rewrites into s.</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>SIMPLE SCHEMA (contd): DEFINITION 6: Let s be a schema and r a distinguished label called the root label. IF all the instances t of s with root label r rewrite safely into instances of s′, THEN we say that s safely rewrites into s′.</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>A Richer Data Model: Function Patterns</p><p>The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document.</p><p>But sometimes one does not know in advance which functions will be used at a given place.</p><p>A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions that have a proper signature. 
</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>To specify such a set of functions we use function patterns.</p><p>Function patterns: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one.</p><p>EX: τ<sub>name</sub>(Forecast) = UDDIF ∧ InACL; τ<sub>in</sub>(Forecast) = city; τ<sub>out</sub>(Forecast) = temp</p></li>
<li><p>THE MODEL and THE PROBLEM</p><p>A Richer Data Model (contd): Restricted Service Invocations</p><p>We assumed so far that all the functions appearing in a document may be invoked in a rewriting in order to match a given schema. This is not always the case, for reasons such as security, cost, and access rights.</p><p>THUS, function names/patterns in the schema can be partitioned into two disjoint groups: invocable and noninvocable ones. A legal rewriting is then one that invokes only invocable functions.</p></li>
<li><p>EXCHANGING INTENSIONAL DATA</p><p>Rewriting Process: 1. Safe rewriting: check whether t safely rewrites to s; if so, find a rewriting sequence.</p><p>Rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure. The preferred sequence is the shortest/cheapest one.</p></li>
<li><p>EXCHANGING INTENSIONAL DATA</p><p>Rewriting Process (contd): 2. Possible rewriting: IF a safe rewriting does not exist, check whether t may at least possibly rewrite to s.</p><p>IF it is acceptable to do so (the sender accepts that the rewriting may fail), try to find a successful rewriting sequence, if one exists. The preferred rewriting sequence is the one with the least cost.</p></li>
<li><p>EXCHANGING INTENSIONAL DATA</p><p>Rewriting Process (contd): 3. Mixed approach: one could first invoke some function calls, then attempt from there to find safe rewritings.</p></li>
<li><p>EXCHANGING INTENSIONAL DATA</p><p>Rewriting Process (contd): DEFINITION 7: For a rewriting sequence t →<sub>v1</sub> t1 ... →<sub>vn</sub> tn, IF v<sub>j</sub> ∈ t<sub>i</sub> but v<sub>j</sub> ∉ t<sub>i-1</sub>, THEN we say that function node v<sub>j</sub> depends on function node v<sub>i</sub>.</p><p>IF the dependency graph among the nodes contains no paths of length greater than k, 
THEN we say that the rewriting sequence is of depth k.</p></li>
<li><p>EXCHANGING INTENSIONAL DATA</p><p>RESTRICTION:</p><p>Consider only k-depth left-to-right rewritings.</p></li>
<li><p>SAFE REWRITING</p><p>Algorithm for k-depth left-to-right safe rewriting. The algorithm is decomposed into three parts.</p><p>1. Rewriting Function Parameters: to invoke a function, its parameters should be of the right type; if not, they should be rewritten to fit that type.</p><p>When rewriting the parameters, the functions in them can be invoked ONLY IF their own parameters are of, or can be rewritten into, the expected input type.</p></li>
<li><p>SAFE REWRITING</p><p>The algorithm is decomposed into three parts (contd): 1. Rewriting Function Parameters (contd)</p><p>For the deepest functions: verify that their parameters are instances of the corresponding input types. If not, the rewriting fails.</p><p>Move upward (repeat until all functions in the tree (forest) are done): for each function f, try to safely rewrite f's own parameters into the required structure. If this is not possible, the rewriting fails.</p></li>
<li><p>SAFE REWRITING</p><p>The algorithm is decomposed into three parts (contd): 2. Top-Down Traversal</p><p>In each iteration of the recursive procedure Rewriting Function Parameters, the parameters of the outermost functions of the tree (forest) are handled. In this part, safely rewrite the tree (forest) by invoking only these outermost functions.</p><p>THUS: traverse the tree (forest) top down; at each step treat a single node and its children.</p></li>
<li><p>SAFE REWRITING</p><p>The algorithm is decomposed into three parts (contd): 2. Top-Down Traversal (contd)</p><p>Consider a node n with children whose labels form a word w. The subtree rooted at node n can be rewritten into the target schema s = (L, F, τ) IF and ONLY IF:</p><p>1. w can be safely rewritten into a word in lang(τ(label(n))) AND</p><p>2. 
each of n's children can be safely rewritten into an instance of s.</p></li>
<li><p>SAFE REWRITING</p><p>The algorithm is decomposed into three parts (contd): 3. Rewriting the children of a node n</p><p>Given: w, a word (the sequence of labels of n's children). Goal: rewrite w so that it becomes a word in the regular language R = τ(label(n)).</p><p>The process of rewriting involves choosing some functions in w and replacing them by a possible output, then choosing some other functions (which might have been returned by previous calls) and replacing them by their output, and so on, up to depth k.</p></li>
<li><p>SAFE REWRITING</p><p>Safe Rewriting Algorithm: Given: a word w; the output types R<sub>f1</sub>,...,R<sub>fn</sub> of the available functions; a target regular language R.</p><p>Purpose of the algorithm: to test whether w can be safely rewritten into a word in R and, if so, to find a safe rewriting sequence.</p></li>
<li><p>SAFE REWRITING</p><p>Safe Rewriting Algorithm: Note: for illustration purposes we use the newspaper document. w = title.date.Get_Temp.TimeOut is the word formed by the children labels, R = title.date.temp.(TimeOut|exhibit*), and we look for a safe rewriting of w into a word in R.</p><p>The Algorithm: 1) Build finite state automata for the following regular languages: 1.1) an automaton A<sub>w</sub> accepting w as a single word.</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 1.2) Build automata A<sub>fi</sub>, i = 1,...,n, each accepting the regular language R<sub>fi</sub>. 1.3) Build an automaton Ā accepting the complement of the regular language R. 
The automaton should be deterministic and complete.</p></li>
<li><p>SAFE REWRITING</p><p>The complement automaton Ā for the schema τ(newspaper) = title.date.temp.(TimeOut|exhibit*)</p><p>[Figure: complement automaton with states p0 to p6 and edges labelled title, date, temp, TimeOut, exhibit and *]</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 2) Let A<sub>w</sub><sup>k</sup> := A<sub>w</sub>.</p><p>3) For j = 1,...,k: consider all the edges e = (v, u) in A<sub>w</sub><sup>k</sup> that are labelled by a function name f<sub>i</sub> and were not treated in previous iterations.</p><p>3.1) Extend A<sub>w</sub><sup>k</sup> by attaching a copy of the automaton A<sub>fi</sub>, with its initial and final states linked to v and u respectively by ε-moves.</p><p>3.2) Denote v as a fork node (for the edge e).</p><p>3.3) The two fork options of v are e itself and the new outgoing edge.</p></li>
<li><p>SAFE REWRITING</p><p>1-depth automaton A<sub>w</sub><sup>1</sup> for the word w = title.date.Get_Temp.TimeOut</p><p>[Figure: the automaton with its two fork nodes; at each fork node, one option represents the choice of invoking the function and the other the choice of not invoking it]</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 4) Construct the cartesian product automaton A<sub>×</sub> = A<sub>w</sub><sup>k</sup> × Ā. The fork nodes and fork options in A<sub>×</sub> reflect those of A<sub>w</sub><sup>k</sup>:</p><p>4.1) the fork nodes are the nodes [q, p] of A<sub>×</sub> where q was a fork node in A<sub>w</sub><sup>k</sup>;</p><p>4.2) a fork option in A<sub>×</sub> consists of all edges originating from one fork option edge in A<sub>w</sub><sup>k</sup>.</p></li>
<li><p>SAFE REWRITING</p><p>The cartesian product automaton A<sub>×</sub> = A<sub>w</sub><sup>k</sup> × Ā</p><p>[Figure 6: product automaton with states [q, p] such as [q0, p0], [q1, p1] and [q7, p6], and edges labelled title, date, Get_Temp, temp, TimeOut, exhibit and performance]</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 5) Mark nodes in A<sub>×</sub>:</p><p>5.1) mark the states that are accepting in both A<sub>w</sub><sup>k</sup> and Ā;</p><p>5.2) iteratively mark: nonfork (regular) nodes, IF one of their outgoing edges points to a marked node; fork nodes, IF both of their fork options (for some f<sub>i</sub>) contain an edge that points to a marked node.</p></li>
<li><p>SAFE REWRITING</p><p>The cartesian product automaton A<sub>×</sub> = A<sub>w</sub><sup>k</sup> × 
Ā</p><p>[Figure 6: product automaton with states [q, p] such as [q0, p0], [q1, p1] and [q7, p6], and edges labelled title, date, Get_Temp, temp, TimeOut, exhibit and performance]</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 6) Try to obtain a SAFE REWRITING. A safe rewriting exists IFF the initial state is not marked.</p><p>6.1) Follow a non-marked path (corresponding to w) starting from the initial state of A<sub>×</sub> to a state [q, p] where q is an accepting state of A<sub>w</sub><sup>k</sup>.</p><p>6.1.1) Non-marked fork options on the path determine the rewriting choices (i.e. which functions to call).</p><p>6.1.2) When a function is invoked, we continue the path with the new rewritten word rather than the word w.</p></li>
<li><p>SAFE REWRITING</p><p>The Algorithm (contd): 6.2) To minimize the rewriting cost, choose a path with a minimal number/cost of function invocations.</p><p>EXIT % End of the algorithm</p></li>
<li><p>SAFE REWRITING</p><p>The complement automaton Ā for the schema τ(newspaper) = title.date.temp.exhibit*</p><p>[Figure 7: complement automaton with edges labelled title, date, temp, exhibit and *]</p></li>
<li><p>SAFE REWRITING</p><p>The cartesian product automaton A<sub>×</sub> = A<sub>w</sub><sup>k</sup> × Ā</p><p>[Figure 8: product automaton with states [q, p] and edges labelled title, date, Get_Temp, temp, TimeOut, exhibit and performance]</p></li>
<li><p>SAFE REWRITING</p><p>Complexity of the Algorithm: s0 is the schema of the sender and s is the agreed data exchange schema.</p><p>The complexity is determined by the size of the cartesian product automaton: 1. construct the cartesian product; 2. traverse and mark the nodes of the resulting product.</p><p>THUS the complexity is bounded by O(|A<sub>×</sub>|<sup>2</sup>) = O((|A<sub>w</sub><sup>k</sup>| × |Ā|)<sup>2</sup>).</p></li>
<li><p>SAFE REWRITING</p><p>Complexity of the Algorithm (contd): O(|A<sub>×</sub>|<sup>2</sup>) = O((|A<sub>w</sub><sup>k</sup>| × |Ā|)<sup>2</sup>)</p><p>The maximum size of A<sub>w</sub><sup>k</sup> is O((|s0| + |w|)<sup>k</sup>).</p><p>The complexity is polynomial in the size of the schemas s and s0 (with the exponent determined by k).</p></li>
<li><p>POSSIBLE REWRITING</p><p>The Algorithm: 1. Build finite state automata for the following languages: 1.1. an automaton A<sub>w</sub><sup>k</sup>; 1.2. 
An automaton A accepting the regular language R.</p></li>
<li><p>POSSIBLE REWRITING</p><p>An automaton A for the schema τ(newspaper) = title.date.temp.exhibit*</p><p>[Figure 10: automaton with states p0 to p4 and edges labelled title, date, temp and exhibit]</p></li>
<li><p>POSSIBLE REWRITING</p><p>The Algorithm (contd): 2. Construct the cartesian product automaton A<sub>×</sub> = A<sub>w</sub><sup>k</sup> × A.</p><p>[Figure 11: product automaton with states [q, p] and edges labelled title, date, temp and exhibit]</p></li>
<li><p>POSSIBLE REWRITING</p><p>The Algorithm (contd): 3. Mark all nodes in A<sub>×</sub> that have some outgoing path leading to a final state.</p><p>4. IF the initial state is marked, THEN a rewriting may exist. To obtain such a rewriting:</p><p>Follow a marked path from the initial state of A<sub>×</sub> to a final one, with the fork options on the path determining the rewriting choices.</p><p>Backtrack when a call returns a value that does not allow the path to continue to an accepting state.</p><p>To minimize the rewriting cost, choose a path with the minimal number/cost of function invocations.</p></li>
<li><p>POSSIBLE REWRITING</p><p>The cartesian product automaton for possible rewriting.</p><p>[Figure 11: product automaton with states [q, p] and edges labelled title, date, temp and exhibit]</p></li>
<li><p>IMPLEMENTATION</p><p>Implementation performed in the Schema Enforcement Mo...</p></li></ul>
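<p>The rewriting step of Definition 4 (a function node is replaced by an output of the function) can be illustrated with a minimal Python sketch. It is not the implementation described in the slides: the Node class, the materialize helper and the get_temp stand-in for a Web service are hypothetical names introduced here, and the outputs of a call are not materialised further.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """A labelled tree node; the label is a tag, a function name or a data value."""
    label: str
    children: List["Node"] = field(default_factory=list)


# A service receives its parameter subtrees and returns a forest (hypothetical type).
Service = Callable[[List[Node]], List[Node]]


def materialize(node: Node, services: Dict[str, Service], invoke: Set[str]) -> List[Node]:
    """Return the forest that replaces `node` after calling the functions in `invoke`."""
    # Parameters are handled first, so a call sees already-rewritten subtrees.
    children = [out for child in node.children
                for out in materialize(child, services, invoke)]
    if node.label in services and node.label in invoke:
        return services[node.label](children)  # the result replaces the function node
    return [Node(node.label, children)]  # data node, or function kept intensional


def get_temp(params: List[Node]) -> List[Node]:
    """Stand-in for the Get_Temp service; it always answers 16 C."""
    return [Node("temp", [Node("16 C")])]


# The newspaper document from the slides.
doc = Node("newspaper", [
    Node("title", [Node("The Sun")]),
    Node("date", [Node("04/10/2002")]),
    Node("Get_Temp", [Node("city", [Node("Paris")])]),
    Node("TimeOut", [Node("Exhibits")]),
])

# Materialise Get_Temp before sending, but keep TimeOut intensional.
sent = materialize(doc, {"Get_Temp": get_temp}, invoke={"Get_Temp"})[0]
child_labels = [child.label for child in sent.children]
```

<p>Passing the set of functions to invoke as a parameter mirrors the sender's decision of which data to materialise before the document is sent.</p>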

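<p>The instance check of Definition 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the Node class and is_instance are hypothetical names, regular expressions are written over "label " tokens (each label followed by a space), and any label that the schema does not declare is treated as a data value.</p>

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def is_instance(node: Node, tau: Dict[str, str], tau_in: Dict[str, str]) -> bool:
    """Check Definition 3 for the subtree rooted at `node`."""
    def declared(n: Node) -> bool:
        return n.label in tau or n.label in tau_in

    if not declared(node):
        return not node.children  # assumption: undeclared labels are data values (leaves)
    # Function nodes are checked against their input type, data nodes against tau.
    expected = tau_in[node.label] if node.label in tau_in else tau[node.label]
    if expected == "data":
        # The keyword data: exactly one child, which must be a data value.
        ok = len(node.children) == 1 and not declared(node.children[0])
    else:
        # Encode the children labels as "label " tokens and match the regex.
        word = "".join(child.label + " " for child in node.children)
        ok = re.fullmatch(expected, word) is not None
    return ok and all(is_instance(child, tau, tau_in) for child in node.children)


# The example schema from the slides, in the token encoding described above.
tau = {
    "newspaper": r"title date (Get_Temp |temp )(TimeOut |exhibit )",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}
tau_in = {"Get_Temp": r"city ", "TimeOut": "data"}

doc = Node("newspaper", [
    Node("title", [Node("The Sun")]),
    Node("date", [Node("04/10/2002")]),
    Node("Get_Temp", [Node("city", [Node("Paris")])]),
    Node("TimeOut", [Node("Exhibits")]),
])
incomplete = Node("newspaper", [Node("title", [Node("The Sun")])])

valid = is_instance(doc, tau, tau_in)
invalid = is_instance(incomplete, tau, tau_in)
```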

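<p>The difference between safe and possible rewriting of a word (Definition 5, restricted to k-depth left-to-right rewritings) can be illustrated by a brute-force Python sketch. It deliberately does not build the automata of the algorithm above; it assumes instead that each function has a finite, non-empty list of possible output words (the slides use regular languages), and the names matches and rewrites are hypothetical.</p>

```python
import re
from typing import Dict, List, Tuple

Word = Tuple[str, ...]


def matches(word: Word, target: str) -> bool:
    """Match a word against a regex written over 'label ' tokens."""
    return re.fullmatch(target, "".join(label + " " for label in word)) is not None


def rewrites(word: Word, outputs: Dict[str, List[Word]], target: str,
             k: int = 1, safe: bool = True) -> bool:
    """Safe mode: every possible output must work. Possible mode: some output works."""
    quantifier = all if safe else any

    def go(done: Word, todo: Tuple[Tuple[str, int], ...]) -> bool:
        if not todo:
            return matches(done, target)
        (label, depth), rest = todo[0], todo[1:]
        # Option 1: keep the label (a data label, or a function left uninvoked).
        if go(done + (label,), rest):
            return True
        # Option 2: invoke the function, if it is known and the depth bound allows it.
        if label in outputs and depth < k:
            return quantifier(
                go(done, tuple((x, depth + 1) for x in out) + rest)
                for out in outputs[label]
            )
        return False

    return go((), tuple((label, 0) for label in word))


w = ("title", "date", "Get_Temp", "TimeOut")
# Finite samples of the output types (an assumption of this sketch).
outputs = {
    "Get_Temp": [("temp",)],
    "TimeOut": [(), ("exhibit",), ("performance",), ("exhibit", "performance")],
}
r1 = r"title date temp (TimeOut |(exhibit )*)"  # title.date.temp.(TimeOut|exhibit*)
r2 = r"title date temp (exhibit )*"             # title.date.temp.exhibit*

safe_r1 = rewrites(w, outputs, r1, k=1, safe=True)
safe_r2 = rewrites(w, outputs, r2, k=1, safe=True)
possible_r2 = rewrites(w, outputs, r2, k=1, safe=False)
```

<p>With these sample outputs, w safely rewrites into r1 by calling Get_Temp and leaving TimeOut untouched, whereas for r2 only a possible rewriting exists, because TimeOut may return a performance.</p>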