9/20/2006 ICWS 2006 1
Exploring Remote Object Coherence in XML Web Services
Robert van Engelen and Wei Zhang
Florida State University
Madhusudhan Govindaraju
State University of New York at Binghamton
Outline
Motivation
“O/X” impedance mismatch
Object-level coherence requirements
Mapping programming language types to XML
Static-dynamic algorithms for XML (de)serialization
Results and conclusions
Motivation
Can object-level coherence be achieved to ensure data structures and object graphs preserve their
states and structure when moved in XML (de)serialization format?
O/X Impedance Mismatch (1)
Inability to support derivation by restriction in programming languages, such as restricted values and patterns
XML names cannot be mapped to identifiers in all cases
Unsupported XML schema components
Unsupported types
No naming mechanism to implement XML namespaces
Serializing object graphs
Namespaces vs Packages
The XML namespace concept cannot be translated to C++ namespaces or Java packages
Schema snippet:
<complexType name="X">
  <sequence>
    <element name="a:y" type="a:Y_TYPE"/>
    <element ref="b:x"/>
  </sequence>
  …
</complexType>
C++ class:
using namespace a;
class X
{
  Y_TYPE y;
  // what should go here for namespace b?
  …
};
Serializing a graph of objects requires object-level coherence to
ensure application-level interoperability
XML Web Services
[Figure: the client serializes object graph x to XML and sends the request; the server receives the request and deserializes the XML into object graph y, transforms y into y’, serializes y’ to XML, and sends the response; the client receives the response and deserializes the XML into object graph x’]
Web Service Tools: Lossy Serialization in XML
[Figure: object graph x (internal binary format X) and object graph y (XML), connected by XML serialization and XML deserialization]
Serialization in XML can be lossy: object graph x cannot be recovered from its serialized XML form unless the mapping is chosen carefully
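A minimal sketch of how a careless mapping loses structure, assuming a hypothetical struct Pair with two pointer fields (the struct, put_tree, put_idref and same_xml are illustrative names, not taken from any toolkit). Writing every pointer target by value gives the same XML for a shared target and for two distinct targets with equal values, so the aliasing cannot be recovered. An id/ref mapping keeps the two graphs apart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record whose two pointer fields may point to the same int. */
struct Pair { int *a; int *b; };

/* Tree-style mapping: every pointer target is written by value. */
void put_tree(const struct Pair *p, char *out, size_t n)
{
    assert(p != NULL && p->a != NULL && p->b != NULL);
    snprintf(out, n, "<Pair><a>%d</a><b>%d</b></Pair>", *p->a, *p->b);
}

/* Id-ref mapping: a second pointer to the same target is written as a ref. */
void put_idref(const struct Pair *p, char *out, size_t n)
{
    assert(p != NULL && p->a != NULL && p->b != NULL);
    if (p->a == p->b)
        snprintf(out, n, "<Pair><a id=\"_1\">%d</a><b ref=\"#_1\"/></Pair>", *p->a);
    else
        snprintf(out, n, "<Pair><a>%d</a><b>%d</b></Pair>", *p->a, *p->b);
}

/* Returns 1 if both object graphs serialize to identical text under put. */
int same_xml(void (*put)(const struct Pair *, char *, size_t),
             const struct Pair *x, const struct Pair *y)
{
    char bx[128], by[128];
    put(x, bx, sizeof bx);
    put(y, by, sizeof by);
    return strcmp(bx, by) == 0;
}
```

With int v = 5, w = 5, the graphs { &v, &v } and { &v, &w } are indistinguishable under put_tree but produce different text under put_idref.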
Basic Requirement: Object Graph Isomorphism
[Figure: object graph x (representation X) and object graph y (representation Y), connected by mapping M and inverse mapping M⁻¹]
Object graphs x and y are isomorphic if M is bijective and has an inverse M⁻¹
Object-level coherence requires bijective mappings, so that distributed and serialized object graphs are always isomorphic
Object Coherence Requires Lossless XML Serialization
[Figure: object graph x (internal binary format X) and object graph y (XML), connected by serialization method M and deserialization method M⁻¹]
Serialization is lossless if an object graph x can be recovered from its serialized XML form y = M(x) by deserializing x = M⁻¹(y)
We consider structural equivalence only (location in memory is irrelevant), e.g. as in Java RMI, IIOP, and DCOM
Static Structure Analysis for SOAP RPC Encoding
Case 1, structure: TREE
XML Schema:
<complexType name="X">
  <sequence>
    <element name="y" type="tns:Y"/>
    <element name="z" type="tns:Z"/>
  </sequence>
  …
</complexType>
Example XML instance:
<x> <y>…</y> <z>…</z> … </x>
Case 2, structure: DAG
XML Schema:
<complexType name="X"> <sequence>…</sequence> </complexType>
<complexType name="Y"> <sequence>…</sequence> </complexType>
Example XML instance:
<x> <y href="123"/> … <y href="123"/> … </x> … <mref id="123">…</mref>
Case 3, structure: DAG
XML Schema:
<complexType name="X"> <sequence>…</sequence> </complexType>
Example XML instance:
<x> <x href="456"/> … <x href="456"/> … </x> … <mref id="456">…</mref>
[Figure: object graph sketch for each of the three cases]
Static Structure Analysis for Document/Literal Encoding
Case 1, structure: DAG
XML Schema:
<complexType name="X">
  <sequence>
    <element name="y" type="tns:Y"/>
    <element name="z" type="tns:Z"/>
  </sequence>
  …
</complexType>
Example XML instance:
<x> <y>…</y> <z>…</z> … </x>
Case 2, structure: DAG
XML Schema:
<complexType name="X"> <sequence>…</sequence> <attribute name="ref" type="REF"/> </complexType>
<complexType name="Y"> <sequence>…</sequence> <attribute name="id" type="ID"/> </complexType>
Example XML instance:
<x ref="123">…</x> … <x ref="123">…</x> … <y id="123">…</y>
Case 3, structure: DAG
XML Schema:
<complexType name="X"> <sequence>…</sequence> <attribute name="id" type="ID"/> <attribute name="ref" type="REF"/> </complexType>
Example XML instance:
<x ref="456">…</x> … <x ref="456">…</x> … <x id="456">…</x>
[Figure: object graph sketch for each of the three cases]
Mapping Programming Language Types to XML (1)
Problems with RPC encoding
  Multi-ref serialization with href and id attributes violates XML schema validation constraints
  Serialization of nil references, multi-referenced objects, and (sparse) multi-dimensional arrays is not precisely defined
  Multiple structures may map to the same XML
Mapping Programming Language Types to XML (2)
Mapping SOAP RPC encoding style requirements
  Mapping primitive types to built-in primitive XSD types and vice versa
  Mapping records (structs, classes) to xs:complexType
  Mapping arrays to SOAP arrays by ensuring support for SOAP 1.1 partial and sparse arrays
  Object graph must be serialized with multi-reference encoding
  Choosing a naming mechanism for XML namespace resolution to avoid name clashes: prefix_name
Mapping Programming Language Types to XML (3)
Mapping SOAP Document/Literal encoding style requirements
  Mapping XML attribute definitions within an xs:complexType
  Support for repetitions of xs:element in xs:complexType sequences
  Support for use of xs:choice and xs:any
  Support for xs:group and xs:attributeGroup
  Support for substitution groups
  Support for top-level element and attribute definitions and references
  Support for mixed content
gSOAP project: A static-dynamic approach to
XML (de)serialization
Static Structure Analysis for Compiled XML Serialization
Construct a data model
  Compiler determines which data type instances can refer to other data type instances at run time
Generate type-specific points-to analysis algorithm
  Only needed for pointer types and structs/classes with pointer fields
Generate type-specific optimized serialization code
  Serializer uses id-ref linking based on pointer analysis
[Figure: from the source code type definitions, the compiler generates the points-to analysis and the optimized serializer; the figure also shows the ptr hash table (graph edges)]
Constructing a Plausible Data Model from Source Code
Source code type definitions:
typedef int SSN;
struct Node
{
  int val;
  int *ptr;
  float num;
  struct Node *next;
};
[Figure: data model graph with the nodes SSN and Node (fields val, ptr, num, next); arcs denote all possible pointer references]
Generating an XML Schema Definition for Serialized XML
[Figure: data model graph with the nodes SSN and Node (fields val, ptr, num, next)]
<simpleType name="SSN">
  <restriction base="int"/>
</simpleType>
<complexType name="Node">
  <sequence>
    <element name="val" type="int"/>
    <element name="ptr" type="int" minOccurs="0"/>
    <element name="num" type="float"/>
    <element name="next" type="tns:Node" minOccurs="0"/>
  </sequence>
</complexType>
Generating Code for Runtime Points-To Analysis
void serialize_pointerToint(int *p)
{
  if (p != NULL)
  {
    // lookup and mark (p,TYPE_int)
    // as target in ptr hash table
  }
}
void serialize_Node(struct Node *p)
{
  if (p != NULL)
  {
    // lookup and mark (p,TYPE_Node)
    // as target in ptr hash table
    mark_embedded(&p->val, TYPE_int);
    serialize_pointerToint(p->ptr);
    // skip p->num
    serialize_pointerToNode(p->next);
  }
}
[Figure: data model graph with the nodes SSN and Node (fields val, ptr, num, next)]
Generating Serialization Code
void put_pointerToint(int *p)
{
  if (p != NULL)
  {
    // lookup (p,TYPE_int)
    // if embedded, then output "ref"
    // if single, then output value
    // if multi, then output value
    // with "id" and mark embedded
  }
}
void put_Node(struct Node *p)
{
  if (p != NULL)
  {
    // lookup (p,TYPE_Node)
    // if embedded, then output "ref"
    // if single, then output value
    // if multi, then output value
    // with "id" and mark embedded
    put_int(&p->val);
    put_pointerToint(p->ptr);
    put_float(&p->num);
    put_pointerToNode(p->next);
  }
}
[Figure: data model graph with the nodes SSN and Node (fields val, ptr, num, next)]
Serialization Example
Object graph:
  Node at loc=A: val=123, ptr=B, num=1.4, next=B
  Node at loc=B: val=456, ptr=C, num=2.3, next=A
  SSN at loc=C: value 789
Ptr hash table after runtime points-to analysis by the generated algorithm constructed from the data model:
ID  PtrLoc  PtrType  PtrSize  PtrCount  RefType
1   A       Node     16       2         multi
2   B       int      4        1         embedded
3   B       Node     16       1         single
4   C       int      4        1         single
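A runnable sketch of the points-to pass that fills such a table. It rests on several assumptions that are not taken from gSOAP: a small linear array stands in for the ptr hash table, the root pointer is counted as one reference, a node's fields are visited only on its first visit, and the names ptr_entry, lookup, mark_target, analyze_Node and ref_kind_of are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Node { int val; int *ptr; float num; struct Node *next; };

enum type_id  { TYPE_int, TYPE_Node };
enum ref_kind { REF_NONE, REF_SINGLE, REF_MULTI, REF_EMBEDDED };

/* One table row; a linear array stands in for the ptr hash table. */
struct ptr_entry { const void *loc; enum type_id type; int count; int embedded; };

#define MAX_ENTRIES 32
struct ptr_entry table[MAX_ENTRIES];
int n_entries = 0;

/* Find the row for (loc, type), creating it on first use. */
struct ptr_entry *lookup(const void *loc, enum type_id type)
{
    for (int i = 0; i < n_entries; i++)
        if (table[i].loc == loc && table[i].type == type)
            return &table[i];
    assert(n_entries < MAX_ENTRIES);
    table[n_entries] = (struct ptr_entry){ loc, type, 0, 0 };
    return &table[n_entries++];
}

/* Count one more pointer to (loc, type); returns 1 on the first visit. */
int mark_target(const void *loc, enum type_id type)
{
    return ++lookup(loc, type)->count == 1;
}

/* Record that (loc, type) is stored inside an enclosing struct. */
void mark_embedded(const void *loc, enum type_id type)
{
    lookup(loc, type)->embedded = 1;
}

void analyze_pointerToint(const int *p)
{
    if (p != NULL)
        mark_target(p, TYPE_int);
}

void analyze_Node(const struct Node *p)
{
    /* Visit the fields only once per node so that cycles terminate. */
    if (p != NULL && mark_target(p, TYPE_Node)) {
        mark_embedded(&p->val, TYPE_int);
        analyze_pointerToint(p->ptr);
        /* skip p->num: the data model has no pointer to float */
        analyze_Node(p->next);
    }
}

/* Classify a location in the style of the RefType column. */
enum ref_kind ref_kind_of(const void *loc, enum type_id type)
{
    const struct ptr_entry *e = lookup(loc, type);
    if (e->count == 0) return REF_NONE;      /* nothing points here */
    if (e->embedded)   return REF_EMBEDDED;
    return e->count > 1 ? REF_MULTI : REF_SINGLE;
}
```

Calling analyze_Node on the node at A of the example graph gives multi for (A, Node), embedded for (B, int), and single for (B, Node) and (C, int). Embedded fields that no pointer refers to, such as the val field of A, stay at REF_NONE and would not appear as table rows.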
Example (cont’d)
Reference types annotated on the object graph:
  Node at loc=A: multi, id=1
  val field of the Node at loc=B: embedded, id=2
  Node at loc=B: single
  SSN at loc=C: single
The serializer emits the XML step by step:
  1. <Node id="_1"> for the multi-referenced node at A
  2. <val>123</val>
  3. <ptr ref="#_2"/> because ptr refers to the embedded val field of B
  4. <num>1.4</num>
  5. <next> for the single-referenced node at B
  6. <val id="_2">456</val>, which carries the id of the embedded target
  7. <ptr>789</ptr> for the single-referenced SSN at C
  8. <num>2.3</num>
  9. <next ref="#_1"/> because next refers back to the node at A
  10. </next> and </Node>
Complete XML:
<Node id="_1">
  <val>123</val>
  <ptr ref="#_2"/>
  <num>1.4</num>
  <next>
    <val id="_2">456</val>
    <ptr>789</ptr>
    <num>2.3</num>
    <next ref="#_1"/>
  </next>
</Node>
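A compact sketch of a serializer that produces this output from a table filled in beforehand. The table rows are entered by hand through the hypothetical helper add_entry, a linear array replaces the hash table, and output goes to a fixed buffer named xml. None of these names come from gSOAP, and the via_pointer flag is an assumption used to tell a pointer to an embedded value (written as a ref) from the embedded value itself (written with its id).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct Node { int val; int *ptr; float num; struct Node *next; };

enum type_id  { TYPE_int, TYPE_Node };
enum ref_kind { REF_SINGLE, REF_MULTI, REF_EMBEDDED };

struct ptr_entry { const void *loc; enum type_id type; enum ref_kind kind; int id; };

#define MAX_ENTRIES 8
struct ptr_entry table[MAX_ENTRIES];   /* filled by a prior points-to pass */
int n_entries = 0;
char xml[512];                         /* output buffer, starts out empty */

void add_entry(const void *loc, enum type_id type, enum ref_kind kind, int id)
{
    assert(n_entries < MAX_ENTRIES);
    table[n_entries++] = (struct ptr_entry){ loc, type, kind, id };
}

struct ptr_entry *find(const void *loc, enum type_id type)
{
    for (int i = 0; i < n_entries; i++)
        if (table[i].loc == loc && table[i].type == type)
            return &table[i];
    return NULL;
}

void out(const char *s)
{
    assert(strlen(xml) + strlen(s) < sizeof xml);
    strcat(xml, s);
}

/* Writes the start tag; returns 0 if a self-closing ref was written instead. */
int open_tag(const char *tag, const void *loc, enum type_id type, int via_pointer)
{
    char buf[64];
    struct ptr_entry *e = find(loc, type);
    if (e != NULL && e->kind == REF_EMBEDDED && via_pointer) {
        snprintf(buf, sizeof buf, "<%s ref=\"#_%d\"/>", tag, e->id);
        out(buf);
        return 0;
    }
    if (e != NULL && e->kind != REF_SINGLE) {  /* multi, or embedded value in place */
        snprintf(buf, sizeof buf, "<%s id=\"_%d\">", tag, e->id);
        e->kind = REF_EMBEDDED;                /* later pointers emit a ref */
    } else {
        snprintf(buf, sizeof buf, "<%s>", tag);
    }
    out(buf);
    return 1;
}

void put_int(const char *tag, const int *p, int via_pointer)
{
    char buf[64];
    if (open_tag(tag, p, TYPE_int, via_pointer)) {
        snprintf(buf, sizeof buf, "%d</%s>", *p, tag);
        out(buf);
    }
}

void put_float(const char *tag, const float *p)
{
    char buf[64];
    snprintf(buf, sizeof buf, "<%s>%g</%s>", tag, *p, tag);
    out(buf);
}

void put_Node(const char *tag, const struct Node *p)
{
    char buf[64];
    if (p == NULL || !open_tag(tag, p, TYPE_Node, 1))
        return;
    put_int("val", &p->val, 0);        /* embedded field, written in place */
    if (p->ptr != NULL)
        put_int("ptr", p->ptr, 1);     /* reached through a pointer */
    put_float("num", &p->num);
    put_Node("next", p->next);
    snprintf(buf, sizeof buf, "</%s>", tag);
    out(buf);
}
```

After entering the four table rows for the example graph, put_Node("Node", &a) on the node at A leaves the same elements in xml as the complete XML of the example, written on one line without whitespace.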
Compiled XML Deserialization
For each data type, generate optimized deserialization code:
Generate a recursive descent LL(1) XML parser that is optimized for the data type
Embed deserialization operations as semantic actions in the parser
Invoke id hash table lookups as semantic actions to resolve id-ref references on the fly
The pointer remapper ensures the consistency of pointers in the object graph when nodes are moved in the construction process
[Figure: from the source code type definitions, the compiler generates the LL(1) parser & deserializer; the figure also shows the id hash table (resolve id-ref) and the pointer remapper]
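A minimal sketch of resolving id-ref references on the fly, including forward references that arrive before their target. It assumes numeric ids, a fixed-size array in place of the id hash table, and the hypothetical semantic actions resolve_ref and define_id; it does not model the pointer remapper, and it is not gSOAP's implementation.

```c
#include <assert.h>
#include <stddef.h>

struct Node { int val; int *ptr; float num; struct Node *next; };
enum type_id { TYPE_int, TYPE_Node };

#define MAX_IDS     16
#define MAX_PENDING 8

/* A pointer field that waits for the object with a given id. */
struct pending { void *slot; enum type_id type; };

struct id_entry {
    void *target;                         /* set once the id has been parsed */
    struct pending waiting[MAX_PENDING];  /* forward references to patch */
    int n_waiting;
};

struct id_entry ids[MAX_IDS];   /* indexed by id; stands in for the id hash table */

/* Store target into the pointer field at slot, using the field's real type. */
void store(void *slot, enum type_id type, void *target)
{
    if (type == TYPE_int)
        *(int **)slot = target;
    else
        *(struct Node **)slot = target;
}

/* Semantic action for ref="#_id": set the pointer now, or queue it. */
void resolve_ref(int id, void *slot, enum type_id type)
{
    struct id_entry *e;
    assert(id >= 0 && id < MAX_IDS);
    e = &ids[id];
    store(slot, type, e->target);         /* stays NULL while unresolved */
    if (e->target == NULL) {
        assert(e->n_waiting < MAX_PENDING);
        e->waiting[e->n_waiting++] = (struct pending){ slot, type };
    }
}

/* Semantic action for id="_id": record the address and patch waiting fields. */
void define_id(int id, void *target)
{
    struct id_entry *e;
    assert(id >= 0 && id < MAX_IDS);
    e = &ids[id];
    e->target = target;
    for (int i = 0; i < e->n_waiting; i++)
        store(e->waiting[i].slot, e->waiting[i].type, target);
    e->n_waiting = 0;
}

/* Number of references that still wait for their target. */
int unresolved(void)
{
    int n = 0;
    for (int i = 0; i < MAX_IDS; i++)
        n += ids[i].n_waiting;
    return n;
}
```

A recursive descent parser would call define_id when it meets an id attribute and resolve_ref when it meets a ref attribute, passing the address of the pointer field being filled in. Any reference still counted by unresolved() at the end of the document points to an id that never appeared.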
Deserialization Example
Recursive descent parsing with object deserialization operations as semantic actions
The parser consumes the XML step by step and fills in the object graph:
  1. <Node id="_1">: a Node is allocated (val=?, ptr=?, num=?, next=?) and id=1 is recorded
  2. <val>123</val>: val=123
  3. <ptr ref="#_2"/>: ptr stays unset and ref=2 is recorded as unresolved
  4. <num>1.4</num>: num=1.4
  5. <next>: a second Node (B) is allocated (val=?, ptr=?, num=?, next=?) and next=B is set in the first Node
  6. <val id="_2">456</val>: val=456 in B, id=2 is recorded, and the unresolved ref=2 is resolved so that ptr=B in the first Node
  7. <ptr>789</ptr>: an SSN (C) with value 789 is allocated and ptr=C is set in B
  8. <num>2.3</num>: num=2.3 in B
  9. <next ref="#_1"/>: id=1 is already known, so next=A is set in B
  10. </next></Node>: the object graph is complete
XML input:
<Node id="_1">
  <val>123</val>
  <ptr ref="#_2"/>
  <num>1.4</num>
  <next>
    <val id="_2">456</val>
    <ptr>789</ptr>
    <num>2.3</num>
    <next ref="#_1"/>
  </next>
</Node>
Performance Results
[Figure: performance results in number of roundtrip messages per second of a benchmark client/server application on various platforms, measured on a roundtrip message containing a struct of size 1.2KB]
Performance Results (2)
End-to-end performance (in milliseconds) of a gSOAP benchmark application with and without multi-ref encoding, measured on echoString with various array sizes. The data show that gSOAP’s multi-ref implementation adds just 2% to 3% overhead.
Performance Comparison to Other XML Parsers
[Figure: bar chart of parse time (us), measured on echoString of size 1KB; x-axis: TDX, TDX -Cfa, DFA, DFA -Cfa, eXpat, gSOAP, Xerces; label: experimental parsers; legend: validation, decoding+validation, parsing+scanning, parsing+validation, scanning]
Conclusions
Object-level coherence in XML Web services can be achieved with specialized algorithms
Design space of mapping issues is large
Algorithms to (de)serialize non-primitive data structures ensuring object-level coherence and performance guarantees
  Work best with SOAP 1.2 RPC encoding
  For SOAP document/literal style, we suggest use of id-ref
Thank You
Implementation: the gSOAP Toolkit
Web service applications are built in two stages:
1. Generate service definitions in familiar C/C++ header file format
2. Generate client/server stubs and skeleton source code and serialization code
Note: the soapcpp2 compiler can also be used to generate WSDL from header files
[Figure: service defs (service.wsdl) -> wsdl2h tool -> header file defs (service.h) -> soapcpp2 compiler -> soapClient.cpp, soapServer.cpp, soapC.cpp]
1. Analyze WSDL/Schema and Generate C/C++ Header File
[Figure: service defs (service.wsdl) -> wsdl2h tool -> header file defs (service.h), with documented methods and class diagrams]
2. Analyze Header File and Generate Stubs and Serializers
[Figure: header file defs (service.h) -> soapcpp2 compiler -> soapClient.cpp, soapServer.cpp, soapC.cpp. The gSOAP application uses the client/server API; the LL(1) XML parsers and (de)serializers (schema-opt serializer and schema-opt deserializer) convert between native C/C++ data and XML data; the gSOAP engine (HTTP/S + TCP/IP + UDP) sits on the network fabric; plugins: stats, logging, WSSE]