XJ: Facilitating XML Processing in Java
Written By :
Matthew Harren Mukund Raghavachari
Oded Shmueli Michael Burke
Rajesh Bordawekar Igor Pechtchanski
Vivek Sarkar
Conference: The 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005
Karawan Shahla
Seminar Lecture 236803
Agenda
• Some files
• Main Idea
• Introduction to XJ
• XJ Type System
• XJ Expressions
• XJ Updates
• XJ Problems
• Conclusion
Schema file (file: technioncatalog.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML document (file: short.xml)
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <course>
    <points>3</points>
    <number>234319</number>
    <name>Programming Languages</name>
    <teacher>Ron Pinter</teacher>
  </course>
  <course>
    <points>3</points>
    <number>234141</number>
    <name>Combinatorics for CS</name>
    <teacher>Ran El-Yaniv</teacher>
  </course>
</catalog>
XJ Program file
import java.io.*;
import technioncatalog.*;

public class Demo1 {
    public static void main(String[] args) throws Throwable {
        // Load and validate short.xml against the imported schema
        catalog cat = new catalog(new File("short.xml"));
        // Select the second <course> element (XPath positions are 1-based)
        catalog.course c = cat [| /course[2] |];
        printCourse(c);
    }

    private static void printCourse(catalog.course c) {
        String name = c [| /name |];
        String teacher = c [| /teacher |];
        int points = c [| /points |];
        int id = c [| /number |];
        System.out.println(name + " (" + id + ") by " + teacher + ", " + points + " credit points");
    }
}
Output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Main Idea
XML is becoming increasingly popular.
High-level languages should provide adequate support for manipulating XML.
Let's go through the existing APIs.
Traditional XML processing (DOM and XPath APIs)
public static void main(String[] args) throws Throwable {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new java.io.File("short.xml"));
    XPath xp = XPathFactory.newInstance().newXPath();
    // Cast to the standard org.w3c.dom.NodeList interface rather than an implementation class
    NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
    printCourse(nodes.item(1));
}
• The XPath query is a plain string. It may be:
– Syntactically incorrect
– Incompatible with the document
• The types of the XML objects (Node, Document) do not reflect the schema

private static void printCourse(Node n) {
    NodeList nodes = n.getChildNodes();
    // Odd indices: the even ones are the whitespace text nodes between the elements
    System.out.println(nodes.item(5).getTextContent() + " ("
        + nodes.item(3).getTextContent() + ") by "
        + nodes.item(7).getTextContent() + ", "
        + nodes.item(1).getTextContent() + " credit points");
}
• Assumption: four child elements must exist
• Assumption: the child at index 3 is the course number
• These assumptions will not hold if the schema is changed
– => run-time errors
– problems remain, even if we identify nodes by name
• Possible schema changes:
– Allowing a new optional <students> sub-element
– Changing the order of the sub-elements
Traditional XML processing (DOM APIs)
What about reading the numeric value of an element?
Assumption: 2nd child has no child elements
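The slide's own code for this question is not in the transcript. As a minimal sketch in plain Java (the class and method names `NumericRead`, `readInt` and `parse` are illustrative and not from the talk), reading a number through the DOM API might look like the following. The point is that `getTextContent()` always yields a `String`, so the `xs:int` declared in the schema has to be re-established by hand and can only fail at run time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NumericRead {
    // Returns the integer content of the first child element with the given name.
    static int readInt(Element parent, String childName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(childName)) {
                // The DOM only offers text; the conversion to int is manual and
                // throws NumberFormatException if the content is not numeric.
                return Integer.parseInt(n.getTextContent().trim());
            }
        }
        throw new IllegalArgumentException("no <" + childName + "> child");
    }

    // Helper (hypothetical): parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<course><points>3</points><number>234319</number></course>");
        System.out.println(readInt(doc.getDocumentElement(), "points"));
    }
}
```

Compare this with the XJ version in Demo1, where `int points = c [| /points |];` gets its type from the schema.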
Shaping the future
• What XML-related facilities do we want?
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
   • XML notation
   • Java's object creation syntax
– Two decomposition techniques
   • Typed XPath
   • Typed, named methods/fields
– XPath expressions as first-class values
XJ: offered solution
XJ is an extension of Java.
We will give an overview of the constructs offered by XJ.
Available at: http://www.research.ibm.com/xj
XJ Type System
Integration with Schema
• The rationale:
1. An OO program is a collection of class definitions
2. A Schema file is a collection of type definitions
• => let's integrate these definitions
• Any Schema type is also an XJ type
– The XJ compiler generates a "logical class" for each such type
– Schema file == package name
– Using a schema == import schema_file_name;
import technioncatalog.*;

public class Demo2 {
    public static void main(String[] args) throws Throwable {
        String x = "Algorithms 1";
        int y = 234247;
        catalog cat = buildCatalog(new catalog.course(
            <course>
                <points>3</points>
                <number>{y}</number>
                <name>{x}</name>
                <teacher>Shlomo Moran</teacher>
            </course>));
    }

    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }
}
XML literal in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
An ill-typed program
...
// Wrong <course> element: the sub-elements required by the schema are missing
course c = new course(<course> <teacher>Shlomo Moran</teacher></course>);
buildCatalog(c);

XMLObject x = new course.teacher(<teacher>Shlomo Moran</teacher>);
// An XMLObject cannot be passed as a course element
buildCatalog(x);
...
private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
}
Embedding XPath Queries in XJ
• Syntax: XmlExpr [| XPathQuery |]
• Requires a context provider:
– An XML element over which the XPath query is invoked
– (see the cat variable in the sample)

course doSomething(catalog cat, int courseNum) {
    return cat [| /course[./number = $courseNum] |];
}
XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
– Sequence<T>: if the compiler determines that all result elements are of type T
– Sequence<XMLObject>: otherwise
• Automatic conversion from a singleton sequence
• Static check of XPath queries
– If the result is always empty => compile-time error
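For contrast, here is a rough plain-Java counterpart of `doSomething`, using the standard `javax.xml.xpath` API. It is only a sketch: the class name `XPathVariableDemo`, the helper `parse` and the inlined document are illustrative. The `$courseNum` variable is supplied through an `XPathVariableResolver`, the result is untyped, and a query that can never match (such as a misspelled element name) is not rejected at compile time. It just yields an empty result when the program runs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathVariableDemo {
    static final String SHORT_XML =
        "<catalog>"
        + "<course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
        + "<course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
        + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates an arbitrary query string, binding $courseNum to the given number.
    static String evaluate(Document doc, String query, int courseNum) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setXPathVariableResolver(name ->
                "courseNum".equals(name.getLocalPart()) ? Integer.valueOf(courseNum) : null);
        // The query is an unchecked string; the result is the string value of the first match.
        return xp.evaluate(query, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SHORT_XML);
        System.out.println(evaluate(doc, "/catalog/course[number = $courseNum]/name", 234141));
        // Misspelled element name: still compiles and runs, and silently matches nothing.
        System.out.println(evaluate(doc, "/catalog/cuorse[number = $courseNum]/name", 234141).isEmpty());
    }
}
```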
XJ Updates (Introduction)
• XJ provides three kinds of updates:
1) Simple assignment
2) Bulk assignment
3) Structural updates
• XJ updates are designed to be consistent with Java's reference semantics.
XJ Updates (syntax and semantics)
Simple Assignment
The XPath expression returns a reference to the existing element to be updated.
public static void changePoint(catalog.course c, int p) {
    c [| /points |] = p;
}
Bulk Assignment
The XPath expression denotes a sequence; bulk assignment performs the assignment on every element of that sequence. Here, the credit points of each course are doubled.
public static void doublePoints(catalog cat) {
    cat [| //points |] *:= 2;
}
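To make the bulk assignment concrete, the following sketch shows roughly what doubling every `<points>` value takes with the standard DOM and XPath APIs. It is not how XJ implements the operation, and the names `BulkUpdateDemo`, `doublePoints` and `parse` are illustrative. The nodes returned by the query are references into the document, so updating them changes the document in place, which matches the reference semantics mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class BulkUpdateDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Doubles the numeric content of every <points> element in the document.
    static void doublePoints(Document doc) throws Exception {
        NodeList pts = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//points", doc, XPathConstants.NODESET);
        for (int i = 0; i < pts.getLength(); i++) {
            Node p = pts.item(i);
            int value = Integer.parseInt(p.getTextContent().trim());
            // The node is a live reference: this writes straight into the document.
            p.setTextContent(Integer.toString(value * 2));
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<catalog><course><points>3</points></course>"
                + "<course><points>4</points></course></catalog>");
        doublePoints(doc);
        NodeList pts = doc.getElementsByTagName("points");
        System.out.println(pts.item(0).getTextContent() + " " + pts.item(1).getTextContent());
    }
}
```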
XJ Updates (syntax and semantics)
Structural updates
public static void addCourse(catalog cat) {
    course c = new course(
        <course>
            <points>4</points>
            <number>234111</number>
            <name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher>
        </course>);
    cat.insertAsLast(c);
}
Class XMLObject also defines methods such as:
• insertAfter()
• insertBefore()
• insertAsFirst()
• insertAsLast()
• detach()
XJ Updates Problems: Cycles
Updates may cause cycles, e.g. an element that ends up with more than one parent.
This raises a run-time exception.
The check ensures that a root is never inserted into one of its descendants.
Why are cycles bad?
Can you think of a solution?
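The standard Java DOM faces the same problem and also resolves it at run time. The sketch below (the class name `CycleDemo` is illustrative, and this is the DOM API rather than XJ) tries to insert a root element under its own descendant. The DOM specification requires the implementation to reject this with a `DOMException` whose code is `HIERARCHY_REQUEST_ERR`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CycleDemo {
    // Returns the DOMException code raised by the illegal insert, or -1 if none was raised.
    static short insertRootIntoDescendant() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element catalog = doc.createElement("catalog");
        Element course = doc.createElement("course");
        doc.appendChild(catalog);
        catalog.appendChild(course);
        try {
            // <catalog> is an ancestor of <course>; inserting it below <course> would form a cycle.
            course.appendChild(catalog);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(insertRootIntoDescendant() == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```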
XJ Updates Problems : Type Consistency
• Definitions
1. An XML update operation, u, is a mapping over XML values: u: T1 -> T2
2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
• Unfortunately, this cannot be promised
• The solution: an additional run-time check
Can you think of an example?
XJ Updates Problems: Covariant subtyping (the problem)
• Covariance: change of type in signature is in the same direction as that of the inheritance
class X { }
class A { public void m(X x) { } }
class X1 extends X { }
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());
A1.m() is "spoiled": it requires only X1 objects
• Java favors type-safety: a method with covariant arguments is considered to be an overload rather than an override
– The same approach is taken by C++ and C#
• But covariance is allowed for arrays
– Array assignments may fail at run-time
Which method should be invoked: A.m() or A1.m()?
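Both points can be checked directly in Java. In the sketch below (the wrapper class `CovarianceDemo` and the `String` return values are added only so that the outcome is observable), `A1.m(X1)` overloads rather than overrides `A.m(X)`, so a call through a reference of static type `A` always reaches `A.m()`. The array part shows the one place where Java accepts covariance and pays for it with a run-time check.

```java
class CovarianceDemo {
    static class X { }
    static class X1 extends X { }

    static class A {
        String m(X x) { return "A.m"; }
    }

    static class A1 extends A {
        // Different parameter type: this is an overload of m, not an override.
        String m(X1 x) { return "A1.m"; }
    }

    // The static type of 'a' is A, so only A.m(X) is considered, whatever the argument.
    static String callThroughBaseReference() {
        A a = new A1();
        return a.m(new X());
    }

    // Arrays are covariant: String[] is accepted as Object[], and the store is checked at run time.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughBaseReference());
        System.out.println(arrayStoreFails());
    }
}
```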
XJ Updates Problems: Covariant subtyping (example)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
– It requires unique children: <points>, <name>, etc.
• But it also has an unspoiled super-class: XMLObject
– All updates to XMLObject are legal at compile-time
• The following code compiles successfully:
public static void trick(course c) {
    XMLObject x = c;
    points p = new points(<points>4</points>);
    x.insertAsLast(p);   // Run-time error is here!
}
Shaping the future (revisited)
• Language constructs seen so far
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
   • XML notation
   • Java's object creation syntax
– Two decomposition techniques
   • Typed XPath
   • Typed, named methods/fields
– XPath expressions as first-class values
XPath expressions as first-class values
• What is a first-class value?
– A value that can be used "naturally" in the program
   • Passed as an argument
   • Stored in a variable/field
   • Returned from a method
   • Created
• In XJ, XPath expressions do not meet these conditions
– The main obstacle: the XPath part of the expression cannot be separated from its context provider
XPath expressions as first-class values
• Operators on XPath values
– Composition
– Conjunction
– Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
– Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject
– When two XPath values are composed, the result type is deduced from the types of the operands
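The "function T -> R" view can be imitated in plain Java with `java.util.function`. This is only an analogy under stated assumptions, not an XJ feature. The `Course` class and the helpers `step` and `filter` are hypothetical, ordinary Java objects stand in for XMLObject subclasses, conjunction is modelled with `Predicate.and` (disjunction would use `Predicate.or`), and composition is `Function.andThen`. The Java compiler deduces the type of the composed value from its operands, which is the bookkeeping the slide asks of the XJ compiler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class TypedPathDemo {
    // Hypothetical stand-in for the schema-derived <course> type.
    static final class Course {
        final String name;
        final String teacher;
        final int points;
        Course(String name, String teacher, int points) {
            this.name = name;
            this.teacher = teacher;
            this.points = points;
        }
    }

    // A navigation step: maps every element of the input sequence to a sub-value.
    static <T, R> Function<List<T>, List<R>> step(Function<T, R> f) {
        return in -> in.stream().map(f).collect(Collectors.toList());
    }

    // A filter step: keeps the elements satisfying the predicate.
    static <T> Function<List<T>, List<T>> filter(Predicate<T> p) {
        return in -> in.stream().filter(p).collect(Collectors.toList());
    }

    static List<String> namesOfThreePointCoursesBy(List<Course> catalog, String teacher) {
        Predicate<Course> threePoints = c -> c.points == 3;
        Predicate<Course> taughtBy = c -> c.teacher.equals(teacher);
        // Conjunction of two predicates, then composition of two typed steps.
        Function<List<Course>, List<Course>> selected = filter(threePoints.and(taughtBy));
        Function<List<Course>, List<String>> names = step(c -> c.name);
        Function<List<Course>, List<String>> path = selected.andThen(names);
        return path.apply(catalog);
    }

    public static void main(String[] args) {
        List<Course> catalog = Arrays.asList(
            new Course("Programming Languages", "Ron Pinter", 3),
            new Course("Combinatorics for CS", "Ran El-Yaniv", 3));
        System.out.println(namesOfThreePointCoursesBy(catalog, "Ron Pinter"));
    }
}
```

Unlike an XJ XPath expression, `path` here is an ordinary value: it can be stored, passed around and applied to any catalog later.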
Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed by fields/methods
– Can we access XML sub-elements this way?
– (The following code IS NOT a legal XJ program)

import technioncatalog.*;

void printTeachers(catalog cat) {
    for (int i = 0; i < cat.courses.length; ++i) {
        catalog.course c = cat.courses[i];
        System.out.println(c.teacher);
    }
}
Typed, named methods/fields
• Some of the difficulties:
– Sub-elements are not always named
– Schema supports optional types: <xsd:choice>
   • How can Java express an "optional" field?
• Observation: Java's typing mechanisms cannot capture the wealth of Schema/DTD types
– Missing features: virtual fields, inheritance without polymorphism
– Other features can be found in functional languages
   • E.g.: variant types, immutability, structural conformance
   • But their popularity lags behind
Conclusion
• XJ is a Java extension that has built-in support for XML
– Type safety: many things are checked at compile time
– Ease of use
• OO languages are not powerful enough (in terms of typing)
– Some type information is lost in the transition Schema -> Java