No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java

Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar

Itay Maman, 236826 Seminar lecture, 15 June 2005


The basic premise

• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
  – Do modern OO languages sufficiently support XML?


Introduction: Schema file (file: technioncatalog.xsd)

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="catalog">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="course" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="points" type="xs:int"/>
                  <xs:element name="number" type="xs:int"/>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element name="teacher" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>


Introduction: XML document (file: short.xml)

    <?xml version="1.0" encoding="UTF-8"?>
    <catalog>
      <course>
        <points>3</points>
        <number>234319</number>
        <name>Programming Languages</name>
        <teacher>Ron Pinter</teacher>
      </course>
      <course>
        <points>3</points>
        <number>234141</number>
        <name>Combinatorics for CS</name>
        <teacher>Ran El-Yaniv</teacher>
      </course>
    </catalog>

Desired output:

    “Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”


Introduction: The XJ program

    import java.io.*;
    import technioncatalog.*;

    public class Demo1 {
        public static void main(String[] args) throws Throwable {
            // Load the document as a typed catalog object
            catalog cat = new catalog(new File("short.xml"));
            // Typed XPath: select the second <course>
            catalog.course c = cat [| /course[2] |];
            printCourse(c);
        }

        private static void printCourse(catalog.course c) {
            String name = c [| /name |];
            String teacher = c [| /teacher |];
            int points = c [| /points |];
            int id = c [| /number |];
            System.out.println(name + " (" + id + ") by " + teacher + ", "
                + points + " credit points");
        }
    }


Traditional XML processing (DOM, XPath APIs)

    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

    public static void main(String[] args) throws Throwable {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new java.io.File("short.xml"));
        XPath xp = XPathFactory.newInstance().newXPath();
        // NodeList is the portable DOM type; casting to an implementation
        // class such as DTMNodeList ties the code to one XPath engine
        NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
        printCourse(nodes.item(1));
    }

• The XPath is a plain string. It may be:
  – Syntactically incorrect
  – Incompatible with the document
• The types of the XML objects (Node, Document) do not reflect the schema


Traditional XML processing (DOM APIs)

    private static void printCourse(Node n) {
        NodeList nodes = n.getChildNodes();
        // Odd indices only: the even-indexed children are whitespace text nodes
        System.out.println(nodes.item(5).getTextContent()
            + " (" + nodes.item(3).getTextContent()
            + ") by " + nodes.item(7).getTextContent()
            + ", " + nodes.item(1).getTextContent() + " credit points");
    }

• Assumption: Four child nodes must exist
• Assumption: 3rd child is the course number
• Assumption: 2nd child has no child elements
• What about reading the numeric value of an element?

• These assumptions will not hold if the schema is changed
  – => run-time errors
  – problems remain, even if we identify nodes by name
• Possible Schema changes:
  – Allowing a new optional <students> sub-element
  – Changing the order of the sub-elements
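The point that problems remain even when nodes are identified by name can be illustrated in plain Java. The following is a minimal sketch using only the standard JAXP APIs; the class name `NamedAccessDemo` and the inline copy of one course from short.xml are assumptions made for the example. Lookup by name survives reordering of the sub-elements, but the element names are still unchecked strings and the numeric values still have to be parsed by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamedAccessDemo {
    // Inline copy of one course from short.xml, so the sketch is self-contained
    static final String XML =
        "<catalog><course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher>"
        + "</course></catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String describe(Node course) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        // Names are plain strings: a typo such as "techer" compiles fine
        // and silently yields an empty string at run time
        String name = xp.evaluate("name", course);
        String teacher = xp.evaluate("teacher", course);
        // Numeric values arrive as text; the conversion is manual and may
        // throw NumberFormatException if the element is missing or malformed
        int number = Integer.parseInt(xp.evaluate("number", course).trim());
        int points = Integer.parseInt(xp.evaluate("points", course).trim());
        return name + " (" + number + ") by " + teacher + ", " + points + " credit points";
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xp = XPathFactory.newInstance().newXPath();
        Node course = (Node) xp.evaluate("/catalog/course[1]", doc, XPathConstants.NODE);
        System.out.println(describe(course));
    }
}
```

Compared with the index-based version, this no longer depends on whitespace nodes or element order, yet nothing ties the strings "name" or "points" to the schema.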


No easy solution

• Similar problems occur when:
  1. XML elements are created by the program
  2. Other libraries are used for reading/writing XML documents
     – Such as: Xalan, SAX
  3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language


Shaping the future

• What XML-related facilities do we want?
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values


Has the future arrived yet?

• Significant effort has gone into integrating XML into modern programming languages
  – XJ
  – Scala
  – Cω
  – XTatic
  – …
• We will survey the constructs offered by XJ
  – A superset of Java
  – Available at: http://www.research.ibm.com/xj


XJ’s Type system


• Hierarchy of classes
  – A common root class: XMLObject
  – Automatic import: package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
  – XMLCursor<T> is a Sequence<T> iterator


Integration with Schema

• The rationale:
  1. An OO program is a collection of class definitions
  2. A Schema file is a collection of type definitions
• => let’s integrate these definitions
• Any Schema type is also an XJ type
  – The XJ compiler generates a “logical class” for each such type
  – Schema file == package name
  – Using a schema == import schema_file_name;


XML literal in XJ code

    import technioncatalog.*;

    public class Demo2 {
        public static void main(String[] args) throws Throwable {
            String x = "Algorithms 1";
            int y = 234247;
            catalog cat = buildCatalog(new catalog.course(
                <course><points>3</points>
                    <number>{y}</number><name>{x}</name>
                    <teacher>Shlomo Moran</teacher></course>));
        }

        private static catalog buildCatalog(catalog.course c) {
            return new catalog(<catalog>{c}</catalog>);
        }
    }

• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ
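For contrast, plain Java can only check such content against the schema at run time. The sketch below uses the standard javax.xml.validation API; the class name `RuntimeValidationDemo` and the inline copy of technioncatalog.xsd are assumptions made so the example is self-contained. It is not how XJ is implemented, only an illustration of where the check moves when the compiler knows nothing about the schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RuntimeValidationDemo {
    // Inline copy of technioncatalog.xsd (single quotes avoid Java escaping)
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='points' type='xs:int'/>"
        + "<xs:element name='number' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='teacher' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static boolean isValid(String xml) throws IOException, SAXException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Only discovered when this line actually runs
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<catalog><course><points>3</points><number>234247</number>"
            + "<name>Algorithms 1</name><teacher>Shlomo Moran</teacher></course></catalog>";
        String bad = "<catalog><course><teacher>Shlomo Moran</teacher></course></catalog>";
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```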


An ill-typed program

    ...
    course c = new course(<course>
        <teacher>Shlomo Moran</teacher></course>);  // wrong <course> element
    buildCatalog(c);

    XMLObject x = new course.teacher(
        <teacher>Shlomo Moran</teacher>);
    buildCatalog(x);  // an XMLObject cannot be passed as a course element
    ...

    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }


Embedding XPath Queries in XJ

• Syntax: XmlValue [| XPathQuery |]
• Requires: a context-provider:
  – An XML element over which the XPath query is invoked
  – (see the cat variable in the sample)
• Escaping: use a ‘$’ prefix

    course doSomething(catalog cat, int courseNum) {
        return cat [| /course[./number = $courseNum] |];
    }
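The standard Java XPath API has a rough counterpart to the ‘$’ escape: an XPathVariableResolver that supplies values for variables in the query string. The sketch below is an illustration only; the class name `XPathVariableDemo`, the helper `teacherOf` and the inline document are assumptions. Unlike in XJ, the binding between `$courseNum` and the Java variable is made by name at run time, so a misspelled variable name is not caught by the compiler.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathVariableDemo {
    static final String XML =
        "<catalog>"
        + "<course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
        + "<course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
        + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String teacherOf(Document doc, int courseNum) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        // Map the XPath variable $courseNum to the Java parameter of the same name
        xp.setXPathVariableResolver(name ->
            "courseNum".equals(name.getLocalPart()) ? Double.valueOf(courseNum) : null);
        // Returns the empty string when no course matches
        return xp.evaluate("/catalog/course[number = $courseNum]/teacher", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(teacherOf(parse(XML), 234141));
    }
}
```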


XPath Semantics

• Problem: resulting type is sometimes not so clear
• Two options
  – Sequence<T>
    • If the compiler determines that all result elements are of type T
  – Sequence<XMLObject>
    • (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
  – If result is always empty => compile-time error
  – (The compiler cannot catch all cases)


Implicit coercions

• An atomic XML value can be seamlessly converted into a corresponding Java value
  – xsd:double => double
  – xsd:boolean => boolean
  – xsd:string => java.lang.String
  – …
• This reduces the verbosity of XML-related code:

    import technioncatalog.*;
    import technioncatalog.catalog.*;

    public static String getTeacher(course c) {
        return c [| /teacher |];
    }

• Chain of conversions: Sequence<teacher> ► teacher ► String
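In the standard Java XPath API the closest analogue is choosing a result type through XPathConstants and casting the returned Object. The sketch below assumes a hypothetical class `CoercionDemo` and an inline document; it shows that the conversion exists but is requested explicitly by the caller and is not checked against the schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CoercionDemo {
    static final String XML =
        "<catalog><course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher>"
        + "</course></catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static double points(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        // The caller asks for a number; the cast is unchecked by the compiler
        return (Double) xp.evaluate("/catalog/course[1]/points", doc, XPathConstants.NUMBER);
    }

    static String teacher(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return (String) xp.evaluate("/catalog/course[1]/teacher", doc, XPathConstants.STRING);
    }

    static boolean hasTeacher(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return (Boolean) xp.evaluate("boolean(/catalog/course[1]/teacher)", doc,
                XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(teacher(doc) + ", " + points(doc) + ", " + hasTeacher(doc));
    }
}
```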


Updates: Assignment to Query Result

• An XPath expression returns a reference to an existing element
  – (No copying is involved)
  – Consistent with Java’s semantics for objects
• Thus, it can be assigned to
  – An XPath expression is a legal lvalue

    public static void changePoint(catalog.course c, int p) {
        c [| /points |] = p;
    }

• Bulk assignment
  – Occurs when the XPath expression denotes a sequence
  – Bulk assignment operator := allows multiple assignments
  – Double the credit points of each course:

    cat [| //points |] *:= 2;
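For comparison, the same bulk update written against the standard DOM and XPath APIs needs an explicit loop and manual text-to-number conversion. This is a sketch under assumptions: the class name `BulkUpdateDemo`, the method `doublePoints` and the inline document are illustrative, not part of XJ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BulkUpdateDemo {
    static final String XML =
        "<catalog>"
        + "<course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
        + "<course><points>4</points><number>234111</number>"
        + "<name>Introduction to CS</name><teacher>Roy Friedman</teacher></course>"
        + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Rough DOM equivalent of: cat [| //points |] *:= 2;
    static void doublePoints(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate("//points", doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            // The nodes are references into the document, so this updates it in place
            int value = Integer.parseInt(n.getTextContent().trim());
            n.setTextContent(Integer.toString(value * 2));
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        doublePoints(doc);
        System.out.println(doc.getElementsByTagName("points").item(0).getTextContent());
    }
}
```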


Tree structure update

• Class XMLObject also defines methods, such as:
  – insertAfter()
  – insertBefore()
  – insertAsFirst()
  – detach()

    public static void addCourse(catalog cat) {
        course c = new course(<course><points>4</points>
            <number>234111</number><name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher></course>);
        cat.insertAsLast(c);
    }

• Which object is being modified?


Problems: Type Consistency

• Definitions
  1. An XML update operation, u, is a mapping over XML values
     • u: T1 -> T2
  2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
• Unfortunately, this cannot be promised
• The solution: Additional run-time check

• Can you think of an example?
• Why do we want the two types to be equal?


Problems: Covariant subtyping (1/2)

• Covariance: change of type in signature is in the same direction as that of the inheritance

    class X { }
    class A { public void m(X x) { } }

    class X1 extends X { }
    class A1 extends A { public void m(X1 x) { } }
    ...
    A a = new A1();
    a.m(new X());

• A1.m() is “spoiled”: it requires only X1 objects
• Which method should be invoked: A.m() or A1.m()?
• Java favors type-safety: A method with covariant arguments is considered to be an overloading rather than overriding
  – Same approach is taken by C++, C#
• But, covariance is allowed for arrays
  – Array assignments may fail at run-time
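Both points can be observed directly in standard Java. In the sketch below the classes from the slide are nested inside a hypothetical `CovarianceDemo` class, and `m` returns a String (instead of void) purely so the chosen method is visible; these are assumptions made for the illustration.

```java
class CovarianceDemo {
    static class X { }
    static class X1 extends X { }

    static class A {
        public String m(X x) { return "A.m(X)"; }
    }

    static class A1 extends A {
        // Covariant argument: this overloads A.m(X), it does not override it
        public String m(X1 x) { return "A1.m(X1)"; }
    }

    // The static type of 'a' is A, so only A.m(X) is a candidate;
    // since A1 does not override it, A.m(X) runs
    static String callWithX() {
        A a = new A1();
        return a.m(new X());
    }

    // With the static type A1, overload resolution picks the more specific m(X1)
    static String callWithX1OnA1() {
        A1 a1 = new A1();
        return a1.m(new X1());
    }

    // Arrays are covariant: String[] is an Object[], and the store is checked at run time
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithX());
        System.out.println(callWithX1OnA1());
        System.out.println(arrayStoreFails());
    }
}
```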


Problems: Covariant subtyping (2/2)

(Now let us get back to our technioncatalog schema…)

• A <course> value is also spoiled
  – It requires unique children: <points>, <name>, etc.
• But, it also has an unspoiled super-class: XMLObject
  – All updates to XMLObject are legal at compile-time
• The following code compiles successfully:

    public static void trick(course c) {
        XMLObject x = c;
        points p = new points(<points>4</points>);
        x.appendAsLast(p);  // Run-time error is here !!
    }
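The same situation can be reproduced with the standard DOM API, where every element is an untyped Node and plays the role of the unspoiled XMLObject. In this sketch (the class name `UntypedUpdateDemo`, its helpers and the inline schema and document are assumptions) the extra <points> child is appended without complaint, and the inconsistency only surfaces when the document is validated against the schema at run time. Note one difference from XJ: DOM does not check at the point of the update, so the explicit validation step stands in for XJ's run-time check.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class UntypedUpdateDemo {
    // Inline copy of technioncatalog.xsd
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='points' type='xs:int'/>"
        + "<xs:element name='number' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='teacher' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    static Document load() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for schema validation of a DOM tree
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // DOM analogue of trick(): compiles, because a Node accepts any child
    static void trick(Node course) {
        Element extra = course.getOwnerDocument().createElementNS(null, "points");
        extra.setTextContent("4");
        course.appendChild(extra);
    }

    static boolean isValid(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = load();
        System.out.println(isValid(doc));
        trick(doc.getDocumentElement().getFirstChild());
        System.out.println(isValid(doc));
    }
}
```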


Shaping the future (revisited)

• Language constructs seen so far
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values


XPath expressions as first-class values

• What is a first-class value?
  – A value that can be used “naturally” in the program
    • Passed as an argument
    • Stored in a variable/field
    • Returned from a method
    • Created
• In XJ, XPath expressions do not meet these conditions
  – The main obstacle: The XPath part of the expression cannot be separated from its context provider


XPath expressions as first-class values (cont’d)

• Let’s speculate on XPath as an FCV…
• (Following code IS NOT a legal XJ program)

    private static Sequence<teacher> teachers;

    static Sequence<teacher> find(XPath<catalog,teacher> q) {
        catalog c = new catalog(new File("file1.xml"));
        return q.evaluate(c);
    }

    static void main(String[] args) {
        Sequence<teacher> all = find(<catalog>[| //teacher |]);
        Sequence<teacher> few = find(
            <catalog>[| //course[number = 234319]/teacher |] );
    }


XPath expressions as first-class values (cont’d)

• Operators on XPath values
  – Composition
  – Conjunction
  – Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
  – Basically an XPath value is a function T -> R, where both T,R are subclasses of XMLObject
  – When two XPath values are composed, the result type is deduced from the types of the operands
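The idea of an XPath value as a typed function T -> R, with the result type of a composition deduced from its operands, can be sketched in plain Java with generics. Everything here is hypothetical: `Path`, `then`, and the hand-written `Catalog` and `Course` classes are stand-ins invented for the illustration, not XJ constructs, and only composition is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class TypedPathDemo {
    // Hand-written stand-ins for schema-derived classes
    static final class Course {
        final int number;
        final String teacher;
        Course(int number, String teacher) { this.number = number; this.teacher = teacher; }
    }

    static final class Catalog {
        final List<Course> courses;
        Catalog(Course... courses) { this.courses = Arrays.asList(courses); }
    }

    // A path value: a function from a context of type T to a sequence of R
    interface Path<T, R> extends Function<T, List<R>> {
        // Composition: Path<T,R> followed by Path<R,S> yields Path<T,S>;
        // the Java compiler deduces and checks the result type
        default <S> Path<T, S> then(Path<R, S> next) {
            return ctx -> {
                List<S> out = new ArrayList<>();
                for (R r : this.apply(ctx)) {
                    out.addAll(next.apply(r));
                }
                return out;
            };
        }
    }

    static final Path<Catalog, Course> COURSES = cat -> cat.courses;
    static final Path<Course, String> TEACHER = c -> Collections.singletonList(c.teacher);
    static final Path<Catalog, String> ALL_TEACHERS = COURSES.then(TEACHER);

    // A path with a predicate, comparable to //course[number = n]
    static Path<Catalog, Course> courseByNumber(int n) {
        return cat -> {
            List<Course> out = new ArrayList<>();
            for (Course c : cat.courses) {
                if (c.number == n) {
                    out.add(c);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        Catalog cat = new Catalog(new Course(234319, "Ron Pinter"),
                                  new Course(234141, "Ran El-Yaniv"));
        System.out.println(ALL_TEACHERS.apply(cat));
        System.out.println(courseByNumber(234319).then(TEACHER).apply(cat));
    }
}
```

Because paths are ordinary values here, they can be stored in fields, passed as arguments and returned from methods, which are the first-class conditions listed earlier. An ill-typed composition such as TEACHER.then(COURSES) is rejected by the Java compiler.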


Scala: Composition of XML elements

• In Scala, types can be defined in a DTD file
  – A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composition of XML elements:
  – Using XML notation (similar to XJ)
  – Using case-class construction notation:

    import Data._;       // import generated definitions
    import scala.xml._;  // for creating PCDATA nodes

    object Main with Application {
      val x = course(teacher(Text("Ran El-Yaniv")),
                     points(Text("3")),
                     name(Text("Combinatorics for CS")),
                     number(Text("234141")));
      Console.println(x);
    }


Typed, named methods/fields

• Usually, values aggregated by a Java object are accessed by fields/methods
  – Can we access XML sub-elements this way?
  – (Following code IS NOT a legal XJ program)

    import technioncatalog.*;

    void printTeachers(catalog cat) {
        for (int i = 0; i < cat.courses.length; ++i) {
            catalog.course c = cat.courses[i];
            System.out.println(c.teacher);
        }
    }


Typed, named methods/fields (cont’d)

• Some of the difficulties:
  – Sub-elements are not always named
  – Schema supports optional types: <xsd:choice>
    • How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
  – Missing features: virtual fields, inheritance without polymorphism
  – Other features can be found in Functional languages
    • E.g.: Variant types, immutability, structural conformance
    • But, their popularity lags behind
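One partial answer that later versions of Java offer to the “optional field” question is java.util.Optional, added in Java 8, well after this lecture. The sketch below is an assumption-laden illustration: the `Course` class is hand-written rather than schema-derived, and the optional `students` field mirrors the hypothetical optional <students> sub-element mentioned earlier. It covers only presence or absence of a single sub-element, not the wider range of Schema types.

```java
import java.util.Optional;

class OptionalFieldDemo {
    // Hand-written stand-in for a schema-derived class
    static final class Course {
        final String name;
        final String teacher;
        // Models an optional <students> sub-element; empty when absent
        final Optional<Integer> students;

        Course(String name, String teacher, Optional<Integer> students) {
            this.name = name;
            this.teacher = teacher;
            this.students = students;
        }
    }

    static String describe(Course c) {
        // The type forces the caller to handle the "absent" case explicitly
        String suffix = c.students.map(n -> " (" + n + " students)").orElse("");
        return c.name + " by " + c.teacher + suffix;
    }

    public static void main(String[] args) {
        Course withCount = new Course("Combinatorics for CS", "Ran El-Yaniv", Optional.of(120));
        Course without = new Course("Programming Languages", "Ron Pinter", Optional.empty());
        System.out.println(describe(withCount));
        System.out.println(describe(without));
    }
}
```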


Summary

• XJ is a Java extension that has built-in support for XML
  – Type safety: Many things are checked at compile time
  – Ease of use
• OO languages are not powerful enough (in terms of typing)
  – Some type information is lost in the transition Schema -> Java


-The End-