
1

XDuce

XDuce: A Statically Typed XML Processing Language (Hosoya and Pierce)

Presented by: Guy Korland

Based on a presentation by: Tabuchi Naoshi (tabee@yl.is.s.u-tokyo.ac.jp)

2

Presentation Outline

Introduction (pronounced “transduce”).
Programming in XDuce:
  Values
  Regular Expression Types
  Subtyping
Pattern matching.
Conclusions.

3

XDuce: What For?

A functional language for XML processing.

Based on:
  Regular Expression Types
  Pattern Matching

Statically typed, i.e. outputs are statically checked for DTD conformance etc.

4

Advantages (vs. “untyped”)

“Untyped” XML processing: programs using DOM etc.
  Little connection between the program and the XML schema.
  Validity can be checked only at run-time, if at all.

5

Advantages (vs. “embedding”)

“Embedding”: mapping the XML schema into the language’s type system.

e.g.
  <!ELEMENT person (name, mail*, tel?)>            (DTD)
  type person = name * mail list * tel option      (ML)

6

Advantages (vs. “embedding”)

Embedding does not suit intuition in some cases.

e.g. intuitively
  (name, mail*, tel?) <: (name, mail*, tel*)
but not
  name * mail list * tel option <: name * mail list * tel list      (ML)

7

Values

Values are XML documents (input, output, intermediate).

Two syntaxes:
  XDuce’s native syntax.
  A document in standard XML syntax.

8

Values(cont.)

Standard XML syntax:

<!-- mybook.xml -->
<addrbook>
  <person>
    <name> Haruo Hosoya </name>
    <email> hahosoya@kyoto-u </email>
    <email> hahosoya@upenn </email>
  </person>
  <person>
    <name> Benjamin Pierce </name>
    <email> bcpierce@upenn </email>
    <tel> 123-456-789 </tel>
  </person>
</addrbook>

let val doc = load_xml("mybook.xml")

9

Values(cont.)

XDuce’s native syntax:

let val mybook = addrbook[
  person[name["Haruo Hosoya"],
         email["hahosoya@kyoto-u"],
         email["hahosoya@upenn"]],
  person[name["Benjamin Pierce"],
         email["bcpierce@upenn"],
         tel["123-456-789"]]]

Constructor: label[…], where … is a sequence of other values.

Strings are enclosed in double quotes, unlike in XML.
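The correspondence between the two syntaxes can be sketched in Python (not XDuce). The helper name `to_xduce` is hypothetical, and the sketch ignores attributes and text that follows a child element; it only illustrates how nested elements map to label[…] and text to quoted strings.

```python
import xml.etree.ElementTree as ET

def to_xduce(elem):
    """Render an element in XDuce-style label[...] notation (hypothetical helper)."""
    parts = []
    text = (elem.text or "").strip()
    if text:
        # Text content becomes a double-quoted string.
        parts.append('"' + text + '"')
    for child in elem:
        # Child elements become nested label[...] values.
        parts.append(to_xduce(child))
    return elem.tag + "[" + ",".join(parts) + "]"

xml_text = """
<addrbook>
  <person>
    <name> Haruo Hosoya </name>
    <email> hahosoya@kyoto-u </email>
  </person>
</addrbook>
"""
print(to_xduce(ET.fromstring(xml_text)))
# addrbook[person[name["Haruo Hosoya"],email["hahosoya@kyoto-u"]]]
```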

10

Regular Expression Types

Types are defined in regular expression form with labels.
  Concatenation, alternation (union), and repetition as basic constructors.
  Labels correspond to elements of XML (person, name, mail, etc.).

11

Regular Expression Types (cont.)

Example:
  type Addrbook = addrbook[(Name, Addr, Tel?)*]
  type Name = name[String]
  type Addr = addr[String]
  type Tel = tel[String]

Corresponding DTD:
  <!ELEMENT addrbook (name, addr, tel?)*>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>

Note: Name, Addr and Tel inside Addrbook are types, not labels.

12

Regular Expression Types (cont.)

Syntax of Types:
  T ::= ()        (* empty sequence type *)
      | X
      | L[T]
      | T,T       (* concat. *)
      | T|T       (* alter. *)
      | T*        (* rep. *)

where
  X : Type Variable (String, Int…)
  L : Label

13

Regular Expression Types (cont.)

Syntactic sugar:
  T+ ≡ T,T*
  T? ≡ T|()

Types can be (mutually) recursive:

  type Folder = Entry*
  type Entry = name[String], file[String]
             | name[String], folder[Folder]

14

Regular Expression Types (cont.)

Syntax of Labels:

  L ::= l         (* specific label *)
      | ~         (* wildcard label *)
      | L|L       (* union *)
      | L\L       (* difference *)

15

Regular Expression Types (cont.)

The label class ~ represents the set of all labels.

We can define a type Any:
  type Any = (~[Any] | Int | Float | String)*

Label union (HTML headings):
  type Heading = (h1|h2|h3|h4|h5|h6)[Inline]

16

Subtyping

The meaning of subtyping is as usual: all values t of T are also values of T’.

  T <: T’ ⇔ ∀t. t ∈ T ⇒ t ∈ T’

Examples:
  Name,Addr <: Name,Addr,Tel?
  Name,Addr,Tel <: Name,Addr,Tel?
  addrbook[Name,Addr,Name,Addr,Tel] <: addrbook[(Name,Addr,Tel?)*]
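The set-inclusion reading of subtyping can be illustrated with a bounded brute-force check in Python. This is only a sketch under strong assumptions: types are flat (no nesting), each element is encoded as one character (N = Name, A = Addr, T = Tel), and sequences are enumerated only up to `max_len`, so a result of None means "no counterexample found", not a proof. It is not the algorithm XDuce uses; the function name `counterexample` is hypothetical.

```python
import re
from itertools import product

def counterexample(sub, sup, alphabet="NAT", max_len=6):
    """Return a sequence that belongs to `sub` but not to `sup`, or None."""
    sub_re, sup_re = re.compile(sub), re.compile(sup)
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            s = "".join(seq)
            # A value of the subtype that is not a value of the supertype
            # refutes sub <: sup.
            if sub_re.fullmatch(s) and not sup_re.fullmatch(s):
                return s
    return None

print(counterexample("NA", "NAT?"))              # None: Name,Addr <: Name,Addr,Tel?
print(counterexample("NANAT", "(NAT?)*"))        # None
print(counterexample("(NA)*(NT)*", "(NA|NT)*"))  # None
print(counterexample("NAT?", "NA"))              # NAT: the converse fails
```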

17

Subtyping - Union Types

Union (or alternation) type constructor |.

Examples:
  Name <: Name | Tel
  Tel <: Name | Tel

Forgetting ordering:
  (Name,Addr)*,(Name,Tel)* <: ((Name,Addr)|(Name,Tel))*

Distributivity:
  (Name,Tel)|(Name,Addr) <: Name,(Addr|Tel)

18

Subtyping - Subtagging

Allows subtyping between types with different labels (beyond the expressive power of DTD).

e.g. (HTML)
  subtag i <: fontstyle
  subtag b <: fontstyle

so that
  i[T] <: fontstyle[T]
  b[T] <: fontstyle[T]

19

Complexity of Subtyping

In general, the subtype relation (T <: T’) is equivalent to inclusion between context-free grammars (CFGs): undecidable!

Need some restrictions on syntax.

(next slide…)

20

Well-formedness of Types

Syntactic restriction on types to ensure “regularity”.

Recursive use of types can only occur at the tail position of type definition, or inside labels.

21

Well-formed Types: Examples

  type X = Int, Y
  type Y = String, X | ()

and

  type Z = String, lab[Z], String | ()

are well-formed, but

  type U = Int, U, String | ()

is not.
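One possible reading of the well-formedness rule, sketched in Python: collect the type names used outside any label, note whether each use is in tail position, and reject a definition when a non-tail use can lead back to the type being defined. The tuple encoding ("ref", "seq", "alt", "rep", "label", "base", "empty") and the function names are assumptions made for this sketch, not XDuce's actual checker.

```python
def top_level_refs(t, tail=True):
    """Yield (name, is_tail) for each type name used outside any label."""
    kind = t[0]
    if kind == "ref":
        yield (t[1], tail)
    elif kind == "seq":
        # Only the last component of a concatenation can be in tail position.
        yield from top_level_refs(t[1], False)
        yield from top_level_refs(t[2], tail)
    elif kind == "alt":
        yield from top_level_refs(t[1], tail)
        yield from top_level_refs(t[2], tail)
    elif kind == "rep":
        # More elements may follow each iteration, so the body is never tail.
        yield from top_level_refs(t[1], False)
    # "label", "base" and "empty" contribute no top-level references.

def well_formed(defs):
    edges = {name: list(top_level_refs(body)) for name, body in defs.items()}

    def reaches(src, dst, seen):
        if src == dst:
            return True
        if src in seen:
            return False
        seen.add(src)
        return any(reaches(n, dst, seen) for n, _ in edges.get(src, []))

    # Ill-formed when a non-tail reference takes part in a recursion cycle.
    return not any(
        (not is_tail) and reaches(target, name, set())
        for name, refs in edges.items()
        for target, is_tail in refs
    )

INT, STRING, EMPTY = ("base", "Int"), ("base", "String"), ("empty",)
xy = {"X": ("seq", INT, ("ref", "Y")),
      "Y": ("alt", ("seq", STRING, ("ref", "X")), EMPTY)}
z = {"Z": ("alt", ("seq", STRING, ("seq", ("label", "lab", ("ref", "Z")), STRING)), EMPTY)}
u = {"U": ("alt", ("seq", INT, ("seq", ("ref", "U"), STRING)), EMPTY)}
print(well_formed(xy), well_formed(z), well_formed(u))  # True True False
```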

22

Complexity of Subtyping, again

With well-formedness, checking the subtype relation is:
  Still EXPTIME-complete, equivalent to inclusion of tree automata [CDG+],
  but acceptable in practical cases.

23

Pattern matching

ML-like pattern matching:

“pattern -> expression”

Example:

val url = match v with
    www[val s as String]   -> "http://" ^ s
  | email[val s as String] -> "mailto:" ^ s
  | ftp[val s as String]   -> "ftp://" ^ s

24

Pattern matching (cont.)

Pattern match can also involve regular expression types.

e.g.

match p with
  | person[name[String], (val ms as Mail*), (val t as Tel?)] -> …

25

Pattern matching (cont.)

Functions – reusable pattern matching.

Example:

fun make_url(val v as (www[String] | email[String] | ftp[String])) : String =
  match v with
      www[val s as String]   -> "http://" ^ s
    | email[val s as String] -> "mailto:" ^ s
    | ftp[val s as String]   -> "ftp://" ^ s

26

Policies of Pattern Matching

Pattern matching has two basic policies:
  First-match (as in ML): only the first pattern matched is taken.
  Longest-match (as usual in regexp. matching on strings): matching is done as much as possible.

27

First-match: Example

(* p = person[name, mail, tel] *)
match p with
  | person[Name, (val ms as Mail*), Tel]  -> (* invoked *)
  | person[Name, (val ms as Mail*), Tel?] -> (* not invoked *)

28

Longest-match: Example

(* p = person[name, mail, mail, tel] *)

match p with
  | … (val m1 as Mail*), (val m2 as Mail*), … ->
      (* m1 = mail, mail
         m2 = () *)
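Both policies have a familiar analogue in string regular expressions, sketched here with Python's `re` module. The encoding is an assumption of the sketch: one character per element (n = name, m = mail, t = tel), with the contents of person written as a string. Python's greedy quantifiers happen to give the longest-match result for this pattern; the clause list and `first_match` are hypothetical names.

```python
import re

p = "nmmt"  # stands for person[name, mail, mail, tel]

# Longest match: the first mail* group takes as many elements as it can,
# leaving the second group empty.
longest = re.fullmatch(r"n(?P<m1>m*)(?P<m2>m*)t", p)
print(repr(longest.group("m1")), repr(longest.group("m2")))  # 'mm' ''

# First match: clauses are tried in order and the first success wins.
clauses = [("with tel", r"n(m*)t"), ("tel optional", r"n(m*)t?")]

def first_match(value):
    for name, pattern in clauses:
        if re.fullmatch(pattern, value):
            return name
    return None

print(first_match("nmt"))  # with tel
print(first_match("nm"))   # tel optional
```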

29

Exhaustiveness and Redundancy

Pattern matches are checked for exhaustiveness and redundancy.
  Exhaustiveness: no “omission” of values.
  Redundancy: never-matched patterns.

30

Exhaustiveness

A pattern match P1 -> e1 | … | Pn -> en is exhaustive (wrt. input type T)
  ⇔ all values t ∈ T are matched by some Pi
  or, equivalently, T <: P1 | … | Pn

31

Exhaustiveness: Example (1/2)

(* type Person = person[Name, Mail*, Tel?] *)

match p with
  | person[Name, Mail*, Tel] -> ...
  | person[Name, Mail*]      -> ...

is an exhaustive pattern match (wrt. Person).

32

Exhaustiveness: Example (2/2)

(* type Person = person[Name, Mail*, Tel?] *)

match p with
  | person[Name, Mail*, Tel] -> ...
  | person[Name, Mail+]      -> ...

is NOT exhaustive (wrt. Person): person[name[...]] does not match.

33

Redundancy

A pattern Pi is redundant in

  P1 -> e1 | … | Pn -> en

(wrt. input type T)
  ⇔ all values matched by Pi are also matched by P1 | ... | Pi-1

34

Redundancy: Example

(* type Person = person[Name, Mail*, Tel?] *)

match p with
  | person[Name, Mail*, Tel?] -> ...
  | person[Name, Mail*]       -> ...

The second pattern is redundant: anything that matches the second pattern also matches the first one.
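The exhaustiveness and redundancy conditions can be tried out with a bounded enumeration in Python. As before this is a sketch under assumptions, not XDuce's checker: the contents of person are encoded one character per element (n = Name, m = Mail, t = Tel), patterns are plain regular expressions, and only sequences up to `max_len` are examined. The function names are hypothetical.

```python
import re
from itertools import product

def sequences(alphabet, max_len):
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            yield "".join(seq)

def missing_value(input_type, patterns, alphabet="nmt", max_len=5):
    """Return a value of input_type that no pattern matches, or None."""
    for s in sequences(alphabet, max_len):
        if re.fullmatch(input_type, s) and not any(re.fullmatch(p, s) for p in patterns):
            return s
    return None

def redundant_clauses(input_type, patterns, alphabet="nmt", max_len=5):
    """Indices of patterns whose matching values are all taken by earlier ones."""
    result = []
    for i, p in enumerate(patterns):
        reachable = any(
            re.fullmatch(input_type, s)
            and re.fullmatch(p, s)
            and not any(re.fullmatch(q, s) for q in patterns[:i])
            for s in sequences(alphabet, max_len)
        )
        if not reachable:
            result.append(i)
    return result

PERSON = "nm*t?"  # contents of person[Name, Mail*, Tel?]
print(missing_value(PERSON, ["nm*t", "nm*"]))       # None: exhaustive
print(missing_value(PERSON, ["nm*t", "nm+"]))       # n: person[name[...]] is missed
print(redundant_clauses(PERSON, ["nm*t?", "nm*"]))  # [1]: second clause is redundant
```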

35

Complete Example (1/3)

type Addrbook = addrbook[Person*]
type Person = person[Name,Email*,Tel?]
type Name = name[String]
type Email = email[String]
type Tel = tel[String]

(* and output documents. *)
type TelBook = telbook[TelPerson*]
type TelPerson = person[Name,Tel]

(* load an address book *)
let val doc = load_xml("mybook.xml")

36

Complete Example (2/3)

(* validate it against the type Addrbook *)
let val valid_doc = validate doc with Addrbook

(* extract the content of the top label addrbook *)
let val out_doc =
  match valid_doc with
    addrbook[val persons as Person*] ->
      telbook[make_tel_book(persons)]

(* save out_doc to output.xml *)
save_xml("output.xml")(out_doc)

37

Complete Example (3/3)

(* take ps of type Person* and return TelPerson* *)
fun make_tel_book (val ps as Person*) : TelPerson* =
  match ps with
      person[name[val n as String], Email*, tel[val t as String]],
      val rest as Person*
        -> person[name[n], tel[t]], make_tel_book(rest)    (* recursive call *)
    | person[name[val n as String], Email*], val rest as Person*
        -> make_tel_book(rest)
    | () -> ()
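For comparison, the same transformation written in the “untyped” style, as a Python sketch using the standard `xml.etree.ElementTree` module. Nothing here checks that the input conforms to Addrbook or that the output conforms to TelBook; that is the gap the XDuce version closes statically. The Python function only borrows the name `make_tel_book` from the slides.

```python
import xml.etree.ElementTree as ET

def make_tel_book(addrbook):
    """Keep only the persons that have a tel, dropping their email entries."""
    telbook = ET.Element("telbook")
    for person in addrbook.findall("person"):
        name = person.find("name")
        tel = person.find("tel")
        if tel is None:
            # Corresponds to the clause without tel: skip this person.
            continue
        entry = ET.SubElement(telbook, "person")
        ET.SubElement(entry, "name").text = name.text
        ET.SubElement(entry, "tel").text = tel.text
    return telbook

doc = ET.fromstring(
    "<addrbook>"
    "<person><name>Haruo Hosoya</name><email>hahosoya@kyoto-u</email></person>"
    "<person><name>Benjamin Pierce</name><email>bcpierce@upenn</email>"
    "<tel>123-456-789</tel></person>"
    "</addrbook>"
)
print(ET.tostring(make_tel_book(doc), encoding="unicode"))
# <telbook><person><name>Benjamin Pierce</name><tel>123-456-789</tel></person></telbook>
```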

38

Conclusion

The expressiveness of regular expression types and pattern matching is useful for XML processing.

Type inference (including the subtype relation) is possible and efficient in most practical cases. (Appendix 2)

39

Applications

Bookmarks (Mozilla bookmark extraction).
Html2Latex.
Diff (diff for XML).
All 300 – 350 lines.

40

Future Works

Precise type inference on all variables.
Introducing an Any type:
  Not possible in a naïve way.
  Breaks the closure property of tree automata.
  Makes type inference impossible.

41

References

XDuce: A Statically Typed XML Processing Language. Hosoya and Pierce.

XDuce: A Typed XML Processing Language. Hosoya and Pierce.

Regular Expression Pattern Matching for XML. Hosoya and Pierce.

Regular Expression Types for XML. Hosoya, Vouillon, and Pierce.

Available at http://xduce.sourceforge.net

42

Appendix 1:Type Inference

43

Type Inference (1/2)

Infers the types of variables in patterns.
Results are exact types of variables.
The type of each variable depends on the pattern itself and the type of the input.

44

Type Inference (2/2)

Type inference is “flow-sensitive”:
  In P1 -> e1 | … | Pn -> en, inference on Pi depends on P1 ... Pi-1.

Because the values matched by Pi are those NOT matched by P1 ... Pi-1.

45

Type Inference: Example (1/2)

(* p :: person[name[], mail*, tel[]?] *)
match p with
  | person[name[], rest] -> …

In this case, the type of rest is inferred as

  mail*, tel[]?

46

Type Inference: Example (2/2)

match p with
  | person[name[], tel[]] -> …
  | person[name[], rest]  -> …

In this case, the type of rest becomes

  (mail+, tel[]?) | ()

because person[name[], (), tel[]] is matched by the first pattern.
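The inferred type can be sanity-checked by set difference on bounded enumerations, sketched in Python. Assumptions of the sketch: one character per element (m = mail, t = tel), only the part after name[] is modelled, and sequences are enumerated up to `max_len`. The helper `values` is hypothetical and this is not the inference algorithm itself.

```python
import re
from itertools import product

def values(regex, alphabet="mt", max_len=5):
    """All sequences over `alphabet` up to `max_len` that match `regex`."""
    out = set()
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            s = "".join(seq)
            if re.fullmatch(regex, s):
                out.add(s)
    return out

after_name = values(r"m*t?")     # what may follow name[] in the input type
taken_first = values(r"t")       # contents already claimed by the first clause
inferred = values(r"(m+t?)?")    # (mail+, tel[]?) | ()

# What reaches `rest` in the second clause is the input minus the first clause.
print(after_name - taken_first == inferred)  # True
```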

47

Type Inference: Limitations

“Exact” type inference is possible only on:
  Variables at tail position, or
  Variables inside labels (c.f. well-formedness).

The limitation comes from the internal representation of patterns (binary trees).

48

Appendix 2:Algorithms for Pattern Matching

49

Algorithms for Pattern Matching

Pattern matching takes the following steps:
  Translation of values into internal forms (binary trees).
  Translation of types and patterns into internal forms (binary trees and tree automata).
  Values are matched by patterns, in terms of tree automata.

50

Internal Forms of Values

Values are represented as binary trees internally:

  t ::= ε          (* leaves *)
      | l(t, t)    (* labels *)

The first child is the content of the label; the second is the remainder of the sequence.

51

Internal Forms of Values: Example

person[name[], mail[], mail[]]

is translated into

person(name(ε,mail(ε,mail(ε,ε))),ε)
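The translation can be sketched in a few lines of Python. The representation is an assumption of the sketch: a value is a list of (label, children) pairs, the leaf ε is `None`, and a node l(t, t) is a 3-tuple. The names `encode` and `show` are hypothetical.

```python
EPS = None  # the leaf ε

def encode(seq):
    """Encode a sequence of (label, children) elements as a binary tree."""
    if not seq:
        return EPS
    (label, children), rest = seq[0], seq[1:]
    # First child: content of the label; second child: remainder of the sequence.
    return (label, encode(children), encode(rest))

def show(t):
    """Print a binary tree in the l(t,t) notation used on the slides."""
    if t is EPS:
        return "ε"
    label, first, second = t
    return f"{label}({show(first)},{show(second)})"

value = [("person", [("name", []), ("mail", []), ("mail", [])])]
print(show(encode(value)))
# person(name(ε,mail(ε,mail(ε,ε))),ε)
```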

52

Internal Forms of Types

Types are also translated into binary trees:

  T ::= φ          (* empty *)
      | ε          (* leaves *)
      | T|T        (* union *)
      | l(X, X)    (* label *)

X ranges over states, used in tree automata.

53

Internal Forms of Types: Tree Automata

A tree automaton M is a mapping States -> Types, e.g.

  M(X) = name(Y, Z)
  M(Y) = ε
  M(Z) = mail(Y, Z) | ε
  ...

54

Internal Forms of Types: Example

type Person = person[name[], mail*, tel[]?]

is translated into the binary tree person(X1, X0) and a tree automaton M, s.t.

  M(X0) = ε
  M(X1) = name(X0, X2)
  M(X2) = mail(X0, X2) | tel(X0, X0) | ε
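How such an automaton accepts a binary tree can be sketched in Python. Assumptions of the sketch: the leaf ε is `None`, a node l(t, t) is a 3-tuple, each state maps to a list of alternatives, and an extra state "P" stands for the top-level tree person(X1, X0). The X2 entry lists tel(X0, X0) directly, so that a tel may follow name without an intervening mail; that is this sketch's reading of the type, not a quotation of the slide.

```python
EPS = None  # the leaf ε

M = {
    "P":  [("person", "X1", "X0")],
    "X0": ["eps"],
    "X1": [("name", "X0", "X2")],
    "X2": [("mail", "X0", "X2"), ("tel", "X0", "X0"), "eps"],
}

def accepts(automaton, state, tree):
    """True when `tree` is accepted starting from `state`."""
    for alt in automaton[state]:
        if alt == "eps":
            if tree is EPS:
                return True
        elif tree is not EPS:
            label, first, rest = alt
            # The label must agree and both children must be accepted.
            if (tree[0] == label
                    and accepts(automaton, first, tree[1])
                    and accepts(automaton, rest, tree[2])):
                return True
    return False

ok = ("person", ("name", EPS, ("mail", EPS, ("tel", EPS, EPS))), EPS)
bad = ("person", ("name", EPS, ("tel", EPS, ("mail", EPS, EPS))), EPS)
print(accepts(M, "P", ok), accepts(M, "P", bad))  # True False
```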

55

Internal Forms of Patterns

Patterns are similar to types, with some additions

  P ::= (* same as types... *)
      | x : P      (* x as P *)
      | ⊤          (* wildcard *)

Wildcards are used for variables without an “as” clause.

56

Internal Forms of Patterns: Example

The pattern

  person[name[n], (ms as mail*)]

is translated into the binary tree

  person(Y1, Y0)

and a tree automaton N, s.t.

  N(Y0) = ε
  N(Y1) = name(n:⊤, ms:Y2)
  N(Y2) = mail(Y0, Y2) | ε

57

Pattern Matching (1/3)

Pattern matching has two roles:
  match input values (of course!)
  bind variables to components of the input value, if matched

Written formally:

  t ∈ D ⇒ V      “t is matched by D, yielding V” (V : Vars -> Values)

58

Pattern Matching (2/3)

The matching relation t ∈ D ⇒ V is defined by the following rules... (next slide)

Assumptions:
  D is a set of patterns and states.
  A tree automaton N is implied.
  (D, N) corresponds to the external pattern.

59

Pattern Matching (3/3)

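\[
\frac{}{t \in \top \Rightarrow \emptyset}
\qquad
\frac{}{\varepsilon \in \varepsilon \Rightarrow \emptyset}
\qquad
\frac{t \in P \Rightarrow V}{t \in x{:}P \Rightarrow V \cup \{x \mapsto t\}}
\]

\[
\frac{t \in P_1 \Rightarrow V}{t \in P_1 | P_2 \Rightarrow V}
\qquad
\frac{t \notin P_1 \quad t \in P_2 \Rightarrow V}{t \in P_1 | P_2 \Rightarrow V}
\]

\[
\frac{t \in N(Y) \Rightarrow V}{t \in Y \Rightarrow V}
\qquad
\frac{t_1 \in Y_1 \Rightarrow V_1 \quad t_2 \in Y_2 \Rightarrow V_2}{l(t_1, t_2) \in l(Y_1, Y_2) \Rightarrow V_1 \cup V_2}
\]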
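The matching relation t ∈ D ⇒ V can be read as a recursive function that returns the bindings V or fails. The Python sketch below is one such reading under assumptions: the leaf ε is `None`, nodes are 3-tuples, patterns are tagged tuples ("top", "eps", "bind", "alt", "state", "label"), and label patterns may hold sub-patterns directly, as in name(n:⊤, ms:Y2). Trying the left alternative first stands in for the first-match policy. The encoding and the function name `match` are hypothetical.

```python
EPS = None  # the leaf ε

def match(N, p, t):
    """Return the bindings V such that t ∈ p ⇒ V, or None if t does not match."""
    kind = p[0]
    if kind == "top":                      # wildcard: matches anything, binds nothing
        return {}
    if kind == "eps":
        return {} if t is EPS else None
    if kind == "bind":                     # x : P binds x to the whole tree t
        _, x, inner = p
        v = match(N, inner, t)
        return None if v is None else {**v, x: t}
    if kind == "alt":                      # first-match: P2 only when P1 fails
        v = match(N, p[1], t)
        return v if v is not None else match(N, p[2], t)
    if kind == "state":                    # unfold the automaton: t ∈ N(Y)
        return match(N, N[p[1]], t)
    if kind == "label":
        _, label, p1, p2 = p
        if t is EPS or t[0] != label:
            return None
        v1 = match(N, p1, t[1])
        if v1 is None:
            return None
        v2 = match(N, p2, t[2])
        if v2 is None:
            return None
        return {**v1, **v2}
    raise ValueError("unknown pattern kind: " + repr(kind))

# person[name[n], (ms as mail*)] in internal form
N = {
    "Y0": ("eps",),
    "Y1": ("label", "name", ("bind", "n", ("top",)), ("bind", "ms", ("state", "Y2"))),
    "Y2": ("alt", ("label", "mail", ("state", "Y0"), ("state", "Y2")), ("eps",)),
}
top = ("label", "person", ("state", "Y1"), ("state", "Y0"))

t = ("person", ("name", EPS, ("mail", EPS, ("mail", EPS, EPS))), EPS)
print(match(N, top, t))
# {'n': None, 'ms': ('mail', None, ('mail', None, None))}
```

Here n is bound to the (empty) content of name and ms to the binary tree for the two mail elements.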
