Applied Component-Based Software Engineering XML Basics CSE 668 / ECE 668 Prof. Roger Crawfis

Page 1: Applied Component-Based  Software Engineering XML Basics

Applied Component-Based Software Engineering

XML Basics

CSE 668 / ECE 668
Prof. Roger Crawfis

Page 2: Applied Component-Based  Software Engineering XML Basics

XML Quiz

What does XML stand for?

Is XML a language?

What is HTML? What is XHTML? What is HTTPS?

Is HTML a language?

Page 3: Applied Component-Based  Software Engineering XML Basics

XML Quiz

What does XML stand for? eXtensible Markup Language

Is XML a language? No!

What is HTML? What is XHTML? What is HTTPS? XHTML is well-formed HTML (i.e., HTML written as well-formed XML).

Is HTML a language? Yes!

Page 4: Applied Component-Based  Software Engineering XML Basics

XML Motivation

Data interchange is critical in today’s networked world. Examples:

  Banking: funds transfer
  Order processing (especially inter-company orders)
  Scientific data
    Chemistry: ChemML, …
    Genetics: BSML (Bio-Sequence Markup Language), …

Paper flow of information between organizations is being replaced by electronic flow of information.

Each application area has its own set of standards for representing information.
  Plain text with line headers indicating the meaning of fields

XML has become the basis for all new-generation data interchange formats.

Page 5: Applied Component-Based  Software Engineering XML Basics

Semi-structured Data

Nodes = objects.
Labels on arcs (attributes, relationships).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: no restriction on:
  Labels out of a node.
  Number of successors with a given label.

Page 6: Applied Component-Based  Software Engineering XML Basics

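Example: Data Graph

[Figure: a data graph. The root has two arcs labeled beer and one labeled bar. Other arc labels: name, manf, prize, year, award, addr, servedAt. Leaf values: Bud, A.B., Gold, 1995, M’lob, Joe’s, Maple. Callouts: "The bar object for Joe’s Bar"; "The beer object for Bud"; "Notice a new kind of data."]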

Page 7: Applied Component-Based  Software Engineering XML Basics

XML Standardization

World Wide Web Consortium (W3C) http://www.w3.org

More resources at http://www.xml.com

Java-XML (and web services) info at http://java.sun.com/javaee/technologies

.NET-XML (via web services) info at http://www.microsoft.com/net/TechnicalResources

Page 8: Applied Component-Based  Software Engineering XML Basics

XML Uses

Example: the Ajax technology. Small volume browser-server communication in XML supports more interactive Web pages.

Example: Web services. Marshalling and unmarshalling data in SOAP uses XML. Service descriptions use XML.

Page 9: Applied Component-Based  Software Engineering XML Basics

XML Uses

Example: Data exchange formats. (Applications must agree on common meaning for tags.)

Older data exchange formats have been redesigned as instances of XML, e.g. HL7 in health informatics, FIX in the financial industry, etc. Even proprietary formats like MS Word now have open XML versions.

Example: Software development configuration files, e.g. in W3C, Apache, Java EE, .NET frameworks.

(All this may be geek paradise but it’s awfully verbose and the scarcity of visual editors is puzzling.)

Page 10: Applied Component-Based  Software Engineering XML Basics

Why People Like XML

Can get data from all sorts of sources
Allows us to touch data we don’t own!
Can integrate various data sources as if they were databases (almost)
We can publish some of the data in our databases on the Web conveniently

Page 11: Applied Component-Based  Software Engineering XML Basics

Well-Formed and Valid XML

Well-Formed XML allows you to invent your own tags.
  Similar to labels in semi-structured data.

Valid XML involves either a:
  DTD (Document Type Definition), a grammar for tags.
  XSD (XML Schema Definition), a grammar for tags in XML format.

Page 12: Applied Component-Based  Software Engineering XML Basics

Well-Formed XML

A legal XML document – fully parsable by an XML parser
  All open-tags have matching close-tags
  Attributes (which are unordered) only appear once in an element
  There’s a single root element
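The three rules above can be tried out with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative, not from the slides:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the XML parser can fully parse `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample per rule on the slide (sample strings are made up).
samples = {
    "matched tags, single root": "<a><b>x</b></a>",
    "missing close-tag": "<a><b>x</a>",
    "attribute appears twice": '<a k="1" k="2"/>',
    "two root elements": "<a/><b/>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample parses; each of the others breaks exactly one of the rules.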

Page 13: Applied Component-Based  Software Engineering XML Basics

Well-Formed XML

Start the document with a declaration, surrounded by <?xml … ?> .

Normal declaration is:
  <?xml version = "1.0" standalone = "yes" ?>

standalone = "yes" means the document does not depend on external markup declarations (such as an external DTD); use "no" when it does.

Balance of document is a root tag surrounding nested tags.

Page 14: Applied Component-Based  Software Engineering XML Basics

Tags

Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO> .

Tags may be nested arbitrarily.
XML tags are case sensitive.

Page 15: Applied Component-Based  Software Engineering XML Basics

Example: Well-Formed XML

<?xml version = "1.0" standalone = "yes" ?>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callouts: "A NAME subobject"; "A BEER subobject"
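To see what a program gets out of the BARS document, here is a minimal sketch using Python's standard xml.etree.ElementTree. It uses only the one BAR the slide shows in full (the elided second BAR is left out), and the function name price_list is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

# The fully shown part of the slide's document; the elided <BAR> is omitted.
BARS_XML = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

def price_list(xml_text):
    """Map each bar name to a {beer name: price} dictionary."""
    root = ET.fromstring(xml_text)
    result = {}
    for bar in root.findall("BAR"):          # direct BAR children of BARS
        beers = {beer.findtext("NAME"): float(beer.findtext("PRICE"))
                 for beer in bar.findall("BEER")}
        result[bar.findtext("NAME")] = beers
    return result

print(price_list(BARS_XML))
```

Note that findtext("NAME") on a BAR looks only at direct children, so the bar's own NAME is not confused with the NAME inside each BEER.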

Page 16: Applied Component-Based  Software Engineering XML Basics

XML and Semi-structured Data

Well-Formed XML with nested tags is exactly the same idea as trees of semi-structured data.

Graphs are possible through indirection.
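The tree view of nested tags can be made concrete with a short recursive walk. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the function name outline and the shortened document are illustrative:

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Return one line per node: element tags indented by depth, text as leaves."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:                                  # atomic value at a leaf
        lines.append("  " * (depth + 1) + repr(text))
    for child in elem:                        # nested tags are subtrees
        lines.extend(outline(child, depth + 1))
    return lines

doc = ET.fromstring(
    "<BARS><BAR><NAME>Joe's Bar</NAME>"
    "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")
print("\n".join(outline(doc)))
```

Element tags play the role of arc labels and the text strings play the role of atomic leaf values.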

Page 17: Applied Component-Based  Software Engineering XML Basics

Example

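The <BARS> XML document is:

[Figure: tree of the document. BARS is the root with BAR children (one shown in full, the others elided). The BAR shown has a NAME child (Joe’s Bar) and two BEER children, each with a NAME and a PRICE child (Bud, 2.50 and Miller, 3.00).]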

Page 18: Applied Component-Based  Software Engineering XML Basics

XML as a Data Model

XML “information set” includes 7 types of nodes:
  Document (root)
  Element
  Attribute
  Processing instruction
  Text (content)
  Namespace
  Comment

XML data model includes this, plus order info and a few other things

Page 19: Applied Component-Based  Software Engineering XML Basics

XML Anatomy

<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>

Callouts: Processing instr.; Open-tag; Close-tag; Element; Attribute

Page 20: Applied Component-Based  Software Engineering XML Basics

A Visualization of XML Data

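[Figure: the dblp document drawn as a tree. Root has two children: the processing instruction ?xml and the element dblp. dblp has the element children mastersthesis and article. mastersthesis has attributes mdate (2002…) and key (ms/Brown92) and elements author (Kurt P.…), title (PRPL…), year (1992), school (Univ.…). article has attributes mdate (2002…) and key (tr/dec/…) and elements editor (Paul R.), title (The…), journal (Digital…), volume (SRC…), year (1997), ee (db/labs/dec), ee (http://www.). Legend: root, p-i, element, attribute, text.]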

Page 21: Applied Component-Based  Software Engineering XML Basics

Empty Elements

We can do all the work of an element in its attributes.
  Like BEER in previous example.

Another example: SELLS elements could have attribute price rather than a value that is a price.

Example use:
  <SELLS theBeer = "Bud" price = "2.50"/>

Note exception to “matching tags” rule
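A parser treats the self-closing form as an element with attributes and no content. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# The empty-element form from the slide, with straight quotes.
sells = ET.fromstring('<SELLS theBeer="Bud" price="2.50"/>')
print(sells.attrib)            # all the information lives in the attributes
print(len(sells), sells.text)  # no child elements and no text content

# The matched-pair spelling of the same empty element parses identically.
same = ET.fromstring('<SELLS theBeer="Bud" price="2.50"></SELLS>')
print(same.attrib == sells.attrib)
```

So the "/>" form is shorthand for an open-tag immediately followed by its close-tag.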

Page 22: Applied Component-Based  Software Engineering XML Basics

XML Namespaces

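Namespaces allow us to specify a context for different tags

Two parts:
  Binding of namespace to URI
  Qualified names

<tag xmlns="http://www.fictitious.com/mypath"
     xmlns:myns="http://www.fictitious.com/mypath"
     xmlns:otherns="http://www.fictitious.com/otherpath">
  <thistag>is in namespace myns</thistag>
  <myns:thistag>is the same</myns:thistag>
  <otherns:thistag>is a different tag</otherns:thistag>
</tag>

The unprefixed thistag is in the myns namespace because the default namespace (xmlns) is bound to the same URI; otherns is assumed to be bound to a different, equally fictitious URI.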
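How a parser resolves qualified names can be checked directly. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which reports each tag as {URI}localname. Two things here are assumptions not on the slide: a default xmlns bound to the same URI as myns (without it the unprefixed tag would be in no namespace), and a made-up URI for otherns:

```python
import xml.etree.ElementTree as ET

NS = "http://www.fictitious.com/mypath"
OTHER = "http://www.fictitious.com/otherpath"   # hypothetical URI for otherns

doc = ET.fromstring(
    f'<tag xmlns="{NS}" xmlns:myns="{NS}" xmlns:otherns="{OTHER}">'
    "<thistag>default namespace</thistag>"
    "<myns:thistag>prefixed</myns:thistag>"
    "<otherns:thistag>other namespace</otherns:thistag>"
    "</tag>")

# ElementTree expands every qualified name to {namespace-URI}local-name.
tags = [child.tag for child in doc]
for t in tags:
    print(t)
print(tags[0] == tags[1], tags[1] == tags[2])
```

The prefix itself is irrelevant after parsing; only the URI it is bound to distinguishes the tags.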

Page 23: Applied Component-Based  Software Engineering XML Basics

XML Attributes

An (opening) tag may contain attributes. These are typically used to describe the content of an element

<entry>
  <word language = "en"> cheese </word>
  <word language = "fr"> fromage </word>
  <word language = "ro"> branza </word>
  <meaning> A food made … </meaning>
</entry>

Page 24: Applied Component-Based  Software Engineering XML Basics

XML Attributes

Another common use for attributes is to express dimension or type

<picture>
  <height dim= "cm"> 2400 </height>
  <width dim= "in"> 96 </width>
  <data encoding = "gif" compression = "zip">
    M05-.+C$@02!G96YE&lt;FEC ...
  </data>
</picture>

Page 25: Applied Component-Based  Software Engineering XML Basics

When to use attributes

<person ssno= "123 45 6789">
  <name> F. MacNiel </name>
  <email> [email protected] </email>
  ...
</person>

<person>
  <ssno> 123 45 6789 </ssno>
  <name> F. MacNiel </name>
  <email> [email protected] </email>
  ...
</person>

The choice between representing data as attributes or as elements is sometimes unclear; it is often a matter of taste.

Page 26: Applied Component-Based  Software Engineering XML Basics

Defining the structure of an XML file

We can check if an XML file is well-formed
  By looking at it, maybe
  By loading it into a browser
    If well-formed, it will be displayed

However, how can we check that the well-formed file contains the correct elements in the correct quantities?
  We need to write a specification for the XML file

Page 27: Applied Component-Based  Software Engineering XML Basics

XML Needs Help

It’s too unconstrained for many cases!
  How will we know when we’re getting garbage?
  How will we query?
  How will we understand what we got?

We also need:
  Some idea of the structure
  Presentation, in some cases – CSS, XSL
  Some way of interpreting the tags

Page 28: Applied Component-Based  Software Engineering XML Basics

Defining the structure of an XML file

There are 2 main alternatives
  Document Type Definitions
    Original and simple
  XML Schema
    More versatile and complex

We will look at both
  Concentrating on XML Schema

XML documents are not required to have an associated schema

Page 29: Applied Component-Based  Software Engineering XML Basics

Document Type Definition (DTD)

The type of an XML document can be specified using a DTD

DTD constrains structure of XML data
  What elements can occur
  What attributes can/must an element have
  What sub-elements can/must occur inside each element, and how many times.

DTD does not constrain data types
  All values represented as strings in XML

DTD syntax
  <!ELEMENT element (subelements-specification) >
  <!ATTLIST element (attributes) >

Page 30: Applied Component-Based  Software Engineering XML Basics

Example: An Address Book

<person ssn = "4444">
  <name> Homer Simpson </name>
  <tel> 2543 </tel>
  <tel> 2544 </tel>
  <email> [email protected] </email>
</person>

Callouts:
  One or more persons
  An attribute
  Exactly one name
  Up to 4 tel nos
  At least one email

Page 31: Applied Component-Based  Software Engineering XML Basics

Example: The Address Book2

<person>
  <name> MacNiel, John </name>
  <greet> Dr. John MacNiel </greet>
  <addr>1234 Huron Street </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> [email protected] </email>
</person>

Callouts:
  Exactly one name
  At most one greeting
  As many address lines as needed (in order)
  Mixed telephones and faxes
  At least one

Page 32: Applied Component-Based  Software Engineering XML Basics

DTD - Specifying the Structure

In a DTD, we can specify the permitted content for each element, using regular expressions

For a person element, the regular expression is
  name, title?, tel*, email+

Page 33: Applied Component-Based  Software Engineering XML Basics

What’s in a person Element?

This means
  name = there must be a name element
  title? = there is an optional title element (i.e., 0 or 1 title elements)
  name, title? = the name element is followed by an optional title element
  tel* = there are 0 or more tel elements
  email+ = there are 1 or more email elements
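Because a DTD content model is a regular expression over child elements, it can be mimicked with an ordinary regex over the sequence of child tag names. A minimal sketch, assuming Python's re and xml.etree.ElementTree; the name matches_person_model and the trick of writing each tag followed by a space are illustrative choices, not how a real DTD validator works internally:

```python
import re
import xml.etree.ElementTree as ET

# name, title?, tel*, email+  written over a "tag tag tag " string.
PERSON_MODEL = re.compile(r"(name )(title )?(tel )*(email )+")

def matches_person_model(person):
    """Check the order and count of a person's child elements."""
    sequence = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(sequence) is not None

ok = ET.fromstring("<person><name/><tel/><tel/><email/></person>")
no_email = ET.fromstring("<person><name/><tel/></person>")
wrong_order = ET.fromstring("<person><title/><name/><email/></person>")
for p in (ok, no_email, wrong_order):
    print(matches_person_model(p))
```

Only the first person matches: the second has no email, and the third puts title before name.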

Page 34: Applied Component-Based  Software Engineering XML Basics

DTD For the Address Book2

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>

PCDATA means parsed character data

Callout: Regular expressions (the content models in parentheses)

Page 35: Applied Component-Based  Software Engineering XML Basics

Attributes in a DTD

XML elements can have attributes.

General Syntax for DTD:
  <!ATTLIST element-name
      attribute-name1 type1 default-value1
      …
      attribute-namen typen default-valuen>

Example:
  <!ATTLIST person ssn CDATA #REQUIRED>

CDATA means character data
Default value could be #REQUIRED or #IMPLIED (meaning optional)

Page 36: Applied Component-Based  Software Engineering XML Basics

Example: DTD

<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>

Callouts:
  A BARS object has zero or more BAR’s nested within.
  A BAR has one NAME and one or more BEER subobjects.
  A BEER has a NAME and a PRICE.
  NAME and PRICE are text.

Page 37: Applied Component-Based  Software Engineering XML Basics

Use of DTD’s

1. Set standalone = "no".
2. Either:
   a) Include the DTD as a preamble of the XML document, or
   b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

Page 38: Applied Component-Based  Software Engineering XML Basics

Use of DTD’s

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callouts: "The DTD" (the DOCTYPE block); "The document" (from <BARS> on)

Page 39: Applied Component-Based  Software Engineering XML Basics

Use of DTD’s

Assume the BARS DTD is in file bar.dtd.

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callout: Get the DTD from the file bar.dtd

Page 40: Applied Component-Based  Software Engineering XML Basics

Valid Documents

A document with a DTD is valid if it conforms to the DTD, i.e.,
  the document conforms to the regular-expression grammar,
  types of attributes are correct, and
  constraints on references are satisfied
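The first condition (conforming to the regular-expression grammar) can be illustrated with a toy checker for the BARS DTD. This is a sketch under stated assumptions, not a real DTD validator: the content models are hand-translated into regexes over "TAG TAG " strings, the names CONTENT_MODELS, is_valid and validate are hypothetical, and attribute types and reference constraints are not checked at all. It uses Python's re and xml.etree.ElementTree:

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the BARS DTD; "" stands for (#PCDATA): no child elements.
CONTENT_MODELS = {
    "BARS": r"(BAR )*",
    "BAR": r"NAME (BEER )+",
    "BEER": r"NAME PRICE ",
    "NAME": r"",
    "PRICE": r"",
}

def is_valid(elem, models=CONTENT_MODELS):
    """Recursively check each element's children against its content model."""
    model = models.get(elem.tag)
    if model is None:                      # element not declared in the DTD
        return False
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child, models) for child in elem)

def validate(xml_text, root_tag="BARS"):
    """Well-formedness is checked by the parser; then the grammar is checked."""
    root = ET.fromstring(xml_text)
    return root.tag == root_tag and is_valid(root)

good = ("<BARS><BAR><NAME>Joe's Bar</NAME>"
        "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")
no_beer = "<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>"
print(validate(good), validate(no_beer))
```

The second document is well-formed but not valid, because BAR requires at least one BEER.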

Page 41: Applied Component-Based  Software Engineering XML Basics

DTDs Problems

DTDs are rather weak specifications by DB & programming-language standards

Some limitations:
  Only one base type – PCDATA
  Also no constraints, e.g. range of values, frequency of occurrence
  Not easily parsed (since they are not XML)
  Not easy to express that element a has exactly the children c, d, e in any order

Page 42: Applied Component-Based  Software Engineering XML Basics

DTDs Problems

Difficult to specify unordered sets of subelements
  Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  (A | B)* allows specification of an unordered set, but
    Cannot ensure that each of A and B occurs only once

Many other more complex problems.
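The ordering problem is easy to see with ordinary regular expressions. A minimal sketch in Python, treating A and B as single characters; the helper name orderings is hypothetical:

```python
import re
from itertools import permutations

unordered = re.compile(r"(A|B)*")     # the (A | B)* content model
exactly_once = re.compile(r"AB|BA")   # each of A and B once, either order

for seq in ["AB", "BA", "AAB", ""]:
    print(repr(seq),
          bool(unordered.fullmatch(seq)),
          bool(exactly_once.fullmatch(seq)))

def orderings(names):
    """Every sequence a content model must list to allow 'any order'."""
    return [", ".join(p) for p in permutations(names)]

print(len(orderings(["c", "d", "e"])))
print(orderings(["c", "d", "e"]))
```

(A | B)* accepts "AAB" and the empty sequence, which is too loose, while the exact version has to spell out every ordering: 2 for two children, 6 for c, d, e, and n! in general.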

Page 43: Applied Component-Based  Software Engineering XML Basics

XML Schema

DTDs are now being superseded by XML schemas. They provide the following features

XML Syntax
  So can be parsed, validated with standard XML tools
Data types other than #PCDATA
  There are built in types such as integer, float, boolean, string and many others
Greater control over permitted constructs
  Can specify maximum and minimum occurrences
  Can use regular expressions to set patterns to be matched
Support for modularity and inheritance

Page 44: Applied Component-Based  Software Engineering XML Basics

Schema types

There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID

Each element is composed of either simple types or complex types. A complex type is often a sequence of elements

The content of the type can be declared as shown in the following example. A type can also be declared, named and referred to.

Notice the use of minOccurs and maxOccurs. Their default is 1.

Page 45: Applied Component-Based  Software Engineering XML Basics

Simple Schema Example

<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          details of the person element - pto
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Callouts: standard stuff; Top-level element; Namespace

Page 46: Applied Component-Based  Software Engineering XML Basics

Namespace declaration

So at the start of a document we must specify what namespaces we are using.

In the schema example, we are using the XML schema namespace with the xs prefix

We declare this namespace in an attribute in the top-level element
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

We then use the xs prefix in all the XML Schema elements, e.g. complexType, sequence, element etc.

Page 47: Applied Component-Based  Software Engineering XML Basics

Schema Example Continued

Details of the person element

<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>

Callouts:
  A person is a complex type which is a sequence of elements and an attribute
  Empty element

Page 48: Applied Component-Based  Software Engineering XML Basics

Restrictions on elements

You can also restrict the data values

a range
  <xs:minInclusive value="0"/>
  <xs:maxInclusive value="120"/>

an enumerated list
  <xs:enumeration value="Audi"/>
  <xs:enumeration value="Golf"/>
  <xs:enumeration value="BMW"/>

a pattern
  <xs:pattern value="([a-z])*"/>
  Means 0 or more lowercase alphabetic chars
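The three kinds of restriction correspond to checks any program could make by hand. A minimal sketch in plain Python, using the values from the slide; the names in_range, ALLOWED_CARS and LOWERCASE are illustrative, and the pattern is matched against the whole value because schema patterns must match the entire string:

```python
import re

def in_range(value, lo=0, hi=120):
    """minInclusive / maxInclusive: both bounds are allowed values."""
    return lo <= value <= hi

ALLOWED_CARS = {"Audi", "Golf", "BMW"}       # the enumeration facet
LOWERCASE = re.compile(r"([a-z])*")          # the pattern facet

print(in_range(0), in_range(120), in_range(121))
print("Audi" in ALLOWED_CARS, "Fiat" in ALLOWED_CARS)
print(bool(LOWERCASE.fullmatch("abc")), bool(LOWERCASE.fullmatch("abC")))
```

With a schema, these checks are declared once and enforced by the validator instead of being repeated in every application.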

Page 49: Applied Component-Based  Software Engineering XML Basics

XSD Built-in Types

Page 50: Applied Component-Based  Software Engineering XML Basics

Declaring your own types

Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute.

A named type is declared
  <xs:simpleType name="ssstype">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

And used as the attribute type
  <xs:attribute name="sssNo" type="ssstype" use="required"/>

Page 51: Applied Component-Based  Software Engineering XML Basics

More complex Schemas

The previous example shows a simple schema.

It is also possible to make the schema easier to maintain
  By declaring all the simple elements first and then referring to them in the body of the document
  By naming the declaration of simple and complex types, which could then be used later in the document, and more than once if necessary

Page 52: Applied Component-Based  Software Engineering XML Basics

Referring to a schema

Save your schema in a file with the extension xsd.

Linking schema definition with a document is done using a special attribute of the root node of the document:

<people
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="people.xsd">
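A program can read the schema link back off the root element. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which exposes namespaced attributes under {URI}name keys; the people element is closed immediately here only to keep the sample short:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = ET.fromstring(
    '<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="people.xsd"/>')

# The attribute is looked up by namespace URI, not by the xsi prefix.
location = doc.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(location)
```

ElementTree only reports the location; it does not fetch the schema or validate against it.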

Page 53: Applied Component-Based  Software Engineering XML Basics

Validating

Validators
  http://www.w3.org/2001/03/webdata/xsv
  http://tools.decisionsoft.com/schemaValidate/
  Many others on the web

Page 54: Applied Component-Based  Software Engineering XML Basics

XML Schema Example

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bank" type="BankType"/>
  <xs:element name="account">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="account_number" type="xs:string"/>
        <xs:element name="branch_name" type="xs:string"/>
        <xs:element name="balance" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  ….. definitions of customer and depositor ….
  <xs:complexType name="BankType">
    <xs:sequence>
      <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Page 55: Applied Component-Based  Software Engineering XML Basics

Application Program Interface

Two standard application program interfaces to XML data (Java, C++, etc.):

SAX (Simple API for XML) (3rd party for .NET)
  Based on parser model, user provides event handlers (call-back functions) for parsing events
    E.g. start of element, end of element
  Not suitable for database applications

DOM (Document Object Model)
  XML data is parsed into a tree representation
  Functions for accessing, traversing and searching the DOM
  .NET DOM API provides XmlNode class:
    ParentNode, ChildNodes, NextSibling, FirstChild, Attributes properties

.NET adds a 3rd method: LINQ to XML.
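The difference between the two styles shows up in a few lines. A minimal sketch using Python's standard library rather than .NET: xml.sax for the event-handler model and xml.dom.minidom for the tree model, whose parentNode, childNodes, nextSibling, firstChild and attributes mirror the XmlNode properties listed above. The class name TagCounter and the shortened BARS document are illustrative:

```python
import xml.sax
from xml.dom import minidom

DOC = ("<BARS><BAR><NAME>Joe's Bar</NAME>"
       "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")

class TagCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back on each event; no tree is kept in memory."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):      # fired at every open-tag
        self.counts[name] = self.counts.get(name, 0) + 1

handler = TagCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.counts)

# DOM: the whole document becomes a tree that can be navigated freely.
dom = minidom.parseString(DOC)
first_bar = dom.documentElement.firstChild    # BARS -> first BAR
bar_name = first_bar.firstChild.firstChild.data   # BAR -> NAME -> text node
print(bar_name, len(first_bar.childNodes), first_bar.parentNode.tagName)
```

SAX suits one-pass processing of large inputs, while DOM trades memory for the ability to move around the tree in any direction.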