XML for E-commerce
Helena Ahonen-Myka
University of Helsinki
XML: background
SGML: standard for markup languages (1986)
HTML: an SGML application
XML: a simplified version of SGML (developed for the Web)
software- and platform-independent representations for structured data
Example, HTML
<html>
  <head>
    <title>An HTML document</title>
  </head>
  <body>
    <h1>Heading 1</h1>
    <p>Some text content</p>
    <h2>Subheading</h2>
    <p>More text.</p>
  </body>
</html>
Lists, images, links
<body>
  <h1>Finland</h1>
  <ol>
    <li><a href="trav.html">Traveling</a>
    <li><a href="culture.html">Culture</a>
    <li><a href="sports.html">Sports</a>
  </ol>
  <p><img src="map.jpg" alt="map">
</body>
Tables
<table border="1">
  <tr><th>Year</th><th>Sales</th></tr>
  <tr><td>2000</td><td>$18M</td></tr>
  <tr><td>2001</td><td>$25M</td></tr>
  <tr><td>2002</td><td>$36M</td></tr>
</table>
Year    Sales
2000    $18M
2001    $25M
2002    $36M
Forms
<form action="http://some.com/add" method="post">
  <p>
    First name: <input type="text" name="fname"><br>
    Last name: <input type="text" name="lname"><br>
    <input type="submit"><input type="reset">
  </p>
</form>
First name: _________________________
Last name: __________________________
Submit Reset
HTML
easy to describe simple documents (headings, text, lists, tables, images)
easy to create links to other documents or different parts of the same document
the elements have a default presentation style
Presentation
the browsers give elements a default presentation style
often the authors want something else
it is wise to separate the presentation from the document contents: ease of modifications and uniformity of the appearance
CSS: Cascading Style Sheets
Stylesheet defines for each element, e.g., the font, size, color, widths of margins
a stylesheet cannot modify the structure of a document
several stylesheets can be attached to a document: modularity
CSS, examples
<style type="text/css">
  body { color: black; background: white; font-family: verdana, sans-serif; }
  h1, h2 { color: red; }
  p.new { color: green; }
</style>
CSS: layout
<div class="box">The content within this DIV element will be enclosed in a box with a thin line around it.</div>
div.box {
  border: solid;
  border-width: thin;
  width: 100%;
  padding: 2em;
}
CSS2
free layout can be described for elements
dynamic changes of contents and style, animations etc.
Dynamic HTML
HTML
ECMAScript (JavaScript, JScript)
CSS
DOM
Three-tier architecture
browser
web server: processing logic
database server
Examples
1. Browser asks for a page.
2. Server sends the page.
3. Browser shows the page.

1. As above, but the page contains a form, which the user fills out.
2. Based on the form data, the server starts an application, which queries a database and builds a new page.
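The second example can be sketched in Python. This is a minimal illustration under stated assumptions, not a real web server. The function `build_page` and the table `people` are hypothetical names, the field names `fname` and `lname` are borrowed from the form slide, and an in-memory SQLite database stands in for the database server.

```python
import sqlite3
from html import escape


def build_page(form: dict, db: sqlite3.Connection) -> str:
    """Middle tier: store the form data, query the database, build a new page."""
    db.execute(
        "INSERT INTO people (fname, lname) VALUES (?, ?)",
        (form["fname"], form["lname"]),
    )
    rows = db.execute("SELECT fname, lname FROM people ORDER BY rowid").fetchall()
    # Escape the values so that user input cannot inject markup into the page.
    items = "".join(f"<li>{escape(f)} {escape(l)}</li>" for f, l in rows)
    return f"<html><body><h1>People</h1><ul>{items}</ul></body></html>"


# Database tier: in-memory SQLite is an assumption made for this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (fname TEXT, lname TEXT)")

# Browser tier: the submitted form arrives as name/value pairs.
page = build_page({"fname": "Helena", "lname": "Ahonen"}, db)
print(page)
```

The browser only sends the form data and displays the returned page. All processing logic stays on the server.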
Browser vs. server
the browser interprets CSS definitions
HTML documents may include embedded JavaScript scripts, which are run in the browser
problems: the implementations of CSS vary, JavaScript may be switched off
most of the functionality on the server side?
XML
Extensible Markup Language (1998)
developed for interchanging structured documents on the Internet
used more and more as a platform-independent data format between applications
document vs. data
"Document":
<memo importance="high" date="19990323">
  <from>Paul V. Biron</from>
  <to>Ashok Malhotra</to>
  <subject>Latest draft</subject>
  <body>
    We need to discuss the latest draft <emph>immediately</emph>.
    Either email me at <email>mailto:[email protected]</email>
    or call <phone>555-9876</phone>
  </body>
</memo>
"Data":
<invoice>
  <orderDate>19990121</orderDate>
  <shipDate>19990125</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 IBM Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>
<body>
  <p><b>Order date:</b> 19990121</p>
  <p><b>Shipping date:</b> 19990125</p>
  <p><b>Address:</b></p>
  <table>
    <tr><th>name<th>street<th>city<th>state<th>zip
    <tr><td>Ashok Malhotra <td>123 IBM Ave. <td>Hawthorne <td>NY <td>10532-0000
  </table>
  <p>Phone: 555-1234</p>
  <p>Fax: 555-4321</p>
</body>
Basic concepts: logical structure
logical structure: elements
names of elements can be chosen freely
elements can have attributes
logical structure is described by a document type definition (DTD)
Elements
elements can be containers, which can contain other elements and/or text, e.g.
<name><fname>Helena</fname> <lname>Ahonen</lname></name>
an element can also be empty: <img src="picture.jpg" alt="Picture" />
Attributes
attributes express information that is not really content
attribute/value pairs are attached to the start tag of an element
<memo importance="high">…</memo>
it may be difficult to decide whether some information should be modeled as an element or as an attribute
Attribute or element?
<memo date="060600">
  <from>Ashok Malhotra</from>
  <to>Peter May</to>
  …
</memo>
<memo>
  <from>Ashok Malhotra</from>
  <to>Peter May</to>
  <date>060600</date>
  ...
</memo>
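The choice also changes how an application reads the value. Here is a small sketch using Python's standard `xml.etree.ElementTree`. The two memos are shortened versions of the ones above (the elided parts are simply left out), and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Variant 1: the date is modeled as an attribute of <memo>.
as_attribute = ET.fromstring(
    '<memo date="060600"><from>Ashok Malhotra</from><to>Peter May</to></memo>'
)

# Variant 2: the date is modeled as a child element of <memo>.
as_element = ET.fromstring(
    "<memo><from>Ashok Malhotra</from><to>Peter May</to>"
    "<date>060600</date></memo>"
)

# An attribute is read from the start tag, an element from the content.
date_from_attribute = as_attribute.get("date")
date_from_element = as_element.findtext("date")
print(date_from_attribute, date_from_element)
```

Both variants carry the same information. An attribute can hold only a single string, whereas an element could later be given substructure of its own.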
Defining the structure: DTD
document type definition (DTD) describes how the elements are formed from the other elements and text
defines which attributes an element may/must have
Examples of definitions
<!ELEMENT name (fname+, lname)>
<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
<!ELEMENT contact (address, phone*, email?)>
<!ELEMENT contact2 (address | phone | email)*>
Symbols
+ : 1 or more
* : 0 or more
? : 0 or 1
| : choice (one has to be chosen)
() : grouping
, : order
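These symbols behave much like regular-expression operators applied to the sequence of child element names. The sketch below is not a DTD validator. It hand-translates the content model `(address, phone*, email?)` of `contact` into a Python regular expression. `CONTACT_MODEL` and `matches_contact_model` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# (address, phone*, email?) written as a regex over "name," tokens.
CONTACT_MODEL = re.compile(r"address,(phone,)*(email,)?")


def matches_contact_model(contact: ET.Element) -> bool:
    """Check only the direct children of <contact> against the content model."""
    children = "".join(child.tag + "," for child in contact)
    return CONTACT_MODEL.fullmatch(children) is not None


ok = ET.fromstring("<contact><address/><phone/><phone/></contact>")
wrong_order = ET.fromstring("<contact><phone/><address/></contact>")
two_emails = ET.fromstring("<contact><address/><email/><email/></contact>")

print(matches_contact_model(ok))           # address, two phones, no email
print(matches_contact_model(wrong_order))  # order is fixed by ","
print(matches_contact_model(two_emails))   # "?" allows at most one email
```

A validating parser performs this kind of check for every element, using the content models declared in the DTD.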
DTD for the Invoice example
<!DOCTYPE invoice [
<!ELEMENT invoice (orderDate, shipDate, billingAddress, voice*, fax?)>
<!ELEMENT orderDate (#PCDATA)>
<!ELEMENT shipDate (#PCDATA)>
<!ELEMENT billingAddress (name, street, city, state, zip)>
<!ELEMENT voice (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
]>
Note:
elements cannot overlap
container elements must have end tags
empty elements: <br />
all names are case-sensitive
attribute values must be delimited by quotation marks
Well-formed XML documents
documents that adhere to the formal requirements (syntax) of the XML specification
if a document is not well-formed, it is not an XML document (and the XML tools do not have to process it)
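A well-formedness check can be sketched with the non-validating parser in Python's standard library. The helper name `is_well_formed` is an assumption made for illustration. The sample strings exercise the rules listed in the note above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p><b>bold</b></p>"))        # properly nested
print(is_well_formed("<p><b>bold</p></b>"))        # overlapping elements
print(is_well_formed("<p>no end tag"))             # missing end tag
print(is_well_formed("<memo importance=high/>"))   # unquoted attribute value
print(is_well_formed("<Memo></memo>"))             # names are case-sensitive
```

Only the first string is accepted. The parser rejects the others without consulting any DTD.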
Valid documents
a document is a valid XML document if it is well-formed and adheres to the structure defined in the given DTD
an XML processor can be validating or non-validating
sometimes validity is important, sometimes not
Where do the DTDs come from?
general DTDs: communities that have to be able to interchange information agree on a common DTD
also standard-like: MathML, SMIL
tailored DTDs can be designed for one's own use
XML basics: physical structure
physical structure: entities
"file structure": a document is assembled from parts:
e.g. chapters of a book (each in one file)
including parts that appear often
non-XML content: e.g. images
characters that are not found on the keyboard
Entities
In the DTD: <!ENTITY HY "Helsingin yliopisto">
Inside the document: <place>&HY;</place>
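To show the entity being expanded, here is a minimal Python sketch. It assumes the declaration is placed in the internal DTD subset of the document itself. Python's standard parser expands such internal entities while parsing.

```python
import xml.etree.ElementTree as ET

# The entity HY is declared in the internal DTD subset and used in the content.
doc = """<!DOCTYPE place [
  <!ENTITY HY "Helsingin yliopisto">
]>
<place>&HY;</place>"""

place = ET.fromstring(doc)
# The application sees the replacement text, not the entity reference.
print(place.text)
```

The application never sees `&HY;`. The parser substitutes the replacement text before handing over the content.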
Defining the presentation
names of elements are arbitrary: the browsers cannot know how an element should be presented
presentation is defined using a separate stylesheet (CSS, XSL)
one stylesheet - many documents
one document - many stylesheets
Extensible Style Language (XSL)
specification contains two parts: transformation language XSLT and formatting objects
XSLT-transformation can express many kinds of transformations: elements can be inserted and deleted, elements can be reordered etc.
standardization of formatting objects not ready
Transformation target
XSLT-transformations can be used for transformations into several different representations
since the standardization of general formatting objects is not ready, transforming XML into HTML is a good choice
transformations into other XML-formats, PDF, etc. also possible
<sales>
  <products>
    <product id="p1">Packing Boxes</product>
    <product id="p2">Packing Tape</product>
  </products>
  <record>
    <cust num="C1001">
      <prodsale idref="p1">100</prodsale>
      <prodsale idref="p2">200</prodsale>
    </cust>
    <cust num="C1002">
      <prodsale idref="p2">50</prodsale>
    </cust>
    <cust num="C1003">
      <prodsale idref="p1">75</prodsale>
      <prodsale idref="p2">15</prodsale>
    </cust>
  </record>
</sales>
<body>
  <h2>Record of Sales</h2>
  <ul>
    <li>C1001 - Packing Boxes - 100</li>
    <li>C1001 - Packing Tape - 200</li>
    <li>C1002 - Packing Tape - 50</li>
    <li>C1003 - Packing Boxes - 75</li>
    <li>C1003 - Packing Tape - 15</li>
  </ul>
</body>
XSLT transformations
XML document is seen as a tree
how do we get from the source tree to the target tree?
transformation rules are matched to the parts of the tree, and transformations defined by the rules are applied
tree is often traversed starting from root
contents can be picked from any part
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html><head><title>Record of Sales</title></head>
  <body><h2>Record of Sales</h2>
    <xsl:apply-templates select="/sales/record"/>
  </body></html>
</xsl:template>
<xsl:template match="record">
  <ul><xsl:apply-templates/></ul>
</xsl:template>
<!-- id() resolves @idref only if the DTD declares the id attribute of product as type ID -->
<xsl:template match="prodsale">
  <li><xsl:value-of select="../@num"/>
    <xsl:text> - </xsl:text>
    <xsl:value-of select="id(@idref)"/>
    <xsl:text> - </xsl:text>
    <xsl:value-of select="."/></li>
</xsl:template>
</xsl:stylesheet>
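The logic of the three templates can be retraced in plain Python. This is not XSLT and not how an XSL engine works internally. It is only a sketch of the same tree walk using the standard `xml.etree.ElementTree`. The function name `record_of_sales` is hypothetical, and a dictionary lookup stands in for `id(@idref)`.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <products>
    <product id="p1">Packing Boxes</product>
    <product id="p2">Packing Tape</product>
  </products>
  <record>
    <cust num="C1001">
      <prodsale idref="p1">100</prodsale>
      <prodsale idref="p2">200</prodsale>
    </cust>
    <cust num="C1002">
      <prodsale idref="p2">50</prodsale>
    </cust>
    <cust num="C1003">
      <prodsale idref="p1">75</prodsale>
      <prodsale idref="p2">15</prodsale>
    </cust>
  </record>
</sales>"""


def record_of_sales(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    # Stand-in for id(@idref): map each product id to its name.
    names = {p.get("id"): p.text for p in root.iter("product")}
    lines = []
    for cust in root.find("record"):            # match="record"
        for sale in cust.findall("prodsale"):   # match="prodsale"
            # ../@num, the referenced product name, and the element's own text
            lines.append(
                f"{cust.get('num')} - {names[sale.get('idref')]} - {sale.text}"
            )
    return lines


items = record_of_sales(SALES)
html = "<ul>" + "".join(f"<li>{item}</li>" for item in items) + "</ul>"
print(html)
```

The resulting list items correspond to the `<li>` elements of the target HTML shown earlier. Content is picked from three different places in the source tree.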
Other XML related standards
XHTML
XLink
XML Schema
DOM
RDF
XHTML
Extensible HyperText Markup Language (v. 1.0 January 2000)
redefinition of HTML using XML
XHTML documents can be processed using XML tools
XHTML: modularization
XHTML facilitates creating new document types:
a subset can be used (e.g. for presentation on different devices)
definitions can be expanded (special elements, e.g. for representation of medical information)
XLink
XML Linking Language (July 2000)
links can have several targets
types, roles, etc. can be attached to a link
links can be stored separately from the document
a link can point to an arbitrary location in the target document
behavior of the link can be defined
DOM
Document Object Model (Sep 2000)
defines a platform- and language-neutral programming interface (API) for HTML and XML documents
defines how programs and scripts can retrieve, insert, delete, and modify contents, structure and styles
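The four operations can be tried with `xml.dom.minidom`, the DOM implementation in Python's standard library. The sketch reuses the `name` element from the Elements slide. The inserted `org` element is a hypothetical addition made only for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString("<name><fname>Helena</fname><lname>Ahonen</lname></name>")
root = doc.documentElement

# Retrieve: find an element and read its text content.
lname = root.getElementsByTagName("lname")[0]
original = lname.firstChild.data

# Modify: change the text node in place.
lname.firstChild.data = "Ahonen-Myka"

# Insert: create a new element (hypothetical <org>) and add it as first child.
org = doc.createElement("org")
org.appendChild(doc.createTextNode("University of Helsinki"))
root.insertBefore(org, root.firstChild)

# Delete: remove an existing child element.
fname = root.getElementsByTagName("fname")[0]
root.removeChild(fname)

result = root.toxml()
print(original)
print(result)
```

The same method names (`getElementsByTagName`, `createElement`, `insertBefore`, `removeChild`) are defined by the DOM for other languages as well, which is the point of a language-neutral interface.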
XML Schema
Sep 2000
the modeling power of DTD is restricted
datatyping: e.g. date, integer
database schema-like representation: constraints, e.g. how many times the element may occur
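To give a feel for what datatyping and occurrence constraints add beyond a DTD, here is a hand-written Python check. It is not XML Schema and does not use a schema processor. `check_invoice` is a hypothetical helper, and the date format `YYYYMMDD` is an assumption taken from the invoice example, not the lexical form XML Schema itself uses for dates.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def check_invoice(invoice: ET.Element) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for tag in ("orderDate", "shipDate"):
        found = invoice.findall(tag)
        # Occurrence constraint: each date element must appear exactly once.
        if len(found) != 1:
            problems.append(f"{tag}: expected exactly 1, found {len(found)}")
            continue
        # Datatype constraint: the content must be a real calendar date.
        try:
            datetime.strptime(found[0].text or "", "%Y%m%d")
        except ValueError:
            problems.append(f"{tag}: not a date: {found[0].text!r}")
    return problems


good = ET.fromstring(
    "<invoice><orderDate>19990121</orderDate><shipDate>19990125</shipDate></invoice>"
)
bad = ET.fromstring("<invoice><orderDate>soon</orderDate></invoice>")

print(check_invoice(good))
print(check_invoice(bad))
```

A DTD can only say that `orderDate` contains `#PCDATA`. It would accept `soon` as readily as `19990121`.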
RDF
Resource Description Framework (Mar 2000)
RDF can be used for describing metadata of web resources
metadata for search engines, for managing large collections, for depicting the parts of a large document etc.
XML vs. HTML
Good in HTML
well known and broadly used: a large public can use it easily
browsers know how to display it: it is not necessary to define the presentation separately
heterogeneous material is simple to combine using hyperlinks
Bad in HTML
contents and presentation intermingle: multiple usages in different contexts is difficult
accessing parts of a document is hard
representing complex structures is difficult
automatization is difficult
Good in XML
contents in one place -> several presentations for several media
automatic processing of documents is easier: more precise queries, transformations, retrieving specific data
structure of documents can be validated
Bad in XML
the meaning of elements has to be known
presentation does not exist automatically: stylesheets have to be given
creating documents may require using special editors or laborious conversion
browsers do not yet support XML well
XML in system architectures
basically like with HTML (three-tier)
use of XML is influenced by the nature of the contents ("data" or "document")
"data": XML as an interchange format between applications (storage e.g. in relational databases)
"document": content management systems (often based on object databases)
Browser vs. server
decision: where is the final presentation formed?
If the browser understands XSL, formatting can be given to the browser; otherwise the server transforms the document into HTML with CSS-styles
probably always some transformation from the original XML format
Tools
editors: XML, XSL, DTD, XML Schema
parsers (included in many tools)
XSL engines
content management systems (e.g., managing document components, version management, assembly)
e-commerce tools
Technology providers
Microsoft, IBM / AlphaWorks
publishing technology providers (Arbortext, SoftQuad, Chrystal Software, Poet)
database technology providers (Oracle, Sybase)
public domain software, prototypes, etc. (e.g. Apache Cocoon project)
XML portals
www.xml.com
www.xml.org
www.w3c.org
www.oasis-open.org
www.xmlsoftware.com
www.cs.helsinki.fi/~hahonen/uumek00/sisalto/xml/ (New Media course)