University of Jyväskylä/AHo & VLy

Experiences of Document Transformations with XSLT and DOM

Anne Honkaranta, Virpi Lyytikäinen, Pasi Tiitinen
University of Jyväskylä, Finland
inSGML project



Content

• Poem Publishers, Inc.
  - Poems
  - Publishing environment
• Transformations
  - Transformation techniques
  - Transformations in client-server environment
  - Transformations in Poem Publishers, Inc.
• Challenges encountered
• Lessons learned


Poem Publishers, Inc.

• Fictional company
• Publishes Finnish poems on the WWW
• Poems are authored in XML format according to a DTD
• The company offers the poets an authoring environment if so desired
• The poems can form collections


Poem.dtd


Publishing environment

• Microsoft IIS server v. 5.0
• JScript, VBScript
• ASP 3.0
• DOM II
• Internet Explorer 5.5 or newer
• CSS Level 2
• MSXML 3.0


Transformation

• Changing/converting a document's
  - format
  - structure / information schema
  - content organization
  - filtering the content
  - all of the above
• Conversion, filtering, and transformation are sometimes used as synonyms


Why do you need transformations?

• Authors need a content-oriented DTD
• Different end-user devices
• When managing documents we need to have them in an optimal format for processing

--> a three-step publication process: authoring -- processing -- output


Transformation techniques

Event-based mapping technique
• Examples of languages:
  - SAX (Simple API for XML)
  - Omnimark language/program
• Pros/cons:
  - fast, uses computing resources efficiently
  - does not give very good control over the schema (DTD, grammar) of an output document

Tree-based mapping technique
• Examples of languages:
  - DOM (Document Object Model) API
  - Balise language/program
  - XSLT language
• Pros/cons:
  - constructing a parse tree in memory takes resources
  - good control over the schema of an output document
  - best suited for complex (context-dependent) transformations
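
The difference between the two techniques can be illustrated with a minimal sketch in Python's standard library (used here only as a runnable stand-in; the slides themselves use MSXML). The element names `TITLE`, `VERSE` and `LINE` are assumptions, since the contents of Poem.dtd are not reproduced here. The SAX handler reacts to events as the parser streams through the document, while the DOM version builds the whole tree in memory before querying it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical poem markup; the real Poem.dtd is not shown in the slides.
POEM_XML = (
    "<POEM><TITLE>Evening</TITLE>"
    "<VERSE><LINE>first line</LINE><LINE>second line</LINE></VERSE></POEM>"
)


class LineCounter(xml.sax.ContentHandler):
    """Event-based: react to each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.line_count = 0

    def startElement(self, name, attrs):
        if name == "LINE":
            self.line_count += 1


def count_lines_event_based(xml_text):
    # No tree is kept; only the counter survives the parse.
    handler = LineCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.line_count


def count_lines_tree_based(xml_text):
    # Tree-based: the whole document is parsed into memory first,
    # then queried with access to every node and its context.
    doc = minidom.parseString(xml_text)
    return len(doc.getElementsByTagName("LINE"))


if __name__ == "__main__":
    print(count_lines_event_based(POEM_XML))
    print(count_lines_tree_based(POEM_XML))
```

Both functions return the same count for this input; the trade-off is memory use versus access to the surrounding structure.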


Transformations in client-server environment (XSLT/DOM)

Alternatives (c = client, c/s = client or server):
• using a PI in the XML source document (c) (can be written to the source document on a web server)
• DOM interface and DOM objects for loading the source XML and XSLT (c/s)
• using the DOM interface + a scripting language (VBScript, JScript) or Java


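Transformation chain (an example)

[Figure: the source XML doc. and the XSLT doc. are transformed (on the server or the client) into an output doc. with a link to CSS; on the client, the output doc. and the CSS doc. give the output HTML/XHTML doc. rendered by CSS.]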


Example: using PI in source XML

Source XML document:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="poem_html.xsl"?>
    <!DOCTYPE POEM SYSTEM "Poem1.dtd">
    ...

XSLT stylesheet (excerpt):

    <xsl:stylesheet.....
    <html>
    <head>
    <meta>
    <LINK rel="stylesheet" type="text/css" href="runo_htm.css"></LINK>


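Example: using DOM objects + XSLT

    <HTML>
    <HEAD></HEAD>
    <BODY>
    <SCRIPT LANGUAGE="VBSCRIPT">
    ' Source document, stylesheet, and the result string
    Dim objDoc, objXSL, strXML

    ' Create two MSXML DOM documents
    Set objDoc = CreateObject("MSXML2.DOMDocument")
    Set objXSL = CreateObject("MSXML2.DOMDocument")

    ' Load synchronously so both documents are complete before transforming
    objDoc.async = False
    objXSL.async = False

    objDoc.Load "../Runot/Pinkku1.xml"
    objXSL.Load "runo1_htmlksi2.xsl"

    ' Apply the stylesheet and write the resulting markup into the page
    strXML = objDoc.transformNode(objXSL)
    Document.Write strXML
    </SCRIPT>
    </BODY>
    </HTML>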


Example: using VBScript + DOM

    <HTML>
    <HEAD><TITLE>Inspect nodes of poem</TITLE></HEAD>
    <BODY>
    <SCRIPT LANGUAGE="VBSCRIPT" CODEPAGE="iso-8859-1" LCID="1033">
    Dim root, xmlDoc, child
    Set xmlDoc = CreateObject("Msxml2.DOMDocument")
    xmlDoc.async = False
    xmlDoc.load("Runot/Pinkku1.xml")

    'Walk from the document to each of its child nodes:
    For Each child In xmlDoc.childNodes
      document.write "type of node:" & child.nodeType & " | "
      document.write "name of node:" & child.nodeName & " | "
      document.write "content of node:" & child.text & "<BR>"
    Next
    </SCRIPT>
    </BODY>
    </HTML>
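
For readers without MSXML at hand, roughly the same node inspection can be sketched with Python's `xml.dom.minidom`. This is an illustration under assumptions, not a port of the script above: the sample document is a hypothetical stand-in for Pinkku1.xml, and the helper `text_of` only approximates the MSXML `text` property (for example, it returns an empty string for processing instructions).

```python
from xml.dom import minidom

# Hypothetical stand-in for Runot/Pinkku1.xml, whose content is not shown.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="poem_html.xsl"?>'
    "<POEM><TITLE>Evening</TITLE></POEM>"
)


def text_of(node):
    """Concatenate all text below a node (an approximation of MSXML's .text)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def describe_children(xml_text):
    """Return (nodeType, nodeName, text) for each child of the document node."""
    doc = minidom.parseString(xml_text)
    return [
        (child.nodeType, child.nodeName, text_of(child))
        for child in doc.childNodes
    ]


if __name__ == "__main__":
    for node_type, name, content in describe_children(SAMPLE):
        print(f"type of node: {node_type} | name of node: {name} | content of node: {content}")
```

The numeric node types follow the DOM specification (1 for an element, 7 for a processing instruction), so the stylesheet PI and the root element can be told apart by `nodeType` alone.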


Transformation "types" tested in Poem Publishers, Inc.

• XML-to-XML
• XML-to-HTML
• XML-to-XHTML


Transformation needs tested in Poem Publishers, Inc.

Tasks tested:
• combining multiple source documents into an output view (poem + header/footer, poem list, poem metadata)
• combining multiple source documents into one file (making a poem collection)
• combining XSLT transformation documents for transformation needs (poem + footer)
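
As an illustration of the second task, combining several poem documents into one collection file, here is a minimal Python sketch. The `COLLECTION` element, its `title` attribute and the poem markup are all hypothetical, since the slides do not show the DTD; in XSLT 1.0 the `document()` function plays a comparable role by pulling additional source documents into a transformation.

```python
import xml.etree.ElementTree as ET


def build_collection(poem_sources, collection_title):
    """Wrap several poem documents in one (hypothetical) COLLECTION element."""
    collection = ET.Element("COLLECTION", {"title": collection_title})
    for source in poem_sources:
        # Each source is parsed separately and attached as a child.
        collection.append(ET.fromstring(source))
    return collection


# Hypothetical poems standing in for separate source files.
POEMS = [
    "<POEM><TITLE>Morning</TITLE></POEM>",
    "<POEM><TITLE>Evening</TITLE></POEM>",
]

COLLECTION = build_collection(POEMS, "Day")

if __name__ == "__main__":
    print(ET.tostring(COLLECTION, encoding="unicode"))
```

The source order is preserved, so the order of the poems in the collection is decided by the caller rather than by the transformation.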


Example: combining XSLT stylesheets

Importing stylesheet (excerpt):

    <?xml version="1.0" encoding="iso-8859-1"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xmlns="http://www.w3.org/1999/REC-html401">

    <xsl:import href="header.xsl"/>
    <xsl:output method="html" encoding="ISO-8859-1" />

Imported stylesheet (excerpt):

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!-- Filename: header.xsl -->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xmlns="http://www.w3.org/1999/REC-html401">
    <xsl:output method="html" encoding="ISO-8859-1" />
    <xsl:template match="/" name="header">


Challenges encountered

Problems with
• parsers and versions
• character encodings
• figures and links
• "too many" tools, scripting languages, and programs


Example: Character encodings and the MSXML parser

INPUT DOC
• has an input document encoding
• may contain character entities
• entities are changed to actual character representations when transformed

MSXML 3.0
• uses UTF-16
• detects the output encoding from the PI when appropriate load/save methods are used
• otherwise outputs UTF-16

OUTPUT DOC
• has some encoding
• has an encoding declaration
• problem: either of them is "wrong"
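
The mismatch between the actual byte encoding and the encoding declaration can be reproduced in a small Python sketch. It uses Python's own parser as a stand-in, so it says nothing about MSXML internals; the poem markup is hypothetical. The input is ISO-8859-1 with one character reference, the re-serialized output is UTF-8 with a matching declaration, and the last step deliberately relabels the UTF-8 bytes as ISO-8859-1 to show what a "wrong" declaration does to the text.

```python
from xml.dom import minidom

# Hypothetical input: declared and encoded as ISO-8859-1, with one character
# reference (&#228; is "ä") and one literal non-ASCII character ("ö").
SOURCE_BYTES = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<POEM><TITLE>Kes&#228;y\u00f6</TITLE></POEM>"
).encode("iso-8859-1")


def title_of(xml_bytes):
    """Parse the bytes (honouring the declaration) and return the title text."""
    doc = minidom.parseString(xml_bytes)
    title = doc.getElementsByTagName("TITLE")[0]
    return "".join(n.data for n in title.childNodes if n.nodeType == n.TEXT_NODE)


def reserialize_as_utf8(xml_bytes):
    """Write the document back out as UTF-8 with a matching declaration."""
    return minidom.parseString(xml_bytes).toxml(encoding="utf-8")


UTF8_BYTES = reserialize_as_utf8(SOURCE_BYTES)

# Simulate the problem case: the bytes are UTF-8 but the declaration claims
# ISO-8859-1, so a parser that trusts the declaration garbles the text.
MISLABELED = UTF8_BYTES.replace(b'encoding="utf-8"', b'encoding="iso-8859-1"')

if __name__ == "__main__":
    print(title_of(SOURCE_BYTES))
    print(title_of(UTF8_BYTES))
    print(title_of(MISLABELED))
```

The first two calls yield the same title, and the character reference has become an actual character in the output bytes; only the mislabeled version comes out garbled.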


Possibilities

• you can use XSLT stylesheets as components and combine them
• a stylesheet can be seen as a re-usable component on the server
• you can also chain transformations
• you can keep your data in content-oriented form and provide multiple output versions by using transformations
• problem: management of DTDs, transformation components and versions
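
The idea of chaining re-usable transformation components can be sketched as follows. Python's standard library has no XSLT processor, so plain functions stand in for stylesheets here; the element names, the footer text and the helper names (`add_footer`, `to_html`, `chain`) are all hypothetical. The first step is an XML-to-XML transformation, the second an XML-to-HTML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical content-oriented source document.
POEM = ET.fromstring(
    "<POEM><TITLE>Evening</TITLE>"
    "<VERSE><LINE>first line</LINE><LINE>second line</LINE></VERSE></POEM>"
)


def add_footer(poem):
    """XML-to-XML step: return a copy of the poem with a FOOTER appended."""
    result = ET.fromstring(ET.tostring(poem))  # copy, leave the source intact
    ET.SubElement(result, "FOOTER").text = "Poem Publishers, Inc."
    return result


def to_html(poem):
    """XML-to-HTML step: map the content-oriented markup to presentation."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = poem.findtext("TITLE", default="")
    for line in poem.iter("LINE"):
        ET.SubElement(body, "p").text = line.text
    footer = poem.find("FOOTER")
    if footer is not None:
        ET.SubElement(body, "address").text = footer.text
    return html


def chain(document, *steps):
    """Apply each transformation step to the output of the previous one."""
    for step in steps:
        document = step(document)
    return document


if __name__ == "__main__":
    page = chain(POEM, add_footer, to_html)
    print(ET.tostring(page, encoding="unicode"))
```

Because each step takes a tree and returns a new one, steps can be reordered, reused, or replaced by another output step to produce a different output version from the same source.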


Lessons learned

• Use the same character encodings in source documents and transformation scripts
• Offer a content-oriented DTD for your authors; there is probably a need for transformations anyway
• The level of support for CSS, XSLT and XML varies between browsers
• Tools are available for building XML publishing environments: allow extra time for dealing with possible problems
• Multiple skills and tools are needed in a publishing environment; XML is not enough!


More information: inSGML project http://haades.it.jyu.fi/inSGML/