1

Tools for Memory: Semantic Content (XML)

Mahesh Chaudhari
School of Computing and Informatics
Department of Computer Science and Engineering
Arizona State University

2

Outline

World Wide Web (WWW) and HTML
Migration towards XML
What is XML?
XML supporting technologies
Languages based on XML specifications
Conclusion

3

World Wide Web (WWW) and HTML

Giant network of computers.
Part of day-to-day activities.
Emails, chat, video, news.
Most important: browsing or surfing the Net.
HTML (HyperText Markup Language): common language for the Internet.

4

Why HTML?

Easy to understand, learn and use.
Quick and fancy way of presentation.
Fixed set of instructions in the form of elements (tags) and attributes, e.g. <HTML>, <HEAD>, <BODY>, etc.
Standard for sharing information over the Internet.
Understandable by all Internet browsers.
  Text-based browsers, e.g. Lynx, HyperTerminal, etc.
  Graphical browsers, e.g. IE, Firefox, Netscape Navigator, etc.

5

Structure of HTML Document

<HTML>
  <HEAD>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
    <TITLE>course</TITLE>
  </HEAD>
  <BODY bgcolor="#9F9F9F">
    <Center>course</Center>
    <TABLE>
      <TR>
        <TD>123</TD><TD>databases</TD><TD>S. Urban</TD><TD>3</TD>
      </TR>
    </TABLE>
  </BODY>
</HTML>

<HTML>: Root of the Document
<HEAD> (Header): Cover Page of the Document
<BODY> (Contents): Main Content of the Document
<TABLE>: Draws a Table with course information in rows and columns.

6

Tree Structure for HTML

7

Why not HTML?

Fixed set of elements vs. user-defined elements.
  HTML: <TD>S. Urban</TD>
  XML: <instructor>S. Urban</instructor>
Similarly with the attributes.
Cannot exchange information between different applications, organizations, etc.
Cannot provide more meaning to the data (semantics to the data).

8

Virtual Organization

[Diagram: a Student/Person with Personal Records, connected over the Internet/Network to a Hospital (Health Records), a School (School Records), a University (University Records) and a Company (Employment Records).]

• Sharing information
• Understanding what is being sent/received

9

Can XML be THE Solution?

10

eXtensible Markup Language (XML)

Similar to HTML (consists of elements, attributes and DATA).
Allows definition of user-defined elements and attributes (<instructor> tag is allowed).
More meaning to the data (adds semantics to the data).
Extensively used for data exchange.
Understood by most Internet browsers.
More strict, powerful and rich than HTML.

11

Structure of XML Document

<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">
  <course>
    <crsid>123</crsid>
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
  <course>
    <crsid>124</crsid>
    <cname>software engineering</cname>
    <inst>J. Urban</inst>
    <length>3</length>
  </course>
</dataroot>

<dataroot>: Root
<course> elements: Main Content of the Document
User-defined elements give more meaning to the data.

12

Is XML Strict?

For every start tag, XML should always have an end tag!

HTML (allowed in HTML, although the first </TD> and the </TR> are missing):

<HTML>
  <HEAD>
    <TITLE>course</TITLE>
  </HEAD>
  <BODY bgcolor="#9F9F9F">
    <Center>course</Center>
    <TABLE border="1">
      <TR>
        <TD>123<TD>databases</TD><TD>S. Urban</TD><TD>3</TD>
    </TABLE>
  </BODY>
</HTML>

XML (not allowed in XML: <crsid> has no end tag):

<?xml version="1.0" encoding="UTF-8"?>
<dataroot>
  <course>
    <crsid>123
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
</dataroot>

XML (allowed in XML):

<?xml version="1.0" encoding="UTF-8"?>
<dataroot>
  <course>
    <crsid>123</crsid>
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
</dataroot>
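The end-tag rule on this slide can be checked with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module is available; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The well-formed course document from the slide.
WELL_FORMED = """<dataroot>
  <course>
    <crsid>123</crsid>
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
</dataroot>"""

# Same document, but <crsid> is never closed.
MISSING_END_TAG = """<dataroot>
  <course>
    <crsid>123
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
</dataroot>"""


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for missing end tags, bad nesting and similar errors.
        return False


print(is_well_formed(WELL_FORMED))      # True
print(is_well_formed(MISSING_END_TAG))  # False
```

Unlike an HTML browser, the XML parser does not try to guess the missing tag; it rejects the whole document.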

13

Is XML more strict than that?

All the elements in an XML document should be properly nested!

HTML (allowed in HTML, although <b> is closed before the inner <i>):

<HTML>
  <HEAD>
    <TITLE>course</TITLE>
  </HEAD>
  <BODY bgcolor="#9F9F9F">
    <b><i>This text is bold and italics.</b> but this text is only italics.</i>
  </BODY>
</HTML>

XML (not allowed in XML):

<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <b><i>This text is bold and italics.</b> but this text is only italics.</i>
</dataroot>

XML (allowed in XML):

<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <b><i>This text is bold and italics. but this text is only italics. </i></b>
</dataroot>

14

Key Features of XML

User-defined tags/elements possible.
Document has only one root element.
Document must be well-formed.
  Every start tag should have an end tag.
  Tags must be properly nested.
Tags in XML are case sensitive and may not contain white space.
Tags must start with a letter or underscore, and may contain letters, digits, period (.), underscore (_) or hyphen (-).
Tags cannot begin with the letters "xml" (reserved).
Tags should have semantic meaning.
Start tags may have attributes.
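The naming rules above can be written as a small check. This is only a sketch of the rules as stated on this slide: it is ASCII-only, while the XML specification itself permits more characters (for example many non-ASCII letters). The function name is_valid_tag_name is hypothetical.

```python
import re

# First character: letter or underscore.
# Remaining characters: letters, digits, period, underscore or hyphen.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_tag_name(name: str) -> bool:
    """Check a tag name against the simplified rules from the slide."""
    # Names starting with "xml" (in any letter case) are reserved.
    if name.lower().startswith("xml"):
        return False
    # fullmatch also rejects white space and an initial digit.
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_tag_name("instructor"))   # True
print(is_valid_tag_name("course name"))  # False (white space)
print(is_valid_tag_name("1course"))      # False (starts with a digit)
print(is_valid_tag_name("xmlData"))      # False (reserved prefix)
```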

15

Elements

Elements
  Always consist of start_tag, data (optional), and end_tag.
  E.g. <crsid>123</crsid>
  <hr></hr> or <hr/>

Attributes
  Provide metadata or additional information for the element; each attribute occurs only once inside the element.
  E.g. <course ID="123"></course>

16

Special Attributes in XML

ID and IDREF
  ID: unique value in the whole document.
  IDREF: references a unique ID value in the document.

e.g.

<instructor ID="1">S. Urban</instructor>
<instructor ID="2">P. Dasgupta</instructor>
<course>
  <inst IDREF="1" />
</course>
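How an application might follow an IDREF back to its ID can be sketched as below. Assumptions: Python's standard xml.etree.ElementTree module, a <dataroot> wrapper added so the fragment has a single root, and plain attribute lookups (ElementTree does not read a DTD, so ID and IDREF are treated here as ordinary attributes). The name instructor_for is made up for the example.

```python
import xml.etree.ElementTree as ET

# The slide's fragment wrapped in an assumed <dataroot> root element.
DOC = """<dataroot>
  <instructor ID="1">S. Urban</instructor>
  <instructor ID="2">P. Dasgupta</instructor>
  <course>
    <inst IDREF="1" />
  </course>
</dataroot>"""

root = ET.fromstring(DOC)

# Index every instructor by its ID attribute value.
instructors_by_id = {el.get("ID"): el.text for el in root.iter("instructor")}


def instructor_for(course: ET.Element) -> str:
    """Follow the IDREF on the course's <inst> child to the instructor."""
    ref = course.find("inst").get("IDREF")
    return instructors_by_id[ref]


print(instructor_for(root.find("course")))  # S. Urban
```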

17

Data-centric XML

Regular, defined structure.
Ordering of tags immaterial.
Used for machine reading.
E.g. Course information or Instructor information.

18

Data-centric XML E.g. Course Information

<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">
  <course>
    <crsid>123</crsid>
    <cname>databases</cname>
    <inst>S. Urban</inst>
    <length>3</length>
  </course>
  <course>
    <crsid>124</crsid>
    <cname>software engineering</cname>
    <inst>J. Urban</inst>
    <length>3</length>
  </course>
</dataroot>
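Because data-centric XML has a regular structure, a program can turn each course into a record. A minimal sketch using Python's standard xml.etree.ElementTree module; the function name load_courses is hypothetical, and the second sample deliberately reorders the child tags to show that the ordering does not change the resulting record.

```python
import xml.etree.ElementTree as ET

COURSES = """<dataroot>
  <course>
    <crsid>123</crsid><cname>databases</cname>
    <inst>S. Urban</inst><length>3</length>
  </course>
  <course>
    <crsid>124</crsid><cname>software engineering</cname>
    <inst>J. Urban</inst><length>3</length>
  </course>
</dataroot>"""

# Same first course with its child elements in a different order.
REORDERED = """<dataroot>
  <course>
    <length>3</length><inst>S. Urban</inst>
    <cname>databases</cname><crsid>123</crsid>
  </course>
</dataroot>"""


def load_courses(text: str) -> list:
    """Return one {tag: text} dictionary per <course> element."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in course}
        for course in root.findall("course")
    ]


courses = load_courses(COURSES)
print(courses[1]["cname"])                       # software engineering
print(load_courses(REORDERED)[0] == courses[0])  # True
```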

19

Document-centric XML

Less regular structure.
Ordering of tags important.
Mostly used for human consumption.
E.g. Product description, Book Information, Library Catalogs.

20

Document-centric XML E.g. Product Description

<Product>
  <Intro>
    The <ProductName>Turkey Wrench</ProductName> from
    <Developer>Full Fabrication Labs, Inc.</Developer> is
    <Summary>like a monkey wrench, but not as big.</Summary>
  </Intro>
  <Description>
    <Para>The turkey wrench, which comes in <i>both right- and left- handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
    <Para>You can:</Para>
    <List>
      <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
      <Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item>
      <Item><Link URL="Catalog.zip">Download the catalog</Link></Item>
    </List>
    <Para>The turkey wrench costs <b>just $19.99</b> and, if you order now, comes with a <b>hand-crafted shrimp hammer</b> as a bonus gift.</Para>
  </Description>
</Product>

21

XML a Jigsaw Puzzle

Supporting Technologies
  Give meaning to the elements.
  Data types for every element.
  Traversing, querying mechanism.
  Support in other programming languages.
  Presentation like HTML.

[Figure: jigsaw puzzle with XML at the center, surrounded by DTD, XSD, DOM, SAX, XPath, XQuery, CSS and XSLT.]

22

Meaning and Structure to XML

Document Type Definition (DTD)
  Describes the structure of the XML document.
  Legal parent-child relationships.
  Custom non-XML syntax to describe the schema.
  Does not support data types and namespaces.

XML Schema Definition (XSD)
  XML-like syntax to describe the schema.
  Supports different data types.
  Supports namespaces.

23

Traversing and Querying

XPath
  Navigate through the XML document.
  Find a particular element or attribute.
  Building block for XQuery, XSLT.

XQuery
  Find and retrieve elements and attributes from an XML document.
  Query language similar to SQL.
  Supported by relational databases like Oracle and SQL Server.
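A few XPath-style lookups on the course document can be sketched as follows. Assumption: Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; full XPath and XQuery need a dedicated engine, which is not shown here.

```python
import xml.etree.ElementTree as ET

COURSES = """<dataroot>
  <course>
    <crsid>123</crsid><cname>databases</cname>
    <inst>S. Urban</inst><length>3</length>
  </course>
  <course>
    <crsid>124</crsid><cname>software engineering</cname>
    <inst>J. Urban</inst><length>3</length>
  </course>
</dataroot>"""

root = ET.fromstring(COURSES)

# Navigate: every <cname> that is a child of a <course> under the root.
all_names = [el.text for el in root.findall("./course/cname")]

# Find: the instructor of the course whose <crsid> child has the text 124.
instructor_124 = root.findtext("./course[crsid='124']/inst")

print(all_names)       # ['databases', 'software engineering']
print(instructor_124)  # J. Urban
```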

24

Other Programming Languages Support

Document Object Model (DOM)
  Standard way of accessing and manipulating XML elements.
  Loads the entire XML document in memory (RAM).
  Bi-directional traversal of the XML tree.
  Slow and high memory consumption for large XML documents.

Simple API for XML (SAX)
  Event-driven parser to access XML elements.
  Reads the XML document from the file on an element-by-element basis.
  Unidirectional traversal of the XML tree (top to bottom).
  Fast and low memory consumption for large XML documents.
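The difference between the two styles can be sketched with Python's standard library, assuming xml.dom.minidom as the DOM implementation and xml.sax as the SAX one. Both read the course names from the same document; the class name CourseNameHandler is made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

COURSES = """<dataroot>
  <course><crsid>123</crsid><cname>databases</cname></course>
  <course><crsid>124</crsid><cname>software engineering</cname></course>
</dataroot>"""

# DOM: the whole tree is built in memory, then navigated in any direction.
dom = minidom.parseString(COURSES)
cname_nodes = dom.getElementsByTagName("cname")
dom_names = [node.firstChild.data for node in cname_nodes]
parent_tag = cname_nodes[0].parentNode.tagName  # walking back up the tree


# SAX: the parser reports events top to bottom; nothing is kept unless
# the handler stores it.
class CourseNameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_cname = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "cname":
            self._in_cname = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_cname:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "cname":
            self.names.append("".join(self._buffer))
            self._in_cname = False


handler = CourseNameHandler()
xml.sax.parseString(COURSES.encode("utf-8"), handler)

print(dom_names)      # ['databases', 'software engineering']
print(parent_tag)     # course
print(handler.names)  # ['databases', 'software engineering']
```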

25

Presenting XML Data

Cascading Style Sheet (CSS)
  Set of instructions to present data in readable format.
  Non-XML syntax to make data look pretty.
  Used in conjunction with HTML.

EXtensible Stylesheet Language Transformations (XSLT)
  Transforms an XML document into HTML, another XML document or a text file.
  Uses XPath extensively.
  XML-like syntax.

26

Other Markup Languages

MathML: Markup language for Mathematics
SVG: Scalable Vector Graphics
MusicXML: an XML-based music notation file format.
VoiceXML: format for specifying interactive voice dialogues between a human and a computer
Linguists: Use of XML in studying different languages and their grammar.

http://en.wikipedia.org/wiki/List_of_XML_markup_languages

27

Useful Links

http://xml.coverpages.org/PESC-HS-Transcript2006.html  XML High School Transcript Standard
http://enterprise.astm.org/REDLINE_PAGES/E2369.htm  XML Health Care Record Standard
http://www.w3schools.com/xml/default.asp  XML Tutorial from W3Schools
http://www.w3schools.com/xsl/xsl_languages.asp  XSL Tutorial from W3Schools
http://www.w3schools.com/xpath/default.asp  XPath Tutorial from W3Schools
http://www.w3schools.com/xquery/default.asp  XQuery Tutorial from W3Schools
http://www.w3schools.com/dtd/default.asp  DTD Tutorial from W3Schools
http://www.w3schools.com/schema/default.asp  XML Schema (XSD) Tutorial from W3Schools
http://www.xml.com/pub/rg/XML_Editors  XML Editors (contains a list of editors; not exhaustive, many more exist out there)
http://www.w3.org/  The World Wide Web Consortium (W3C)
http://www.wowwiki.com/XML_User_Interface  World of Warcraft and XML
http://docs.info.apple.com/article.html?artnum=93732  iTunes and XML

28

Revisit Virtual Organization

[Diagram: a Student/Person with Personal Records, connected over the Internet/Network to a Hospital (Health Records (XML)), a School (School Records (XML)), a University (University Records (XML)) and a Company (Employment Records).]

• Sharing information
• Understanding what is being sent/received

29

Summary

HTML and WWW
  Limitations of HTML
Introduction to XML
XML
  Structure
  Key features
  Data-centric vs. document-centric
  Supporting technologies

30

31

Document Type Definition (DTD)

The following slides are derived from the slides of Dr. Suzanne Dietrich.

She is an assistant professor at the West campus, Department of Mathematical Sciences & Applied Computing.

32

Document Type Definition (DTD)

Describes the structure of the XML document.
Legal parent-child relationships.
Can be defined as an internal section of the XML document, before the root element:
  <!DOCTYPE root-element [element-declarations]>
Can be attached to the XML document as an external reference:
  <!DOCTYPE root-element SYSTEM "filename.dtd">

33

Structure of DTD Document

<!ELEMENT elementName contentSpecification>

contentSpecification defines the content of the element:
  ANY: No restrictions on the element's content; limited use
  EMPTY: Cannot store any content (assume attributes)
  #PCDATA: Contains parsed character data (NO ELEMENTS)
    Entity references: &lt; (<)  &gt; (>)  &quot; (")  &apos; (')  &amp; (&)
    <!ELEMENT inst (#PCDATA)>
  Nested elements using parentheses
  Mixed elements: can contain parsed character data and nested elements
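The five entity references listed for parsed character data can be produced and reversed programmatically. A minimal sketch, assuming Python's standard xml.sax.saxutils module; its escape and unescape functions handle &, < and > by default, and the quote entities are passed in explicitly. The helper names to_pcdata and from_pcdata are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; quotes are supplied as extras.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}


def to_pcdata(raw: str) -> str:
    """Replace the five special characters with entity references."""
    return escape(raw, QUOTE_ENTITIES)


def from_pcdata(text: str) -> str:
    """Turn the entity references back into the original characters."""
    return unescape(text, QUOTE_CHARACTERS)


raw = 'length < 4 & inst = "S. Urban"'
encoded = to_pcdata(raw)
print(encoded)  # length &lt; 4 &amp; inst = &quot;S. Urban&quot;
print(from_pcdata(encoded) == raw)  # True
```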

34

DTD: Nested Elements

(element1, element2, element3) indicates a sequence of elements, i.e., ordered
  <!ELEMENT sequencedElements (element1, element2, element3)>
  <!ELEMENT course (crsid, cname, inst, length)>

(elementA | elementB | elementC) indicates a choice of elements
  <!ELEMENT choiceOfElements (elementA | elementB | elementC)>
  <!ELEMENT customer (name | company)>

35

DTD: Elements Cardinality

element+: element occurs one or more times
element*: element occurs zero or more times
element?: optional (0 or 1)
element: exactly once
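Sequences, choices and the cardinality markers behave much like a regular expression over the child element names. The sketch below is not how a validating parser is implemented; it is a hypothetical illustration that rewrites an element-only content model (no #PCDATA, ANY or EMPTY) into a Python regular expression and tests a list of child names against it. The function names are made up.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def content_model_to_regex(model: str):
    """Translate e.g. '(crsid, cname, inst?, length+)' into a regex."""
    # Each element name becomes a group that matches the token "<name>".
    pattern = NAME.sub(lambda m: "(?:<" + re.escape(m.group(0)) + ">)", model)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The markers |, +, * and ? already mean the same thing in both notations.
    pattern = re.sub(r"[\s,]+", "", pattern)
    return re.compile(pattern)


def children_match(model: str, child_names: list) -> bool:
    """Check an ordered list of child element names against the model."""
    encoded = "".join("<" + name + ">" for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None


course = "(crsid, cname, inst, length)"
print(children_match(course, ["crsid", "cname", "inst", "length"]))  # True
print(children_match(course, ["cname", "crsid", "inst", "length"]))  # False

customer = "(name | company)"
print(children_match(customer, ["company"]))          # True
print(children_match(customer, ["name", "company"]))  # False

print(children_match("(crsid, inst*, length?)", ["crsid"]))  # True
print(children_match("(item+)", []))                         # False
```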

36

DTD: Mixed Elements

<!ELEMENT elementName (#PCDATA | child1 | child2 | …)*>

Elements with mixed content allow both parsed character data and child elements.
Allows any number of occurrences of parsed character data or child elements.
Not very useful for a document with defined structure.

37

Limitations of DTD

No support for newer features of XML — most importantly, namespaces.

Lack of expressivity. Certain formal aspects of an XML document cannot be captured in a DTD.

Custom non-XML syntax to describe the schema.