1
Tools for Memory: Semantic Content (XML)
Mahesh Chaudhari
School of Computing and Informatics
Department of Computer Science and Engineering
Arizona State University
2
Outline
World Wide Web (WWW) and HTML
Migration towards XML
What is XML?
XML supporting technologies
Languages based on XML specifications
Conclusion
3
World Wide Web (WWW) and HTML
Runs on a giant network of computers (the Internet).
Part of day-to-day activities: email, chat, video, news.
Most important: browsing, or "surfing", the Net.
HTML (HyperText Markup Language) is the common language of the Web.
4
Why HTML?
Easy to understand, learn and use.
Quick and fancy way of presenting information.
Fixed set of instructions in the form of elements (tags) and attributes, e.g. <HTML>, <HEAD>, <BODY>, etc.
Standard for sharing information over the Internet.
Understood by all Internet browsers:
Text-based browsers, e.g. Lynx.
Graphical browsers, e.g. Internet Explorer, Firefox, Netscape Navigator, etc.
5
Structure of HTML Document
<HTML>
  <HEAD>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
    <TITLE>course</TITLE>
  </HEAD>
  <BODY bgcolor="#9F9F9F">
    <Center>course</Center>
    <TABLE>
      <TR><TD>123</TD><TD>databases</TD><TD>S. Urban</TD><TD>3</TD></TR>
    </TABLE>
  </BODY>
</HTML>
Root of the document: <HTML>
Header contents: <HEAD>
Cover page of the document
Main content of the document: <BODY>
Draws a table with course information in rows and columns: <TABLE>
7
Why not HTML?
Fixed set of elements vs. user-defined elements
HTML: <TD>S. Urban</TD>
XML: <instructor>S. Urban</instructor>
Similarly with attributes.
Cannot exchange information between different applications, organizations, etc.
Cannot give more meaning to the data (semantics to the data).
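To make the contrast concrete, here is a minimal Python sketch using only the standard library (`html.parser` and `xml.etree.ElementTree`). The class name `CellCollector` is hypothetical. The HTML scraper can only find the instructor as "the third cell", while the XML version asks for `<instructor>` by name.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class CellCollector(HTMLParser):
    """Collect the text of every <TD> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD> arrives as "td".
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text may arrive in several chunks

html_row = "<TR><TD>123</TD><TD>databases</TD><TD>S. Urban</TD><TD>3</TD></TR>"
collector = CellCollector()
collector.feed(html_row)
# HTML: the instructor is just "the third cell"; meaning depends on position.
html_instructor = collector.cells[2]

xml_course = ("<course><crsid>123</crsid><cname>databases</cname>"
              "<instructor>S. Urban</instructor><length>3</length></course>")
# XML: the element name itself says what the data is.
xml_instructor = ET.fromstring(xml_course).findtext("instructor")

print(html_instructor, "|", xml_instructor)
```

If a column were added to the HTML table, the index `2` would silently point at the wrong cell; the XML lookup would not change.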
8
Virtual Organization
[Figure: a student/person's personal records (school records, university records, health records and employment records) are held by a school, a university, a hospital and a company, all connected over the Internet/network.]
•Sharing information
•Understanding what is being sent/received
10
eXtensible Markup Language (XML)
Similar to HTML (consists of elements, attributes and data).
Allows definition of user-defined elements and attributes (an <instructor> tag is allowed).
Gives more meaning to the data (adds semantics to the data).
Extensively used for data exchange.
Understood by most Internet browsers.
Stricter, more powerful and richer than HTML.
11
Structure of XML Document
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">
  <course>
    <crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length>
  </course>
  <course>
    <crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst><length>3</length>
  </course>
</dataroot>
Root
Main Content of the Document
User-defined elements give more meaning to the data
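As a sketch of how a program reads such a document, the Python standard library's `xml.etree.ElementTree` can parse the course data above and look fields up by their user-defined tag names. The document is embedded as a bytes literal (an assumption for self-containment); it must start with `<?xml` with no leading white space.

```python
import xml.etree.ElementTree as ET

# Course document from the slide; ElementTree ignores the schema location.
COURSES = b"""<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">
  <course>
    <crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length>
  </course>
  <course>
    <crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst><length>3</length>
  </course>
</dataroot>"""

root = ET.fromstring(COURSES)  # the single root element, <dataroot>
for course in root.findall("course"):
    # User-defined names let us ask for data by meaning, not by position.
    print(course.findtext("crsid"), course.findtext("cname"),
          "taught by", course.findtext("inst"))
```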
12
Is XML Strict?
HTML (end tags missing): allowed in HTML
<HTML><HEAD>
<TITLE>course</TITLE></HEAD><BODY bgcolor="#9F9F9F">
<Center>course</Center><TABLE border="1">
<TR><TD>123<TD>databases</TD><TD>S. Urban</TD><TD>3</TD>
</TABLE></BODY>
</HTML>
Missing: the </TD> after 123 and the </TR> that closes the row.
XML (every element closed): allowed in XML
<?xml version="1.0" encoding="UTF-8"?>
<dataroot>
<course>
<crsid>123</crsid>
<cname>databases</cname>
<inst>S. Urban</inst>
<length>3</length>
</course>
</dataroot>
XML (</crsid> missing): not allowed in XML
<?xml version="1.0" encoding="UTF-8"?>
<dataroot>
<course>
<crsid>123
<cname>databases</cname>
<inst>S. Urban</inst>
<length>3</length>
</course>
</dataroot>
For every start tag, XML must always have a matching end tag!
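One way to see this rule enforced is to hand both XML variants to a parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical. The parser accepts the closed version and rejects the one missing `</crsid>` with a "mismatched tag" error.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

closed = "<dataroot><course><crsid>123</crsid><cname>databases</cname></course></dataroot>"
unclosed = "<dataroot><course><crsid>123<cname>databases</cname></course></dataroot>"

print(is_well_formed(closed))    # (True, None)
print(is_well_formed(unclosed))  # False, with a "mismatched tag" message
```

A browser rendering HTML would quietly repair the missing tags; an XML parser stops at the first error instead.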
13
Is XML more strict than that?
HTML (overlapping tags): allowed in HTML
<HTML><HEAD>
<TITLE>course</TITLE></HEAD><BODY bgcolor="#9F9F9F">
<b><i>This text is bold and italics.</b> but this text is only italics.</i>
</BODY></HTML>
XML (overlapping tags): not allowed in XML
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<b><i>This text is bold and italics.</b> but this text is only italics.</i>
</dataroot>
XML (properly nested): allowed in XML
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<i><b>This text is bold and italics.</b> but this text is only italics.</i>
</dataroot>
All the elements in an XML document must be properly nested!
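Proper nesting means tags close in last-opened, first-closed order, which a stack checks directly. The Python sketch below is a deliberately simplified illustration, not a real XML parser: the regex-based tokenizer and the name `properly_nested` are assumptions, and it ignores comments, CDATA sections and `>` inside attribute values.

```python
import re

# Matches start tags, end tags and self-closing tags (simplified tokenizer).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^>]*?(/?)>")

def properly_nested(text):
    """Check that start/end tags close in last-opened, first-closed order."""
    stack = []
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue                   # <br/> opens and closes at once
        if not closing:
            stack.append(name)         # remember the open element
        elif not stack or stack.pop() != name:
            return False               # end tag does not match the innermost open tag
    return not stack                   # every open element must have been closed

good = "<i><b>This text is bold and italics.</b> but this text is only italics.</i>"
bad = "<b><i>This text is bold and italics.</b> but this text is only italics.</i>"
print(properly_nested(good), properly_nested(bad))  # True False
```

In `bad`, the parser meets `</b>` while `<i>` is the innermost open element, which is exactly the overlap XML forbids.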
14
Key Features of XML
User-defined tags/elements are possible.
A document has exactly one root element.
A document must be well-formed:
Every start tag must have an end tag.
Tags must be properly nested.
Tags in XML are case sensitive and may not contain white space.
Tag names must start with a letter or underscore, and may contain letters, digits, periods ( . ), underscores ( _ ) or hyphens ( - ).
Tag names should not begin with the letters "xml" (in any combination of case); these are reserved.
Tags should have semantic meaning.
Start tags may have attributes.
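The naming rules above can be written as a small check. This Python sketch is an ASCII-only approximation of the slide's rules: the real XML specification also allows many non-ASCII letters and the colon (used by namespaces), which are left out here, and the helper name `valid_tag_name` is hypothetical. The last part uses `xml.etree.ElementTree` to show case sensitivity.

```python
import re
import xml.etree.ElementTree as ET

# Letter or underscore first, then letters, digits, '.', '_' or '-'.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")

def valid_tag_name(name):
    if not NAME.fullmatch(name):
        return False   # bad first character, white space, or illegal symbol
    if name.lower().startswith("xml"):
        return False   # names starting with "xml" (any case) are reserved
    return True

for candidate in ["instructor", "_id", "course-2", "2course", "first name", "XMLdata"]:
    print(candidate, valid_tag_name(candidate))

# Case sensitivity: <Course> can only be closed by </Course>, never by </course>.
try:
    ET.fromstring("<Course>databases</course>")
    case_insensitive = True
except ET.ParseError:
    case_insensitive = False
print(case_insensitive)  # False
```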
15
Elements
Elements
Always consist of a start tag, data (optional), and an end tag, e.g. <crsid>123</crsid>.
An empty element can be written as <hr></hr> or with the shorthand <hr/>.
Attributes
Provide metadata or additional information about the element; a given attribute may occur only once in a start tag, e.g. <course ID="123"></course>.
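The element and attribute rules can be checked with Python's `xml.etree.ElementTree`. In this sketch the helper `parses` is hypothetical; it shows that both empty-element spellings produce the same element, how an attribute is read, and that a repeated attribute or typographic ("curly") quotes make a document not well-formed.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """True if text is well-formed XML according to ElementTree's parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# <hr></hr> and <hr/> are two spellings of the same empty element.
long_form = ET.fromstring("<hr></hr>")
short_form = ET.fromstring("<hr/>")
print(ET.tostring(long_form) == ET.tostring(short_form))  # True

# Attributes are read through the element's get() method.
course = ET.fromstring('<course ID="123"></course>')
print(course.get("ID"))                                    # 123

# Repeating an attribute in the same start tag is not well-formed...
print(parses('<course ID="123" ID="124"></course>'))       # False
# ...and word-processor curly quotes are not attribute delimiters.
print(parses('<course ID=\u201c123\u201d></course>'))       # False
```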
16
Special Attributes in XML
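ID and IDREF are attribute types declared in a DTD.
ID: the value must be unique in the whole document (and must be a valid XML name, so it cannot start with a digit).
IDREF: the value must match one of the ID values in the document.
e.g.
<instructor ID="i1">S. Urban</instructor>
<instructor ID="i2">P. Dasgupta</instructor>
…
<course>
<inst IDREF="i1" />
…
</course>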
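Python's `xml.etree.ElementTree` does not validate against a DTD, so it does not treat these attributes specially. The sketch below resolves the ID/IDREF link by hand on a hypothetical complete document (the `…` parts of the slide are not filled in; this is a separate, made-up example): it indexes every `ID` value, rejects duplicates, and looks up each `IDREF`.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document using the attribute names from the slide.
DOC = """<dataroot>
  <instructor ID="i1">S. Urban</instructor>
  <instructor ID="i2">P. Dasgupta</instructor>
  <course><cname>databases</cname><inst IDREF="i1"/></course>
</dataroot>"""

root = ET.fromstring(DOC)

# Index every element carrying an ID attribute; IDs must be unique.
by_id = {}
for el in root.iter():
    key = el.get("ID")
    if key is not None:
        if key in by_id:
            raise ValueError(f"duplicate ID {key!r}")
        by_id[key] = el

# Follow each IDREF to the element it points at.
for course in root.iter("course"):
    ref = course.find("inst").get("IDREF")
    instructor = by_id[ref]  # a KeyError here would mean a dangling IDREF
    print(course.findtext("cname"), "->", instructor.text)
```

A validating parser performs both checks (unique IDs, no dangling references) automatically once the attributes are declared as ID and IDREF.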
17
Data-centric XML
Regular, defined structure.
Ordering of tags is immaterial.
Used for machine reading.
E.g. course information or instructor information.
18
Data-centric XML, e.g. Course Information
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">
  <course>
    <crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length>
  </course>
  <course>
    <crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst><length>3</length>
  </course>
</dataroot>
19
Document-centric XML
Less regular structure.
Ordering of tags is important.
Mostly used for human consumption.
E.g. product descriptions, book information, library catalogs.
20
<Product> <Intro> The <ProductName>Turkey Wrench</ProductName> from <Developer>Full Fabrication Labs, Inc.</Developer> is <Summary>like a monkey wrench, but not as big.</Summary> </Intro> <Description>
<Para>The turkey wrench, which comes in <i>both right- and left- handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para> <Para>You can:</Para> <List>
<Item><Link URL="Order.html">Order your own turkey wrench</Link></Item> <Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item> <Item><Link URL="Catalog.zip">Download the catalog</Link></Item>
</List> <Para>The turkey wrench costs <b>just $19.99</b> and, if you order now, comes with a <b>hand-crafted shrimp hammer</b> as a bonus gift.</Para>
</Description></Product>
Document-centric XML E.g. Product Description
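In document-centric XML, text and tags interleave (mixed content), so reading order matters. A sketch with Python's `xml.etree.ElementTree` on a shortened copy of the `<Intro>` above: `itertext()` yields the text pieces in document order, so the sentence can be rebuilt, while the tagged pieces are still individually addressable.

```python
import xml.etree.ElementTree as ET

# Shortened version of the product description's <Intro> (mixed content).
INTRO = ("<Intro> The <ProductName>Turkey Wrench</ProductName> from "
         "<Developer>Full Fabrication Labs, Inc.</Developer> is "
         "<Summary>like a monkey wrench, but not as big.</Summary> </Intro>")

intro = ET.fromstring(INTRO)
# itertext() walks text and tails in document order; then collapse white space.
sentence = " ".join("".join(intro.itertext()).split())
print(sentence)
# Tagged pieces can still be pulled out on their own.
print(intro.findtext("ProductName"), "/", intro.findtext("Developer"))
```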
21
XML: a Jigsaw Puzzle
Supporting technologies:
Give meaning to the elements.
Data types for every element.
Traversing and querying mechanisms.
Support in other programming languages.
Presentation, as with HTML.
[Figure: XML surrounded by jigsaw pieces for DTD, XSD, DOM, SAX, CSS, XSLT, XPath and XQuery]
22
Meaning and Structure to XML
Document Type Definition (DTD)
Describes the structure of the XML document.
Legal parent/child relationships.
Custom non-XML syntax to describe the schema.
Does not support data types or namespaces.
XML Schema Definition (XSD)
XML syntax to describe the schema.
Supports many data types.
Supports namespaces.
23
Traversing and Querying
XPath
Navigates through the XML document.
Finds a particular element or attribute.
Building block for XQuery and XSLT.
XQuery
Finds and retrieves elements and attributes from XML documents.
Query language similar to SQL.
Supported by relational databases such as Oracle and SQL Server.
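Python's `xml.etree.ElementTree` supports a limited subset of XPath (not XQuery), which is enough to show the idea. The sketch below runs the equivalents of the XPath expressions `//cname` and `//course[inst='S. Urban']/cname` on the course data; the inline document is a trimmed copy of the earlier example.

```python
import xml.etree.ElementTree as ET

COURSES = """<dataroot>
  <course><crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length></course>
  <course><crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst><length>3</length></course>
</dataroot>"""

root = ET.fromstring(COURSES)

# "All course names, anywhere in the document" (XPath: //cname)
all_names = [e.text for e in root.findall(".//cname")]
# "Names of courses whose <inst> child is exactly 'S. Urban'"
urban = [e.text for e in root.findall(".//course[inst='S. Urban']/cname")]

print(all_names)  # ['databases', 'software engineering']
print(urban)      # ['databases']
```

Full XPath 1.0 (functions, arbitrary axes) and XQuery need a dedicated engine; the subset above only covers simple paths and predicates.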
24
Other Programming Languages Support
Document Object Model (DOM)
Standard way of accessing and manipulating XML elements.
Loads the entire XML document into memory (RAM).
Bi-directional traversal of the XML tree.
Slow, with high memory consumption, for large XML documents.
Simple API for XML (SAX)
Event-driven parser for accessing XML elements.
Reads the XML document from the file, element by element.
Unidirectional traversal of the XML tree (top to bottom).
Fast, with low memory consumption, for large XML documents.
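Both styles ship with Python's standard library (`xml.dom.minidom` and `xml.sax`). This sketch collects course names both ways; the handler class name `CnameHandler` is hypothetical. The DOM version builds the whole tree first, while the SAX handler only reacts to start, text and end events and keeps just the state it needs.

```python
import xml.sax
from xml.dom import minidom

COURSES = b"""<dataroot>
  <course><cname>databases</cname></course>
  <course><cname>software engineering</cname></course>
</dataroot>"""

# DOM: the whole tree is built in memory, then navigated freely.
dom = minidom.parseString(COURSES)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("cname")]

# SAX: the parser streams events; the handler keeps only what it needs.
class CnameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "cname":
            self._buffer = []              # start collecting text for this element

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)   # text may arrive in several chunks

    def endElement(self, name):
        if name == "cname":
            self.names.append("".join(self._buffer))
            self._buffer = None

handler = CnameHandler()
xml.sax.parseString(COURSES, handler)
print(dom_names == handler.names, handler.names)
```

For a small document the results are identical; the difference shows up in memory use when the file is large.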
25
Presenting XML Data
Cascading Style Sheets (CSS)
Set of instructions to present data in a readable format.
Non-XML syntax to make data look pretty.
Used in conjunction with HTML.
Extensible Stylesheet Language Transformations (XSLT)
Transforms an XML document into HTML, another XML document, or a text file.
Uses XPath extensively.
XML syntax.
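Python's standard library has no XSLT processor, so the sketch below is not XSLT: it is a hand-written Python transformation with `xml.etree.ElementTree` that produces the same kind of result an XSLT stylesheet would, turning each `<course>` into an HTML table row. The function name `courses_to_html` is hypothetical; the comments point to the XSLT instructions each step loosely corresponds to.

```python
import xml.etree.ElementTree as ET
from html import escape

COURSES = """<dataroot>
  <course><crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst></course>
  <course><crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst></course>
</dataroot>"""

def courses_to_html(xml_text):
    """Turn each <course> into one HTML table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for course in root.findall("course"):                 # ~ <xsl:for-each select="course">
        cells = "".join(f"<TD>{escape(course.findtext(tag))}</TD>"
                        for tag in ("crsid", "cname", "inst"))  # ~ <xsl:value-of select="...">
        rows.append(f"<TR>{cells}</TR>")
    return "<TABLE>" + "".join(rows) + "</TABLE>"

print(courses_to_html(COURSES))
```

With a real XSLT processor, the same mapping would live in a separate stylesheet (itself an XML document), so the data and its presentation stay decoupled.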
26
Other Markup Languages
MathML: markup language for mathematics
SVG: Scalable Vector Graphics
MusicXML: an XML-based music notation file format
VoiceXML: a format for specifying interactive voice dialogues between a human and a computer
Linguistics: use of XML in studying different languages and their grammar
http://en.wikipedia.org/wiki/List_of_XML_markup_languages
27
Useful Links
http://xml.coverpages.org/PESC-HS-Transcript2006.html XML High School Transcript Standard
http://enterprise.astm.org/REDLINE_PAGES/E2369.htm XML Health Care Record Standard
http://www.w3schools.com/xml/default.asp XML Tutorial from W3Schools
http://www.w3schools.com/xsl/xsl_languages.asp XSL Tutorial from W3Schools
http://www.w3schools.com/xpath/default.asp XPath Tutorial from W3Schools
http://www.w3schools.com/xquery/default.asp XQuery Tutorial from W3Schools
http://www.w3schools.com/dtd/default.asp DTD Tutorial from W3Schools
http://www.w3schools.com/schema/default.asp XML Schema (XSD) Tutorial from W3Schools
http://www.xml.com/pub/rg/XML_Editors XML Editors (a list of editors; not exhaustive, many more exist out there)
http://www.w3.org/ The World Wide Web Consortium (W3C)
http://www.wowwiki.com/XML_User_Interface World of Warcraft and XML
http://docs.info.apple.com/article.html?artnum=93732 iTunes and XML
28
Revisit: Virtual Organization
[Figure: the same network of school, university, hospital and company; school records, university records and health records are now exchanged as XML, while employment records are not labeled as XML.]
•Sharing information
•Understanding what is being sent/received
29
Summary
HTML and WWW
Limitations of HTML
Introduction to XML
XML structure
Key features
Data-centric vs. document-centric XML
Supporting technologies
31
Document Type Definition (DTD)
The following slides are derived from the slides of Dr. Suzanne Dietrich.
She is an assistant professor at the West campus, Department of Mathematical Sciences & Applied
Computing.
32
Document Type Definition (DTD)
Describes the structure of the XML document.
Legal parent/child relationships.
Can be defined as an internal section of the XML document, placed before the root element:
<!DOCTYPE root-element [element-declarations]>
Can be attached to the XML document as an external reference:
<!DOCTYPE root-element SYSTEM "filename.dtd">
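A DTD only helps if the parser validates. This Python sketch embeds an internal DTD for the course data and parses it with `xml.etree.ElementTree`, which (like the underlying expat parser) is non-validating: the course below omits the `<length>` that the DTD requires, yet the document still parses. Checking the declarations needs a validating tool, for example the third-party lxml library's `etree.DTD` class.

```python
import xml.etree.ElementTree as ET

# Internal DTD: declared inside <!DOCTYPE ... [ ]> before the root element.
DOC = """<?xml version="1.0"?>
<!DOCTYPE dataroot [
  <!ELEMENT dataroot (course*)>
  <!ELEMENT course (crsid, cname, inst, length)>
  <!ELEMENT crsid (#PCDATA)>
  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT inst (#PCDATA)>
  <!ELEMENT length (#PCDATA)>
]>
<dataroot>
  <course><crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst></course>
</dataroot>"""

# Well-formed, so it parses, even though <length> is missing (not validated).
root = ET.fromstring(DOC)
print(root.tag, root.find("course/length"))  # dataroot None
```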
33
Structure of DTD Document
<!ELEMENT elementName contentSpecification>
contentSpecification defines the content of the element:
ANY: no restrictions on the element's content; of limited use.
EMPTY: cannot store any content (the element may still have attributes).
#PCDATA: contains parsed character data (NO child elements), e.g. <!ELEMENT inst (#PCDATA)>
Inside parsed character data, special characters are written with the predefined entity references: &lt; (<)  &gt; (>)  &quot; (")  &apos; (')  &amp; (&)
Nested elements: child elements listed in parentheses.
Mixed elements: can contain both parsed character data and nested elements.
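"Parsed" character data means the parser decodes entity references. A small Python sketch with `xml.etree.ElementTree` (the helper `parses` and the sample text are hypothetical): the five predefined references come back as plain characters, a raw `&` or `<` is rejected, and serializing escapes them again.

```python
import xml.etree.ElementTree as ET

def parses(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The five predefined entity references are decoded to plain characters.
cname = ET.fromstring("<cname>Databases &amp; Files &lt;intro&gt; &quot;CSE&quot; &apos;08</cname>")
print(cname.text)                                 # Databases & Files <intro> "CSE" '08

# A raw '&' or '<' inside #PCDATA is not well-formed.
print(parses("<cname>Databases & Files</cname>"))  # False
print(parses("<cname>a < b</cname>"))              # False

# ElementTree escapes special characters again when serializing.
out = ET.Element("cname")
out.text = "Databases & Files"
print(ET.tostring(out, encoding="unicode"))        # <cname>Databases &amp; Files</cname>
```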
34
DTD: Nested Elements
(element1, element2, element3) indicates a sequence of elements, i.e., ordered:
<!ELEMENT sequencedElements (element1, element2, element3)>
<!ELEMENT course (crsid, cname, inst, length)>
(elementA | elementB | elementC) indicates a choice of elements (exactly one of them):
<!ELEMENT choiceOfElements (elementA | elementB | elementC)>
<!ELEMENT customer (name | company)>
35
DTD: Elements Cardinality
element+: element occurs one or more times
element*: element occurs zero or more times
element?: element is optional (occurs zero or one time)
element: element occurs exactly once
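A DTD content model (sequence with `,`, choice with `|`, cardinality with `+`, `*`, `?`) is essentially a regular expression over the names of an element's children. The Python sketch below makes that concrete by translating a model into a `re` pattern; the helpers `model_to_regex` and `children_match` are hypothetical, and the sketch ignores `#PCDATA`, `ANY` and `EMPTY`.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][\w.\-]*|[(),|*+?])")

def model_to_regex(model):
    """Translate e.g. '(crsid, cname, inst+, length?)' into a regex over child names."""
    parts, pos, model = [], 0, model.strip()
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if not m:
            raise ValueError(f"unexpected text: {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            parts.append("(?:")                         # DTD group -> non-capturing group
        elif tok == ",":
            continue                                    # sequence = concatenation
        elif tok in ")|*+?":
            parts.append(tok)                           # same meaning in DTDs and regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element, space-terminated
    return re.compile("".join(parts))

def children_match(model, child_names):
    """True if the ordered list of child element names satisfies the model."""
    return model_to_regex(model).fullmatch("".join(n + " " for n in child_names)) is not None

print(children_match("(crsid, cname, inst, length)", ["crsid", "cname", "inst", "length"]))  # True
print(children_match("(crsid, cname, inst, length)", ["cname", "crsid", "inst", "length"]))  # False
print(children_match("(name | company)", ["company"]))                                        # True
print(children_match("(crsid, cname, inst+, length?)", ["crsid", "cname", "inst", "inst"]))   # True
```

The second call fails because a sequence is ordered; the last succeeds because `inst+` allows repeats and `length?` may be absent.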
36
DTD: Mixed Elements
<!ELEMENT elementName (#PCDATA | child1 | child2 | …)*>
Elements with mixed content allow both parsed character data and child elements.
Any number of occurrences of parsed character data or child elements are allowed, in any order.
Not very useful for a document with a defined structure.
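Because a mixed model such as a hypothetical `<!ELEMENT Para (#PCDATA | b | i)*>` fixes neither order nor count, checking it only means checking that every direct child is one of the declared names; text may appear anywhere. A minimal Python sketch with `xml.etree.ElementTree` (the helper `mixed_content_ok` is hypothetical):

```python
import xml.etree.ElementTree as ET

def mixed_content_ok(element, allowed_children):
    """Check a mixed model (#PCDATA | child1 | child2 | ...)*: text is always
    allowed, and each direct child element must be one of the allowed names."""
    return all(child.tag in allowed_children for child in element)

para = ET.fromstring("<Para>The turkey wrench is made of the <b>finest stainless steel</b>, "
                     "in <i>both</i> versions.</Para>")
print(mixed_content_ok(para, {"b", "i"}))  # True
print(mixed_content_ok(para, {"b"}))       # False: <i> is not declared
```

This looseness is why mixed content suits document-centric XML and says little about the structure of data-centric XML.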